
ML | Cross Validation | How to visualise KFold Cross Validation using Python and Matplotlib

2.3K views
Premiered Oct 4, 2020
15:32

How to visualise KFold Cross Validation using Python and Matplotlib

In k-fold cross-validation, the original sample is randomly partitioned into k equal-sized subsamples (folds), or nearly equal-sized ones when the sample size is not divisible by k. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k − 1 subsamples are used as training data. The cross-validation process is then repeated k times, with each of the k subsamples used exactly once as the validation data. The k results can then be averaged to produce a single estimate. The advantage of this method over repeated random sub-sampling is that all observations are used for both training and validation, and each observation is used for validation exactly once. 10-fold cross-validation is commonly used,[15] but in general k remains an unfixed parameter.

Code Starts Here
===============

```python
from sklearn.model_selection import KFold
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1339)
n_splits = 5

# Generate the class/group data
n_points = 100
X = np.random.randn(n_points, 10)
percentiles_classes = [.2, .4, .4]
y = np.hstack([[ii] * int(n_points * perc)
               for ii, perc in enumerate(percentiles_classes)])

# Evenly spaced groups repeated once
groups = np.hstack([[ii] * 10 for ii in range(10)])


def visualize_groups(classes, groups, name):
    # Visualize dataset groups (row 1) and classes (row 2)
    fig, ax = plt.subplots()
    ax.scatter(range(len(groups)), [1] * len(groups),
               c=groups, marker='_', lw=50)
    ax.scatter(range(len(groups)), [2] * len(groups),
               c=classes, marker='_', lw=50)
    ax.set(yticks=[1, 2], yticklabels=['group', 'class'],
           xlabel='Sample index', title=name)


visualize_groups(y, groups, 'no groups')


def plot_cv(cv, X, y, group, ax, n_splits, lw=10):
    """Plot one row per CV iteration, then the classes and groups."""
    for ii, (tr, tt) in enumerate(cv.split(X=X, y=y, groups=group)):
        print('iteration', ii, '| train:', tr, '| test:', tt)

        # Fill in indices with the training/test groups:
        # 0 = training sample, 1 = test (validation) sample
        indices = np.full(len(X), np.nan)
        indices[tt] = 1
        indices[tr] = 0

        # Visualize the results for this iteration
        ax.scatter(range(len(indices)), [ii + 1] * len(indices),
                   c=indices, marker='_', lw=lw)

    # Plot the data classes below the CV iterations
    ax.scatter(range(len(X)), [n_splits + 1] * len(X),
               c=y, marker='_', lw=lw)
    # Plot the groups in the last row
    ax.scatter(range(len(X)), [n_splits + 2] * len(X),
               c=group, marker='_', lw=lw)

    # Formatting
    yticklabels = list(range(n_splits)) + ['class', 'group']
    ax.set(yticks=np.arange(n_splits + 2) + 1, yticklabels=yticklabels,
           xlabel='Sample index', ylabel='CV iteration',
           ylim=[n_splits + 2.2, -.2], xlim=[0, len(X)])
    ax.set_title('{}'.format(type(cv).__name__), fontsize=15)
    return ax


fig, ax = plt.subplots()
cv = KFold(n_splits=n_splits, shuffle=True)
plot_cv(cv, X, y, groups, ax, n_splits)
plt.show()
```

All playlists of this YouTube channel
====================================
1. Data Preprocessing in Machine Learning https://www.youtube.com/playlist?list=PLE-8p-CwnFPuOjFcbnXLFvSQaHFK3ymUW
2. Confusion Matrix in Machine Learning, ML, AI https://www.youtube.com/playlist?list=PLE-8p-CwnFPvXzvsEcgb0IZtNsw_0vUzr
3. Anaconda, Python Installation, Spyder, Jupyter Notebook, PyCharm, Graphviz https://www.youtube.com/playlist?list=PLE-8p-CwnFPsBCsWwz_BvbZZHIVQ6wSZK
4. Cross Validation, Sampling, train test split in Machine Learning https://www.youtube.com/playlist?list=PLE-8p-CwnFPsHtol5WXHhq_B3kQPggHH2
5. Drop and Delete Operations in Python Pandas https://www.youtube.com/playlist?list=PLE-8p-CwnFPtvqVVK7QVFsMvDvp2YgCnR
6. Matrices and Vectors with python https://www.youtube.com/playlist?list=PLE-8p-CwnFPsndwnZnL7nXW5mIrdRmgdg
7. Detect Outliers in Machine Learning https://www.youtube.com/playlist?list=PLE-8p-CwnFPvyCX35yES5D9W7vThiUzwk
8. TimeSeries preprocessing in Machine Learning https://www.youtube.com/playlist?list=PLE-8p-CwnFPv10bru3719xzDNIgbO6hXA
9. Handling Missing Values in Machine Learning https://www.youtube.com/playlist?list=PLE-8p-CwnFPvOec0LZ40Bt8OQcbLFa236
10. Dummy Encoding Encoding in Machine Learning https://www.youtube.com/playlist?list=PLE-8p-CwnFPvu7YriqMZsL9UDbqUUk90x
11. Data Visualisation with Python, Seaborn, Matplotlib https://www.youtube.com/playlist?list=PLE-8p-CwnFPuYBYsmbfMjROOCzKjCwyMH
12. Feature Scaling in Machine Learning https://www.youtube.com/playlist?list=PLE-8p-CwnFPtwpVV3FwzwYZYR5hT3i52G
13. Python 3 basics for Beginner https://www.youtube.com/playlist?list=PLE-8p-CwnFPu-jseUMtc4i47jQZN4PNbf
14. Interview Questions in Machine Learning and Artificial Intelligence https://www.youtube.com/playlist?list=PLE-8p-CwnFPt7VBhcnh82y0autSzuOrZp
15. Jupyter Notebook Operations https://www.youtube.com/playlist?list=PLE-8p-CwnFPtqkFd67OZcoSv4BAI7ez5_
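To make the k-fold partitioning from the description concrete, the scheme can be sketched in plain NumPy without scikit-learn. This is a minimal sketch, assuming one shuffle of the sample indices followed by splitting them into contiguous, nearly equal folds (broadly the layout KFold(shuffle=True) produces). The helper `manual_kfold` and its `seed` parameter are hypothetical names introduced only for illustration.

```python
import numpy as np


def manual_kfold(n_samples, k, seed=0):
    """Hypothetical helper: yield (train_idx, val_idx) pairs for k-fold CV.

    Shuffles the sample indices once, splits them into k nearly equal
    folds, and uses each fold exactly once as the validation set.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    folds = np.array_split(shuffled, k)  # fold sizes differ by at most 1
    for i in range(k):
        val_idx = np.sort(folds[i])
        # The remaining k - 1 folds form the training set
        train_idx = np.sort(np.concatenate(
            [f for j, f in enumerate(folds) if j != i]))
        yield train_idx, val_idx


n_samples, k = 100, 5
val_counts = np.zeros(n_samples, dtype=int)
for train_idx, val_idx in manual_kfold(n_samples, k):
    # Train and validation sets never overlap and together cover every sample
    assert np.intersect1d(train_idx, val_idx).size == 0
    assert train_idx.size + val_idx.size == n_samples
    val_counts[val_idx] += 1

# Each observation was used for validation exactly once
print(val_counts.min(), val_counts.max())  # → 1 1
```

`np.array_split` is used instead of `np.split` so that sample counts not divisible by k still work: the first `n_samples % k` folds simply receive one extra sample.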

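The description notes that the k results can be averaged into a single estimate. The short sketch below shows that step with scikit-learn's `cross_val_score`; the iris dataset and the `LogisticRegression` model are assumptions chosen only for illustration and are not part of the video's code.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)

# shuffle=True matters here: iris is sorted by class, so unshuffled
# contiguous folds would each be dominated by a single class
cv = KFold(n_splits=5, shuffle=True, random_state=0)
model = LogisticRegression(max_iter=1000)

# One score per fold (accuracy, the classifier's default): the k results
scores = cross_val_score(model, X, y, cv=cv)
print('Per-fold accuracy:', np.round(scores, 3))

# Average the k results into a single estimate
print('Mean accuracy: {:.3f} (std {:.3f})'.format(scores.mean(), scores.std()))
```

Reporting the standard deviation next to the mean is a common convention; it gives a rough sense of how much the estimate depends on which fold was held out.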

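As a quick companion to the Matplotlib plot, the same fold layout can be printed as text, one row per CV iteration with `V` marking the validation samples. This sketch assumes a toy 20-sample dataset and compares KFold with `shuffle=False` against `shuffle=True`; the text row format is an illustrative choice, not something from the video.

```python
import numpy as np
from sklearn.model_selection import KFold

n_samples, n_splits = 20, 5
X = np.zeros((n_samples, 1))  # only the number of rows matters to KFold

for shuffle in (False, True):
    # random_state may only be set when shuffle=True
    cv = KFold(n_splits=n_splits, shuffle=shuffle,
               random_state=0 if shuffle else None)
    print('shuffle={}'.format(shuffle))
    for ii, (tr, tt) in enumerate(cv.split(X)):
        row = np.full(n_samples, '.')
        row[tt] = 'V'  # validation samples for this iteration
        print('  iter {}: {}'.format(ii, ''.join(row)))
```

With `shuffle=False` each row shows a contiguous block of four `V`s moving from left to right; with `shuffle=True` they are scattered across the row, which is the pattern `plot_cv` draws as coloured bars for the 100-sample data.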