Different Types of Cross Validation


In this tutorial, we are going to learn about cross validation, covering both the theory and a hands-on example using a cancer data set. Keep in mind that the main goal is to explain the different types of cross validation, not to build a full project.

Why Do We Use Cross Validation?

We can't train a machine learning model on the whole data set, because then we have no data left to check whether the model actually works accurately. For example, suppose we have a cancer data set with independent features and a dependent feature; the model uses the independent features to predict the dependent one. If we split the data into independent and dependent features and then train the model on all of it, we can't validate how the model will perform on real, unseen data. Cross validation solves this problem.


What Is Cross Validation?

In machine learning, cross validation is a technique in which we train the model on one portion of the data and evaluate it on the remaining portion. For example, with a cancer data set we might divide the data into training and testing sets, reserving 50% of the data for training the model and the other 50% for evaluating it.


Drawback

If you divide the data in a 50:50 ratio (50% for training and 50% for testing), the 50% held out for evaluation may contain important patterns that the model never sees while training on the reserved 50%.



Common Steps

First of all, load the cancer data set and display its top five rows using the head() function.



Remove unwanted columns, check whether there are any null values, and split the data set into independent and dependent features.
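The common steps might look like this. Since the tutorial's own data file isn't included here, this sketch uses scikit-learn's built-in breast cancer data set as a stand-in:

```python
# Sketch of the common preprocessing steps, using scikit-learn's built-in
# breast cancer data set as a stand-in for the tutorial's cancer data.
import pandas as pd
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer(as_frame=True)
df = cancer.frame  # 30 feature columns plus a 'target' column

print(df.head())                # display the top five rows
print(df.isnull().sum().sum())  # count null values (0 in this data set)

# Split into independent features (X) and the dependent feature (y)
X = df.drop(columns=["target"])
y = df["target"]
```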



Hold-out Cross Validation

Hold-out cross validation is a technique in which you split the data into training and testing sets (for example, in an 80:20 ratio). The model is trained on the training data and then evaluated on the unseen testing data. It uses random sampling to split the data. The major drawback is that the split depends on a random_state, and different random_state values give different accuracies, so a single hold-out score is not a reliable evaluation of the model.
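A minimal hold-out sketch, again assuming scikit-learn's built-in breast cancer data set and a logistic regression model as stand-ins. Changing random_state and re-running illustrates the drawback described above:

```python
# Hold-out cross validation: one 80:20 train/test split.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)  # a different random_state gives a different score

model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)            # train on the 80% split
acc = model.score(X_test, y_test)      # evaluate on the held-out 20%
print(f"hold-out accuracy: {acc:.3f}")
```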



K-Fold Cross Validation

K-Fold cross validation is a technique that splits the data into K subsets of equal size, also known as folds. The model is trained on K-1 folds and evaluated on the remaining fold; this is repeated K times so that each fold serves as the test set once. It also uses random sampling to split the data. The disadvantage of K-Fold is that it does not work properly on imbalanced data sets.
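A K-Fold sketch with K=5, using the same built-in breast cancer data set and logistic regression as assumed stand-ins. cross_val_score handles the train-on-K-1-folds, test-on-one loop for us:

```python
# 5-fold cross validation: each fold is the test set exactly once.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=42)

# One accuracy score per fold; the mean summarizes the model.
scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=kf)
print("fold scores:", scores)
print("mean accuracy:", scores.mean())
```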



Stratified K-Fold Cross Validation

Plain K-Fold cross validation will not work properly on an imbalanced data set. With imbalanced data, we need to change the splitting so that each fold contains approximately the same proportion of samples from each output class as the complete data set. For example, if we have 100 instances, 80 positive and 20 negative, stratified sampling might split the data so that the training set contains 60 positive and 15 negative instances, and the test set contains 20 positive and 5 negative instances. Stratified K-Fold cross validation uses this stratified sampling instead of random sampling.
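A sketch of the stratification property, using the same assumed built-in data set. Printing the positive-class ratio of each test fold shows that every fold mirrors the class balance of the full data:

```python
# Stratified 5-fold: each fold preserves the overall class proportions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold

X, y = load_breast_cancer(return_X_y=True)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

ratios = []
for train_idx, test_idx in skf.split(X, y):
    ratios.append(y[test_idx].mean())  # fraction of positive class in this fold

print("overall positive ratio:", round(y.mean(), 3))
print("per-fold positive ratios:", [round(r, 3) for r in ratios])
```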



LOOCV (Leave One Out Cross Validation)

LOOCV is a method in which we train on the whole data set except a single data point, which is left out for evaluating the model. If there are N data points in the original data set, then N-1 samples are used to train the model and the remaining one is used for validation; the process is repeated N times so that every data point is left out exactly once. This cross validation has some advantages as well as disadvantages. The advantage of leave one out cross validation is that it makes use of all the data points, and because of that it has low bias. The disadvantages are that it has higher variance, since each test is on a single data point (if that point is an outlier, it distorts the result), and that it takes a lot of execution time, because the model must be trained N times.
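A LOOCV sketch on the same assumed built-in data set. To keep the N model fits fast for illustration, this uses only the first 100 rows; each of the 100 scores is the result of testing on exactly one left-out point:

```python
# Leave-one-out: N fits, each tested on the single held-out point.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_breast_cancer(return_X_y=True)
X_small, y_small = X[:100], y[:100]  # small slice: LOOCV trains one model per point

loo = LeaveOneOut()
scores = cross_val_score(LogisticRegression(max_iter=5000),
                         X_small, y_small, cv=loo)

# Each score is 0 or 1 (one test point); the mean is the LOOCV accuracy.
print("number of fits:", len(scores))
print("LOOCV accuracy:", scores.mean())
```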



Source code : 

  1. Go to my GitHub account and fork or download the repository: Cross Validations

  2. Then open the .ipynb file in Jupyter Notebook.


Video Tutorial



Thank You!
