Loan Eligibility Classification in Machine Learning


In this tutorial, we'll train a machine learning model to classify whether a person is eligible for a loan. We use Naive Bayes, a probability-based algorithm: for each applicant it computes the probability of each class, eligible and not eligible, and predicts the class with the higher probability.

In this tutorial, we will walk through the key steps for training a loan eligibility prediction model:

  1. Exploratory data analysis.
  2. Check and remove outliers.
  3. Handle missing values.
  4. Separate dependent and independent features.
  5. Split the data into training and testing sets.
  6. Perform label encoding.
  7. Scale the features.
  8. Define the model.
  9. Test the model.
  10. Measure accuracy.

Let's start.

First, import all the required libraries used to train our model, such as pandas, NumPy, Matplotlib, and scikit-learn's LabelEncoder.

Fig 1.0
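The import cell is only shown as a screenshot, so as a sketch, the conventional aliases for this stack might look like this:

```python
# The usual data-science stack; the aliases below are the conventional
# ones (an assumption, since the original import cell is a screenshot).
import numpy as np
import pandas as pd
import matplotlib
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.naive_bayes import GaussianNB
```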

Next, suppress the warnings that are generated, then load the data and show its top five rows using head().

Fig 1.1
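As a sketch of this step: the actual CSV filename is not shown, so the example below suppresses warnings and builds a tiny stand-in DataFrame with a few of the dataset's columns instead of calling pd.read_csv, then prints its first rows:

```python
import warnings

import pandas as pd

warnings.filterwarnings("ignore")  # silence warnings, as in the tutorial

# The notebook reads the training CSV with pd.read_csv(...); the filename
# is not shown, so this tiny hand-made frame stands in for the real data.
df = pd.DataFrame({
    "Loan_ID": ["LP001002", "LP001003", "LP001005"],
    "Gender": ["Male", "Male", "Male"],
    "ApplicantIncome": [5849, 4583, 3000],
    "LoanAmount": [None, 128.0, 66.0],
    "Loan_Status": ["Y", "N", "Y"],
})

print(df.head())  # display the top rows, as head() does in the notebook
```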

Now we perform exploratory data analysis (EDA). In EDA, we check the shape of the data, inspect its information, describe it statistically, check for null values, see how many applicants are eligible versus not eligible, and detect and remove outliers.
In Fig 1.2, we check the shape of the data, which has 614 rows and 13 columns, and inspect the data's information.
        
Fig 1.2

In Fig 1.3, we describe the data, which shows the count, mean, standard deviation, minimum, and maximum of each numeric column, and we check whether any null values are present.
Fig 1.3

Now we check for outliers: LoanAmount contains outlier values, as you can see in the screenshots below.

Fig 1.4

Fig 1.5

Now, in Fig 1.6, we reduce the effect of the LoanAmount outliers by applying a log transform and storing the result in a new feature.

Fig 1.6
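The log-transform step can be sketched like this (the LoanAmount values are hypothetical; np.log compresses large values so the outlier no longer dominates):

```python
import numpy as np
import pandas as pd

# Hypothetical LoanAmount values, with 700 as an extreme outlier.
df = pd.DataFrame({"LoanAmount": [100.0, 128.0, 66.0, 700.0]})

# The log transform compresses large values, so the outlier no longer
# dominates; the result goes into a new feature, as in the tutorial.
df["LoanAmount_log"] = np.log(df["LoanAmount"])
```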

The next step is handling the missing values in Gender, Married, Dependents, Self_Employed, LoanAmount, LoanAmount_log, Loan_Amount_Term, and Credit_History; afterwards we check again and confirm that no null values remain.
First of all, we checked whether any values were missing, and found many missing values across our features. There are two common options for handling missing data:
1. Drop all rows that contain a missing cell.
2. Fill each cell according to its datatype:
    A. Object (categorical) data is filled with the mode.
    B. Integer or numeric data is filled with the mean or median.
We chose option 2 and filled in all the missing values, because our dataset is small and we cannot afford to drop rows.

Fig 1.7
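The fill-by-datatype strategy (option 2) can be sketched on a tiny stand-in frame:

```python
import pandas as pd

# A small stand-in frame with the kinds of gaps described above.
df = pd.DataFrame({
    "Gender": ["Male", None, "Female", "Male"],
    "LoanAmount": [100.0, 128.0, None, 66.0],
})

# Option 2A: categorical (object) columns are filled with the mode.
df["Gender"] = df["Gender"].fillna(df["Gender"].mode()[0])
# Option 2B: numeric columns are filled with the median (mean also works).
df["LoanAmount"] = df["LoanAmount"].fillna(df["LoanAmount"].median())
```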

After filling all the missing values, you can see that no null values remain. We also create a new feature, Total_Income, by adding ApplicantIncome and CoapplicantIncome. Total_Income contains outliers as well, so we apply a log transform to it too.


Fig 1.8
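Creating Total_Income and its log-transformed version can be sketched as follows (the income values are made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ApplicantIncome": [5849, 4583, 3000],
    "CoapplicantIncome": [0.0, 1508.0, 0.0],
})

# New feature: total household income of applicant plus co-applicant...
df["Total_Income"] = df["ApplicantIncome"] + df["CoapplicantIncome"]
# ...and its log transform to tame the outliers, as with LoanAmount.
df["Total_Income_log"] = np.log(df["Total_Income"])
```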

Fig 1.9 shows our final data.

Fig 1.9

Now it's time to prepare the data: we split it into dependent and independent features and display both.

Fig 2.0
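A minimal sketch of the dependent/independent split (the exact column list in the notebook may differ; here only Loan_Status is treated as the target):

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "LoanAmount": [100.0, 128.0, 66.0],
    "Loan_Status": ["Y", "N", "Y"],
})

# Independent features (X): every column except the target.
X = df.drop(columns=["Loan_Status"])
# Dependent feature (Y): the loan eligibility label itself.
Y = df["Loan_Status"]
```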

Now it's time to split the data into training and testing sets. X_train holds the independent features and Y_train the dependent feature for training, while X_test and Y_test hold the corresponding testing data. We then check the shape of each split.

Fig 2.1
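The split is typically done with scikit-learn's train_test_split; the 80/20 ratio and random_state below are assumptions, not values taken from the notebook:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten hypothetical rows of features and labels.
X = np.arange(20).reshape(10, 2)
Y = np.array([0, 1] * 5)

# 80/20 split with a fixed random_state for reproducibility (both values
# are assumptions; the notebook's actual settings are not shown here).
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42
)
```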

Now apply label encoding to convert the categorical data into numeric data. For example, if a Gender feature contains the values male, female, and transgender, label encoding maps each category to an integer such as 0, 1, or 2.
We have several categorical features that need to be converted to numeric: Gender, Married, Education, Self_Employed, and Loan_Status. We could use label encoding, one-hot encoding, or another encoding scheme, but here we use LabelEncoder, as you can see in Fig 2.2 and Fig 2.3 below.

Fig 2.2

Fig 2.3
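A small sketch of how LabelEncoder behaves; note that it assigns integers in alphabetical order of the class names, not in order of appearance:

```python
from sklearn.preprocessing import LabelEncoder

# LabelEncoder assigns integers in alphabetical order of the class names,
# so "Female" becomes 0 and "Male" becomes 1 here.
le = LabelEncoder()
encoded = le.fit_transform(["Male", "Female", "Male", "Female"])
```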

The next step is feature scaling, which brings all the features onto a comparable scale. Several scalers are available, such as StandardScaler and MinMaxScaler; here we use StandardScaler, which standardizes each feature to zero mean and unit variance. We call fit_transform on X_train, but only transform on X_test, because the scaler has already been fitted on the training data.

Fig 2.4
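The fit-on-train, transform-on-test pattern can be sketched like this:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
X_test = np.array([[2.5]])

scaler = StandardScaler()
# fit_transform learns the mean and standard deviation from the training
# data and scales it to zero mean and unit variance...
X_train_scaled = scaler.fit_transform(X_train)
# ...while the test data is only transformed, reusing those statistics.
X_test_scaled = scaler.transform(X_test)
```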

Now it's time to define our model: we use a Naive Bayes classifier. We test it on the testing data, where 1 stands for Eligible and 0 for Not Eligible, and then check the confusion matrix, accuracy score, and classification report.

Fig 2.5
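A self-contained sketch of defining, fitting, and evaluating a Gaussian Naive Bayes model; the toy data below is made up for illustration, not taken from the loan dataset:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Tiny, clearly separable toy data: 1 = Eligible, 0 = Not Eligible.
X_train = np.array([[1.0], [1.2], [0.9], [5.0], [5.2], [4.8]])
Y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[1.1], [5.1]])
Y_test = np.array([0, 1])

# Gaussian Naive Bayes: fit on training data, predict on test data.
model = GaussianNB()
model.fit(X_train, Y_train)
Y_pred = model.predict(X_test)

acc = accuracy_score(Y_test, Y_pred)
cm = confusion_matrix(Y_test, Y_pred)
report = classification_report(Y_test, Y_pred)
```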

Now save the Naive Bayes model and load it back for testing.

Fig 2.6
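Saving and reloading a scikit-learn model is commonly done with joblib; a sketch under that assumption (the filename and temp directory here are hypothetical):

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Train a tiny model so there is something to save.
model = GaussianNB().fit(np.array([[1.0], [5.0]]), np.array([0, 1]))

# joblib is the usual choice for persisting scikit-learn models
# (the filename here is hypothetical).
path = os.path.join(tempfile.mkdtemp(), "loan_nb_model.joblib")
joblib.dump(model, path)
loaded = joblib.load(path)
```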

Now load the test data to evaluate the model, and show its top 5 rows using head().


Fig 2.7

Now we apply exploratory data analysis to the test data as well. As before, we check the shape of the data, inspect its information, describe it, check for null values, and detect and handle outliers.

In Fig 2.8, we check the shape of the test data, which has 367 rows and 12 columns, and inspect its information.

Fig 2.8

Check again whether any null values are present. You can see that there are many, so we need to handle them.

Fig 2.9

The next step is handling the missing values in Gender, Dependents, Self_Employed, Loan_Amount_Term, and Credit_History; afterwards we check the test data again and confirm that no null values remain.
We have many missing values in our features, and again two options to handle them:
1. Drop all rows that contain a missing cell.
2. Fill each cell according to its datatype:
    A. Object (categorical) data is filled with the mode.
    B. Integer or numeric data is filled with the mean or median.
We chose option 2 and filled in all the values, because the test dataset is small.


Fig 3.0

In the next step, we check for outliers and handle them. LoanAmount contains outliers, as you can see in the image below.


Fig 3.1
In the next step, we create a new feature, Total_Income, by adding ApplicantIncome and CoapplicantIncome, and show the final data using the head() function.

Fig 3.2

Now it's time to prepare the test dataset.
We perform label encoding here as well, converting the categorical test features into numeric data exactly as we did for the training data.
Then we apply feature scaling with StandardScaler, so the test features are on the same scale the model was trained on.

Fig 3.3


Now the last step is to run our model on the test data; since 1 stands for Eligible and 0 for Not Eligible, we convert 1 into "Eligible" and 0 into "Not Eligible".


Fig 3.4
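Converting the numeric predictions back into readable labels can be sketched as (the prediction values below are hypothetical):

```python
import numpy as np

# Hypothetical model output: 1 = Eligible, 0 = Not Eligible.
Y_pred = np.array([1, 0, 1])

# Convert the numeric predictions back into readable labels.
labels = ["Eligible" if p == 1 else "Not Eligible" for p in Y_pred]
```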

Source code and how to use:

  1. Go to my GitHub and download the code: Loan Eligibility Classification in ML
  2. After downloading the project and dataset, extract the project folder. It contains 3 files: the training data, the testing data, and the notebook that trains the model.
  3. Go into the folder.
  4. Open Jupyter Notebook.
  5. Click on the .ipynb file.
  6. Now you can use the code to classify who is eligible for a loan.

Video Tutorial



Thank you!
