In this tutorial, we train a diabetes classification model using both a machine learning algorithm and a deep learning algorithm. The model classifies whether a person has diabetes or not based on several diagnostic parameters.
Let’s start:
First, load the required libraries: pandas, numpy, scikit-learn, and seaborn. These libraries help us train the diabetes classification model.
Then suppress the warnings that are generated while the code runs.
Load the diabetes data set with the pandas function read_csv(), passing the file name diabetes.csv, and display it.
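The setup steps above can be sketched as follows. This is a minimal sketch, not the original notebook: the column names are assumed to match the Kaggle diabetes data set, and the three rows are invented stand-ins so the snippet runs without the file on disk.

```python
import io
import warnings

import pandas as pd

# Silence warnings raised at runtime, as described in the tutorial.
warnings.filterwarnings("ignore")

# Stand-in for diabetes.csv: column names are an assumption and the
# three rows are invented for illustration, NOT real patient data.
csv_text = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
2,120,70,25,80,28.5,0.45,33,0
5,165,82,0,0,35.1,0.62,51,1
1,95,0,20,60,22.3,0.21,24,0
"""

# With the real file available this would be: pd.read_csv("diabetes.csv")
df = pd.read_csv(io.StringIO(csv_text))

# Display the first rows of the data set.
print(df.head())
```

read_csv() accepts a file path or any file-like object, which is why the in-memory text works here in place of the real file.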
Now perform exploratory data analysis (EDA). First, check the shape of the data set, which gives the number of rows and columns. Then check whether any null values are present using the function isnull(), and check the info of the data set, which reports non-null counts, data types, memory usage, etc.
Check the description of the data set, which reports the count, mean, standard deviation, minimum value, maximum value, etc. of each column, and visualise the outcome to see how many persons have diabetes and how many do not.
Count how many persons have diabetes and how many do not. Check the unique values of the outcome using the unique() function.
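The EDA checks described above might look like this. It is a rough sketch on a tiny invented frame; the column names Glucose, BMI and Outcome are assumptions about the real data set.

```python
import pandas as pd

# Tiny invented frame standing in for the diabetes data set.
df = pd.DataFrame({
    "Glucose": [120, 165, 95, 140, 0],
    "BMI": [28.5, 35.1, 22.3, 31.0, 26.4],
    "Outcome": [0, 1, 0, 1, 0],
})

# Shape: (number of rows, number of columns).
print(df.shape)

# Number of null values per column.
null_counts = df.isnull().sum()
print(null_counts)

# Column data types, non-null counts and memory usage.
df.info()

# Count, mean, standard deviation, min, quartiles and max per column.
print(df.describe())

# How many persons fall into each outcome class.
outcome_counts = df["Outcome"].value_counts()
print(outcome_counts)

# The distinct values of the outcome column.
classes = df["Outcome"].unique()
print(classes)
```

To visualise the class counts, seaborn's countplot(data=df, x="Outcome") is one option.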
Check for outliers by visualising the data set with a box plot and a scatter plot. You can see the image below.
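A box plot marks as outliers the points lying more than 1.5 times the interquartile range beyond the quartiles (matplotlib's default whisker length). The sketch below applies that same rule numerically to an invented Insulin column. It is the numeric counterpart of the plot, not the plotting code from the tutorial.

```python
import pandas as pd

# Invented values for illustration, NOT taken from the real data set.
insulin = pd.Series([0, 60, 80, 85, 90, 100, 110, 120, 600], name="Insulin")

# Quartiles and interquartile range.
q1 = insulin.quantile(0.25)
q3 = insulin.quantile(0.75)
iqr = q3 - q1

# Default box plot whiskers reach at most 1.5 * IQR beyond the quartiles.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Points outside the whisker range are the ones drawn as outliers.
outliers = insulin[(insulin < lower) | (insulin > upper)]
print(outliers.tolist())
```

The plot itself can be drawn with seaborn's boxplot(data=df), which shows these points as individual markers beyond the whiskers.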
Before removing the outliers, prepare the data set by splitting it into the dependent and independent features, and check the description of the data set. You can see this clearly in the image below.
Now handle the outliers: replace the zero values in each column with the mean value of that column. Then describe the data set again and visualise it using the boxplot function.
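A minimal sketch of this preparation step follows. It makes two assumptions the text does not spell out: the mean is taken over the column as-is (zeros included), and only columns where zero is not a plausible measurement are treated. The list cols_with_invalid_zeros is a hypothetical name.

```python
import pandas as pd

# Invented frame standing in for the diabetes data set.
df = pd.DataFrame({
    "Glucose": [100, 0, 140, 160],
    "BMI": [20.0, 30.0, 0.0, 30.0],
    "Outcome": [0, 1, 0, 1],
})

# Split into independent features (X) and the dependent feature (y).
X = df.drop(columns="Outcome")
y = df["Outcome"]

# Description before the replacement, for comparison.
print(X.describe())

# Hypothetical list of columns in which a zero is treated as invalid.
cols_with_invalid_zeros = ["Glucose", "BMI"]

# Replace each zero with the mean of its own column
# (assumption: the mean is computed with the zeros still included).
for col in cols_with_invalid_zeros:
    X[col] = X[col].replace(0, X[col].mean())

# Description after the replacement.
print(X.describe())
```

A common variant computes the mean over the non-zero values only, so the invalid zeros do not pull the replacement value down.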
Split the data set into training and testing sets using hold-out cross validation. The train_test_split() function takes some parameters that control the split and returns four data sets. Check the shape of each returned set, and check the class counts of y_train.
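The hold-out split could be sketched like this. The test_size and random_state values are assumptions, since the tutorial does not state which parameters it used, and the data here is randomly generated.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Random stand-in data: 100 rows, 3 features, binary outcome.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 3)), columns=["Glucose", "BMI", "Age"])
y = pd.Series(rng.integers(0, 2, size=100), name="Outcome")

# Hold out 20% of the rows for testing (assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Shapes of the four returned data sets.
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

# Class counts in the training labels.
print(y_train.value_counts())
```

Fixing random_state makes the split reproducible between runs.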
If you have not yet read my post on the different types of cross validation, with theory and practice, read it: click here
Now apply feature scaling using the standard scaler. The standard scaler standardizes each feature to zero mean and unit variance; it does not squeeze values into the range 0 to 1 (that is what min-max scaling does). This puts all features on a comparable scale.
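As a small illustration of what scikit-learn's StandardScaler does, the sketch below fits the scaler on invented training values and reuses the fitted statistics on the test values. The arrays are made up for the example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Invented training and test features.
X_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
X_test = np.array([[2.0, 400.0]])

scaler = StandardScaler()

# Learn mean and standard deviation from the training set only.
X_train_scaled = scaler.fit_transform(X_train)

# Apply the training statistics to the test set (no refitting).
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0))  # approximately 0 per column
print(X_train_scaled.std(axis=0))   # approximately 1 per column
print(X_test_scaled)                # values can fall outside [0, 1]
```

Fitting on the training set alone keeps information about the test set from leaking into the preprocessing.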
Now define the machine learning model, a support vector classifier (SVC), and train it on the training data set. Then test the model on the test data set using the predict() function.
Check the model's performance using the accuracy score, confusion matrix, and classification report. I got 81% overall accuracy from the machine learning model.
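Training and evaluating the SVC might look like the sketch below. It uses synthetic data from make_classification and default SVC settings, both of which are assumptions, so the numbers it prints will not match the 81% reported here.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary classification data standing in for the diabetes set.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Scale features using statistics from the training set.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Define and train the support vector classifier (default parameters).
svc = SVC()
svc.fit(X_train, y_train)

# Predict on the held-out test set.
y_pred = svc.predict(X_test)

# Evaluate: accuracy, confusion matrix, per-class precision/recall/F1.
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print(acc)
print(cm)
print(classification_report(y_test, y_pred))
```

The diagonal of the confusion matrix holds the correctly classified samples, so accuracy equals the diagonal sum divided by the total.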
Display the difference between the actual output and the predicted output.
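One simple way to show actual and predicted outputs side by side is a small DataFrame. The labels and predictions below are invented, and the column names Actual, Predicted and Match are hypothetical.

```python
import pandas as pd

# Invented test labels and predictions for illustration.
y_test = pd.Series([0, 1, 1, 0, 1], name="Outcome")
y_pred = [0, 1, 0, 0, 1]

# Put actual and predicted values next to each other.
comparison = pd.DataFrame({
    "Actual": y_test.to_numpy(),
    "Predicted": y_pred,
})

# Flag the rows where the prediction agrees with the actual value.
comparison["Match"] = comparison["Actual"] == comparison["Predicted"]
print(comparison)
```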
Now define the deep learning algorithm, passing the hidden layer sizes parameter as (8,8), and train it on the training data set. Then test the model using the test data set.
Now check the accuracy of the deep learning model in the same way as the ML model, using the accuracy score, confusion matrix, and classification report. I got 77% overall accuracy from the deep learning model.
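The tutorial does not name the deep learning class it uses. The sketch below assumes scikit-learn's MLPClassifier, whose hidden_layer_sizes parameter matches the (8,8) setting mentioned above. The data is synthetic and the max_iter and random_state values are assumptions, so the accuracy will differ from the 77% reported here.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data with 6 features.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Neural networks are sensitive to feature scale, so standardize first.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Two hidden layers with 8 units each (assumed class: MLPClassifier).
mlp = MLPClassifier(hidden_layer_sizes=(8, 8), max_iter=1000, random_state=0)
mlp.fit(X_train, y_train)

# Predict and evaluate in the same way as for the SVC.
y_pred = mlp.predict(X_test)
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print(acc)
print(cm)
print(classification_report(y_test, y_pred))
```

The fitted weight matrices in mlp.coefs_ reflect the architecture: input to 8 units, 8 to 8 units, and 8 units to a single output.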
You can tune the parameters using hyperparameter tuning, which we’ll cover in the next tutorial.
This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective is to predict based on diagnostic measurements whether a patient has diabetes.
Link : https://www.kaggle.com/mathchi/diabetes-data-set