Data Preprocessing In Machine Learning

* Data Preprocessing :-

Data preprocessing is one of the most important phases in machine learning. In this phase we prepare the raw data so that it is suitable for our machine learning model.

"Data preprocessing is the process of converting raw data into meaningful data using different techniques."

Data preprocessing involves many steps. When we build machine learning projects, we do not always get consistent data: sometimes values are missing from the dataset, or some other problem arises. Data preprocessing steps let us handle these situations.

* Why do we need data preprocessing :- 

Data may arrive in an unusable format, or the dataset may contain missing values. Such data cannot be used directly to build a machine learning model, because it can cause problems when predicting the output. So if we want good predictions, we need to apply data preprocessing. One of the most important phases of data preprocessing is data cleaning: it cleans the data and makes it suitable for the model, which can improve the model's accuracy and efficiency.

The steps of data preprocessing are as follows :-

* Getting the dataset :-

This is the initial step of data preprocessing, in which you download the dataset.

As you can see in the image above, we have downloaded the dataset (the CSV file "Bengaluru_House_Data.csv") and loaded it into our Colab notebook.

* Importing Libraries :-

After downloading the dataset, we import the libraries that will help us perform the remaining steps.

As you can see in the image above, we have imported the required libraries.
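For readers who cannot see the screenshot, here is a minimal sketch of what the import cell might look like. The exact set of imports is an assumption based on the functions and classes named later in this post (pd.read_csv, SimpleImputer, OneHotEncoder, train_test_split); StandardScaler is also an assumption, since the post does not say which scaler was used.

```python
# Assumed import cell: covers the tools named in the steps of this post.
import numpy as np                                    # numeric arrays
import pandas as pd                                   # loading and inspecting the CSV
from sklearn.impute import SimpleImputer              # filling in missing values
from sklearn.preprocessing import OneHotEncoder       # encoding categorical columns
from sklearn.preprocessing import StandardScaler      # feature scaling (assumed choice)
from sklearn.model_selection import train_test_split  # train/test split
```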

* Importing Datasets :-

Next, import the dataset using the pandas library.

As you can see in the image above, we have imported the dataset using the pandas function "pd.read_csv".
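A small sketch of this step is shown below. So that it runs anywhere, it reads a tiny in-memory CSV instead of the real file; the rows and column names are made up for illustration and are not taken from "Bengaluru_House_Data.csv".

```python
import io
import pandas as pd

# Stand-in for the real file: a tiny in-memory CSV with made-up rows.
# In the notebook you would pass the file path instead, for example
# pd.read_csv("Bengaluru_House_Data.csv").
csv_text = """location,size,total_sqft,price
Whitefield,2 BHK,1100,55.0
Hebbal,3 BHK,,120.0
,2 BHK,950,48.5
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (number of rows, number of columns)
print(df.head())  # first few rows, to check the data loaded as expected
```

Empty fields in the CSV are read as NaN, which is what the next steps look for.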

* Finding Missing Data :-

Find the missing data in the dataset using the library's predefined functions.

As you can see in the image above, we have checked for missing values in our dataset using "isnull()".
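As a rough illustration, the check can be done like this. The DataFrame is a made-up toy example, not the real housing data.

```python
import numpy as np
import pandas as pd

# Toy data with two deliberately missing entries (illustrative only).
df = pd.DataFrame({
    "location": ["Whitefield", "Hebbal", None, "Whitefield"],
    "total_sqft": [1100.0, np.nan, 950.0, 1400.0],
    "price": [55.0, 120.0, 48.5, 80.0],
})

# isnull() marks each missing cell as True; sum() counts them per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Rows that contain at least one missing value.
rows_with_missing = df[df.isnull().any(axis=1)]
print(rows_with_missing)
```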

* Taking care of missing data :-

In this step we handle the missing values that we found in the previous step.

As you can see in the image above, we have handled the missing values using the "SimpleImputer" class.
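Here is a minimal sketch of imputing numeric columns with SimpleImputer. The post does not say which strategy was used, so strategy="mean" is an assumption, and the two columns are a made-up toy example.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy numeric data with one missing value in each column (illustrative only).
df = pd.DataFrame({
    "total_sqft": [1100.0, np.nan, 950.0, 1400.0],
    "bath": [2.0, 3.0, np.nan, 2.0],
})

# Replace each missing value with the mean of its column (assumed strategy).
imputer = SimpleImputer(strategy="mean")
df[["total_sqft", "bath"]] = imputer.fit_transform(df[["total_sqft", "bath"]])

print(df)
print(df.isnull().sum())  # every count should now be zero
```

For categorical columns the mean is not defined; strategy="most_frequent" is one alternative that SimpleImputer supports.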

* Encoding Categorical Data :-

If any column contains categorical data, we need to encode it in numeric form.

As you can see in the image above, we have encoded the required categorical column in numeric form using the "OneHotEncoder" class.

* Splitting dataset into training set and testing set :-

Next, split the dataset into two parts: a training set and a test set.

As you can see in the image above, we have split the data into a training set and a test set using the "train_test_split" function.
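A minimal sketch of the split, using made-up arrays in place of the real features and target. The 80/20 ratio and random_state=42 are assumptions for the example; the post does not state the values used.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy features (10 samples, 2 columns) and a toy target (illustrative only).
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 20% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```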

* Feature Scaling :-

Finally, perform feature scaling on the data. Putting the features on a comparable scale can improve performance and accuracy for models that are sensitive to feature magnitudes.

As you can see in the image above, this is the last step of the data preprocessing phase, in which we have applied feature scaling to our training data and test data.
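As a sketch of this step, the example below uses StandardScaler, which is an assumption because the post does not name the scaler. The arrays are made-up values. Note that the scaler is fitted on the training data only and then reused on the test data, so no information from the test set leaks into training.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy training and test features on very different scales (illustrative only).
X_train = np.array([[1000.0, 2.0], [1500.0, 3.0], [2000.0, 4.0]])
X_test = np.array([[1200.0, 2.0]])

scaler = StandardScaler()

# Learn the mean and standard deviation from the training data, then scale it.
X_train_scaled = scaler.fit_transform(X_train)

# Scale the test data with the statistics learned from the training data.
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled)
print(X_test_scaled)
```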


Thank You

If you have any doubts, please let me know.
