Twitter Sentiment Analysis

Introduction

In this tutorial, we are going to implement Twitter sentiment analysis using natural language processing (NLP). When we give a review to our trained model, it classifies the review as either positive or negative.

The data set contains three features: an id, a label, and a tweet. We need only two of them to train our model: the label and the tweet. The label is our target (output) feature, and the tweet contains the review text.

Let’s start :-

First, as usual, import all the required libraries such as pandas, numpy, and re (regular expressions). Then suppress the warnings generated at runtime, which takes just one line of code. Next, load the Twitter Sentiment data set from its CSV file using a pandas function and display it.
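A minimal sketch of this step could look like the following. The three inline rows are made up for illustration, and the file name in the comment is an assumption, since the actual name depends on where you saved the data set.

```python
import io
import warnings

import pandas as pd

# One line is enough to silence runtime warnings.
warnings.filterwarnings("ignore")

# Stand-in for the real file. With the actual data set this would be
# something like pd.read_csv("train.csv") (the file name is an assumption).
sample_csv = io.StringIO(
    "id,label,tweet\n"
    "1,0,what a lovely day with friends\n"
    "2,1,this is the worst day ever\n"
    "3,0,thank you for the great post\n"
)
df = pd.read_csv(sample_csv)

# Display the first rows of the data set.
print(df.head())
```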

Then perform Exploratory Data Analysis (EDA). First, look at the shape of the data set, which gives the number of rows and columns: the data set contains 31962 rows and 3 columns, of which we’ll use only two. Then check the value counts of the target feature, which show the number of 0’s and 1’s in the data set. Finally, check the information summary of the data set.
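These EDA checks can be sketched as below. The small data frame is a made-up stand-in, so the numbers printed here will not match the real data set.

```python
import pandas as pd

# Toy stand-in for the real data set (illustrative rows only).
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "label": [0, 0, 0, 1],
    "tweet": ["lovely day", "great post", "thank you", "worst day ever"],
})

# Number of rows and columns.
print(df.shape)

# How many 0's and 1's the target feature contains.
label_counts = df["label"].value_counts()
print(label_counts)

# Column names, dtypes, and non-null counts.
df.info()
```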

Now it's time to preprocess the data set. First, remove everything except letters: numbers, special characters, and so on. Then remove the stopwords, such as is, am, are, the, and that. Finally, append each preprocessed review to a new list called Corpus, as you can see in the image below.
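Here is a rough sketch of the preprocessing. The stopword set is a tiny hand-written list used only to keep the example self-contained; a fuller list, such as the one shipped with NLTK, could be swapped in. The function name clean_tweet is also just an illustrative choice.

```python
import re

# Tiny illustrative stopword list (an assumption, not a complete list).
STOPWORDS = {"is", "am", "are", "the", "that", "a", "an", "this", "to", "of", "and"}


def clean_tweet(text):
    """Keep letters only, lowercase the text, and drop stopwords."""
    letters_only = re.sub(r"[^a-zA-Z]", " ", text)
    words = letters_only.lower().split()
    return " ".join(word for word in words if word not in STOPWORDS)


tweets = ["This is a GREAT post!!! 100%", "@user that was the worst day :("]

# Append every preprocessed review to the corpus list.
corpus = []
for tweet in tweets:
    corpus.append(clean_tweet(tweet))

print(corpus)
```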

Then apply CountVectorizer to the preprocessed Corpus list. CountVectorizer converts the text data into vectors of word counts. I already explained CountVectorizer in my previous post, so go and read that for the details. Then check the shape of the data after applying CountVectorizer.
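As a small illustration, assuming scikit-learn's CountVectorizer with its default settings (the settings used in the notebook may differ), a three-sentence corpus turns into a matrix with one row per review and one column per distinct word:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy preprocessed corpus (illustrative only).
corpus = ["great post", "worst day ever", "great day"]

cv = CountVectorizer()
X = cv.fit_transform(corpus).toarray()

# Words learned from the corpus, in column order.
vocabulary = sorted(cv.vocabulary_)
print(vocabulary)

# One row per review, one column per word.
print(X.shape)
```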

Remember, this is a highly imbalanced data set, so you can apply K-Fold or, better, Stratified K-Fold cross validation. Stratified K-Fold keeps the class ratio the same in every fold, which gives a more reliable evaluation on imbalanced data. For study purposes, though, I am applying hold-out validation here, which splits the data into training and testing sets using the train_test_split() method. It returns four data sets (X_train, X_test, y_train, y_test), as you can see in the image below. Then check the shape of all four data sets.
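The split can be sketched like this on toy data. The random_state value is an assumption used only to make the split repeatable, and the StratifiedKFold part at the end is an optional alternative, not what this tutorial uses.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

# Toy stand-ins for the vectorized tweets and labels (8 zeros, 2 ones).
X = np.arange(20).reshape(10, 2)
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

# Hold-out split: 80% training data, 20% testing data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

# Alternative for imbalanced data: every fold keeps the same class ratio.
skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
fold_positives = [int(y[test_idx].sum()) for _, test_idx in skf.split(X, y)]
print(fold_positives)
```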

Then define our Naive Bayes model, which works on probabilities. We use the Multinomial Naive Bayes model: it estimates a probability for each class and predicts the class with the higher one. For example, if the model returns a probability of 0.6 for class 1, the final output will be 1.
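To see the link between probabilities and the final label, here is a small sketch with made-up word counts. It assumes scikit-learn's MultinomialNB with default settings; predict() simply returns the class whose predict_proba() value is highest.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy word-count vectors (3 words) and their labels, for illustration only.
X_train = np.array([[2, 0, 1], [1, 0, 0], [0, 2, 1], [0, 1, 0]])
y_train = np.array([0, 0, 1, 1])

classifier = MultinomialNB()
classifier.fit(X_train, y_train)

# A new review containing the second and third word once each.
X_new = np.array([[0, 1, 1]])

proba = classifier.predict_proba(X_new)  # one probability per class
pred = classifier.predict(X_new)         # class with the highest probability

print(proba)
print(pred)
```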

Then test the model using the testing data. During the hold-out split, we divided the data in an 80:20 ratio, which means 80% is training data and 20% is testing data. So we test the model on the 20% held out for testing.

Now it's time to evaluate the model using different evaluation techniques such as the accuracy score and the confusion matrix. After evaluation, we got an overall accuracy of 94%.
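Both metrics come from sklearn.metrics. The labels below are made up, so the numbers are only illustrative and are not the tutorial's results. On an imbalanced data set it is worth reading the confusion matrix alongside the accuracy, because it shows how each class is doing separately.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up actual and predicted labels, for illustration only.
y_test = [0, 0, 0, 0, 1, 1, 0, 0, 0, 1]
y_pred = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]

# Fraction of predictions that match the actual labels.
acc = accuracy_score(y_test, y_pred)
print(acc)

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)
```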

Then view the actual and predicted outputs side by side in a new data frame.
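A quick sketch of that comparison, again with made-up labels. The column names Actual and Predicted are my own choice here.

```python
import pandas as pd

# Made-up actual and predicted labels, for illustration only.
y_test = [0, 0, 1, 1]
y_pred = [0, 0, 1, 0]

# Put both side by side so mismatches are easy to spot.
results = pd.DataFrame({"Actual": y_test, "Predicted": y_pred})
print(results)
```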

Then save the model using the joblib library. Its dump method takes two arguments: the classifier and the file name for the model. Then load the model back using the load method of the joblib library.
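Saving and loading can be sketched as follows. The file name sentiment_model.pkl and the temporary directory are assumptions made so the example runs anywhere; use whatever path suits your project.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Tiny fitted model standing in for the trained classifier.
X = np.array([[2, 0], [0, 2]])
classifier = MultinomialNB().fit(X, np.array([0, 1]))

# dump takes the object to save and the file name (hypothetical name here).
model_path = os.path.join(tempfile.gettempdir(), "sentiment_model.pkl")
joblib.dump(classifier, model_path)

# load returns the saved object, ready to predict again.
loaded_model = joblib.load(model_path)
print(loaded_model.predict(X))
```

If you plan to classify new text later, it is probably worth saving the fitted CountVectorizer in the same way, since new reviews must be converted with the same vocabulary.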

Now it’s time to test the model on custom reviews. First, we create a function to test custom reviews. In this function, we apply all the same preprocessing steps that we applied to the data above. Then give a custom review to the model. For example, I gave the model the review “This is a very interesting post” and it returned the output “Positive Analysis”. Then I tested the model on two more sentences, as you can see in the image below, and it performs well almost all the time.
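Putting the pieces together, a function for custom reviews might look roughly like this. Everything here is a sketch: the four training sentences, the short stopword list, the function names, the mapping of 0 to positive and 1 to negative, and the "Negative Analysis" string are all assumptions made for illustration.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny illustrative stopword list (an assumption, not a complete list).
STOPWORDS = {"is", "am", "are", "the", "that", "a", "an", "this"}


def clean_text(text):
    """Same preprocessing as for the training data."""
    words = re.sub(r"[^a-zA-Z]", " ", text).lower().split()
    return " ".join(word for word in words if word not in STOPWORDS)


# Made-up training data. Assumption: 0 = positive, 1 = negative.
train_tweets = [
    "This is a very interesting post",
    "what a lovely and interesting day",
    "this is the worst day",
    "that was an awful boring post",
]
train_labels = [0, 0, 1, 1]

cv = CountVectorizer()
X_train = cv.fit_transform([clean_text(t) for t in train_tweets])
classifier = MultinomialNB().fit(X_train, train_labels)


def predict_sentiment(review):
    """Clean the review, vectorize it with the fitted vocabulary, and classify it."""
    features = cv.transform([clean_text(review)])
    label = classifier.predict(features)[0]
    return "Positive Analysis" if label == 0 else "Negative Analysis"


print(predict_sentiment("This is a very interesting post"))
print(predict_sentiment("worst boring day"))
```

Note that the function calls transform, not fit_transform, on the new review, so the review is mapped onto the vocabulary learned during training.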

Source Code :-

Go to my GitHub and fork or download the repo: Twitter Sentiment Analysis
Use Jupyter Notebook to open the .ipynb file.

Video Tutorial

Thank You!!!!!!
