Movies Review Sentiment Analysis

In this tutorial, we implement a Natural Language Processing project: Movies Review Sentiment Analysis, which classifies a given movie review as positive or negative. We train a model on a movie reviews dataset and use that model to build an end-to-end application. End-to-end here means the model is deployed with the Python framework Flask on localhost, and we also try to deploy it on Heroku. The model is trained with the naive Bayes algorithm, which is a probability-based algorithm.

We train our model with the help of the following steps:

Import required libraries
Load the data
Perform EDA
Remove stop words and apply stemming
Convert the text data into vectors
Split data into training and testing
Define our model
Test the model on testing data
Check the accuracy
Test on the custom data

About Dataset

The movie reviews dataset contains a total of 50,000 labeled movie reviews for Natural Language Processing. In this data, 25,000 reviews are positive and 25,000 reviews are negative. We have to predict whether a review is positive or negative based on the text of the given review.

Let's start :-

Import all the required libraries used to train our model, such as Pandas, Sklearn, NLTK, pickle, etc.

Load the movie data, taken from Kaggle, using Pandas and show the top 5 rows of the data.
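A minimal sketch of the loading step. The column names review and sentiment and the inline sample rows are assumptions used so the snippet runs on its own; with the real Kaggle file you would pass its path to pd.read_csv instead.

```python
import io

import pandas as pd

# Stand-in for the downloaded CSV (assumed columns: review, sentiment).
# With the real file: df = pd.read_csv("<path to the downloaded CSV>")
sample_csv = io.StringIO(
    "review,sentiment\n"
    '"A wonderful, moving film.",positive\n'
    '"Boring from start to finish.",negative\n'
    '"Great acting and a great story.",positive\n'
    '"A complete waste of time.",negative\n'
    '"I loved every minute of it.",positive\n'
    '"Terrible script, terrible pacing.",negative\n'
)
df = pd.read_csv(sample_csv)

# Show the top 5 rows
print(df.head())
```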

Now perform the exploratory data analysis (EDA). First check the shape of the dataset; the output is 50000 rows and 2 columns. Then check whether there are any null values. Describe the dataset and check its information, check the unique values of the sentiment (labels), which are positive and negative, and visualize how many reviews carry each sentiment.
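The EDA checks could look roughly like this. The tiny DataFrame and its column names are assumptions standing in for the real 50,000-row dataset, and the class counts are printed rather than plotted.

```python
import pandas as pd

# Small stand-in for the real dataset (assumed columns: review, sentiment)
df = pd.DataFrame({
    "review": ["Loved it", "Hated it", "Great film", "Awful film"],
    "sentiment": ["positive", "negative", "positive", "negative"],
})

print(df.shape)                  # (rows, columns)
null_counts = df.isnull().sum()  # null values per column
print(null_counts)
print(df.describe())             # summary of each column
df.info()                        # dtypes and non-null counts
labels = df["sentiment"].unique()
print(labels)
class_counts = df["sentiment"].value_counts()  # reviews per sentiment
print(class_counts)
```

On the real data, class_counts.plot(kind="bar") would draw the positive/negative counts, provided matplotlib is installed.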

Now apply LabelEncoding on the sentiment column to convert the categorical data into numerical data (positive becomes 1 and negative becomes 0). Then show the top 5 rows of the data again.
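A short sketch of the encoding step with sklearn's LabelEncoder. The sample rows and column names are assumptions. LabelEncoder sorts the class names alphabetically, which is why negative maps to 0 and positive maps to 1.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Stand-in rows (assumed columns: review, sentiment)
df = pd.DataFrame({
    "review": ["Loved it", "Hated it", "Great film"],
    "sentiment": ["positive", "negative", "positive"],
})

encoder = LabelEncoder()
df["sentiment"] = encoder.fit_transform(df["sentiment"])

print(list(encoder.classes_))  # class order used for the codes
print(df.head())
```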

Now divide the data into independent and dependent features. X stores our independent feature (the review text) and y stores our dependent feature (the sentiment label).

Now remove all special and numeric characters from the data, remove the stopwords (is, am, this, that, etc.), and apply stemming. Append each cleaned review to the Corpus list and print the Corpus list.
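One way to sketch this cleaning step. The function name clean_review and the small STOP_WORDS set are hypothetical, chosen so the snippet runs without downloading anything; the full English list is available from nltk.corpus.stopwords.words("english") after running nltk.download("stopwords").

```python
import re

from nltk.stem import PorterStemmer

# Tiny illustrative stop-word set (an assumption, not the full NLTK list)
STOP_WORDS = {"is", "am", "this", "that", "the", "a", "an",
              "and", "it", "was", "i", "of"}

stemmer = PorterStemmer()


def clean_review(text):
    """Keep letters only, lowercase, drop stop words, stem what is left."""
    text = re.sub(r"[^a-zA-Z]", " ", text)  # remove special/numeric characters
    words = text.lower().split()
    return " ".join(stemmer.stem(w) for w in words if w not in STOP_WORDS)


reviews = [
    "This movie was AMAZING!!! 10/10",
    "I hated the acting, and that plot is boring.",
]
corpus = [clean_review(r) for r in reviews]
print(corpus)
```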

Now convert the text data into vectors with TF-IDF. TF-IDF scores are word frequency scores that try to highlight words that are more interesting, e.g. frequent in a document but not across documents. The TfidfVectorizer will tokenize documents, learn the vocabulary and inverse document frequency weightings, and allow you to encode new documents.
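A small illustration of TfidfVectorizer on an already-cleaned corpus. The three sample documents are assumptions; the default vectorizer settings are used since the original settings are not stated.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Stand-in for the cleaned, stemmed Corpus list
corpus = ["great movi love act", "bore movi bad act", "great stori"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)  # learn vocabulary + IDF, encode corpus

vocab = vectorizer.vocabulary_        # word -> column index
print(sorted(vocab))
print(X.shape)                        # (documents, vocabulary size)

# A new document is encoded with the vocabulary learned above
new_vec = vectorizer.transform(["great act"])
print(new_vec.toarray())
```

A word that appears in fewer documents gets a larger IDF weight, so "stori" (one document) is weighted higher than "movi" (two documents).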

Now split the data into training and testing sets: 80% of the data is used for training and 20% for testing. X_train stores the independent feature for training and Y_train stores the dependent feature for training. X_test stores the independent feature for testing and Y_test stores the dependent feature for testing. Then show the shapes of X_train, X_test, Y_train, and Y_test.
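The 80/20 split can be sketched with train_test_split. The toy arrays are stand-ins for the TF-IDF matrix and the labels, and random_state=0 is an assumption added only to make the split repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins: 10 samples with 2 features each, and 10 labels
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.2, random_state=0  # 20% of the rows go to testing
)

print(X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
```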

Define the MultinomialNB naive Bayes model, which is probability based, and train it using the X_train and Y_train data. Then test the model using the X_test data. To check the accuracy of the model, we use accuracy_score, classification_report, and confusion_matrix with Y_test and pred.
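A compact sketch of training and evaluating the model. The handful of sample reviews is an assumption so the snippet runs on its own; on the real data X_train and X_test come from the split above, and the scores will differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix)
from sklearn.naive_bayes import MultinomialNB

# Tiny stand-in data: 1 = positive, 0 = negative
train_texts = ["great wonderful movie", "loved it great acting",
               "terrible boring movie", "awful waste of time"]
Y_train = [1, 1, 0, 0]
test_texts = ["wonderful acting", "boring waste"]
Y_test = [1, 0]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

model = MultinomialNB()
model.fit(X_train, Y_train)
pred = model.predict(X_test)

print(accuracy_score(Y_test, pred))
print(confusion_matrix(Y_test, pred))
print(classification_report(Y_test, pred))
```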

Show a table comparing the Actual and Predicted data.
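Such a comparison table could be built with a small DataFrame. The values of Y_test and pred here are made-up stand-ins, not results from the real model.

```python
import pandas as pd

# Made-up stand-ins for the true labels and the model's predictions
Y_test = [1, 0, 1, 0]
pred = [1, 0, 0, 0]

comparison = pd.DataFrame({"Actual": Y_test, "Predicted": pred})
print(comparison)

# Rows where the model disagreed with the true label
mismatches = comparison[comparison["Actual"] != comparison["Predicted"]]
print(mismatches)
```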

Save the naive Bayes model and the TfidfVectorizer object for further use (by mistake, I saved the TfidfVectorizer object with the name count-Vectorizer.pkl, so ignore the name). Then load the model and the TfidfVectorizer to test on custom data.
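Saving and reloading with pickle might look like the sketch below. The file name count-Vectorizer.pkl comes from the text above, while model.pkl, the temporary directory, and the tiny training data are assumptions for illustration.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny stand-in model so the snippet runs on its own
texts = ["great wonderful movie", "terrible boring movie"]
vectorizer = TfidfVectorizer()
model = MultinomialNB().fit(vectorizer.fit_transform(texts), [1, 0])

out_dir = tempfile.mkdtemp()  # temporary folder, used for illustration only
model_path = os.path.join(out_dir, "model.pkl")  # assumed file name
vec_path = os.path.join(out_dir, "count-Vectorizer.pkl")

# Save both objects
with open(model_path, "wb") as f:
    pickle.dump(model, f)
with open(vec_path, "wb") as f:
    pickle.dump(vectorizer, f)

# Load them back for later use
with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)
with open(vec_path, "rb") as f:
    loaded_vectorizer = pickle.load(f)

print(loaded_model.predict(loaded_vectorizer.transform(["great wonderful"])))
```

The vectorizer has to be saved together with the model, because new reviews must be encoded with the same vocabulary the model was trained on.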

Define a test function to test the model on custom data, and test the model on two custom reviews. I gave one positive review to the model and it predicted correctly, then I gave one negative review and again it predicted correctly.
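A test function along these lines could look as follows. The name predict_review and the tiny training set are hypothetical; in the real project the model and vectorizer would be the ones loaded from the pickle files, and the review would go through the same cleaning as the training data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny stand-in model: 1 = positive, 0 = negative
texts = ["great wonderful movie", "loved it great acting",
         "terrible boring movie", "awful waste of time"]
vectorizer = TfidfVectorizer()
model = MultinomialNB().fit(vectorizer.fit_transform(texts), [1, 1, 0, 0])


def predict_review(review, model, vectorizer):
    """Encode one review with the fitted vectorizer and return a label."""
    label = model.predict(vectorizer.transform([review]))[0]
    return "Positive Review" if label == 1 else "Negative Review"


positive_result = predict_review("a great and wonderful film", model, vectorizer)
negative_result = predict_review("boring and awful", model, vectorizer)
print(positive_result)
print(negative_result)
```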

I have also deployed this model using the Python framework Flask, which provides tools for web development.
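A minimal sketch of what such a Flask app might look like; it is not the original app.py. The route names, the form field name review, and the inline stand-in model are all assumptions. The real app would load the pickled model and vectorizer instead of training here.

```python
from flask import Flask, request
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

app = Flask(__name__)

# Stand-in model (assumption); the real app would unpickle the saved objects
texts = ["great wonderful movie", "loved it great acting",
         "terrible boring movie", "awful waste of time"]
vectorizer = TfidfVectorizer()
model = MultinomialNB().fit(vectorizer.fit_transform(texts), [1, 1, 0, 0])


@app.route("/")
def home():
    # Very small form with a text box and a predict button
    return ('<form action="/predict" method="post">'
            '<textarea name="review"></textarea>'
            '<button type="submit">Predict</button></form>')


@app.route("/predict", methods=["POST"])
def predict():
    review = request.form.get("review", "")
    label = model.predict(vectorizer.transform([review]))[0]
    result = "Positive Review" if label == 1 else "Negative Review"
    return f"Prediction : {result}"
```

Calling app.run() at the end of the file, under an if __name__ == "__main__": guard, starts Flask's development server, which by default listens on http://127.0.0.1:5000/.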

This is the simple GUI of the application. Now let's test the application.

I gave a positive review to the application and clicked the predict button, and it predicted a positive review. It means the application is working correctly.

Prediction : Positive Review

Then I gave a negative review to the application and clicked the predict button, and it predicted a negative review. It means the application is working correctly.

Prediction : Negative Review

I also deployed this model on Heroku. You can access it with the help of the links below:
How to deploy Movies Review Sentiment Analysis on localhost : Deploy on localhost
How to deploy Movies Review Sentiment Analysis on Heroku : Deploy on heroku platform
Heroku deployment : Heroku deployment

Source code and how to use:

1. Download the dataset : Movies Review Dataset

2. Go to my GitHub and download the code : Movies Review Sentiment Analysis

3. After downloading, extract the folder.

4. Go into the folder.

5. Open the command prompt and go to the project folder with the cd command.

6. Run (python app.py) in your prompt.

7. You get a link like (http://127.0.0.1:5000/).

8. Paste it into Chrome or any other browser.

9. Now you can use the model.

Thank You !!!!!!!
