Movies Review Sentiment Analysis
In this tutorial, we are going to implement a Natural Language Processing project: Movie Review Sentiment Analysis, which classifies a given movie review as positive or negative. We train our model on a movie reviews dataset and then use that model to build an end-to-end application. End-to-end here means the model is deployed with the Python framework Flask on localhost, and we also try to deploy it on Heroku. We use the naive Bayes algorithm to train our model, which is a probability-based algorithm.
We are going to train our model in the following steps:
- Import required libraries
- Load the data
- Perform EDA
- Remove stop words and apply stemming
- Convert the text data into vectors
- Split data into training and testing
- Define our model
- Test the model on testing data
- Check the accuracy
- Test on the custom data
About Dataset
The movie reviews dataset contains a total of 50,000 movie reviews with labels for Natural Language Processing. In this data, 25,000 reviews are positive and 25,000 reviews are negative. We have to predict whether a given review is positive or negative.
Let's start:
Import all the libraries required to train our model, such as Pandas, Sklearn, NLTK, and pickle.
Load the movie data, taken from Kaggle, using Pandas and show the top 5 rows.
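As a rough sketch of the loading step described above, the snippet below reads a CSV with Pandas and shows the first rows. The column names `review` and `sentiment` are assumptions, and a tiny in-memory CSV stands in for the Kaggle file so the example runs on its own.

```python
import io

import pandas as pd

# Stand-in for the Kaggle CSV so the sketch runs without the real file.
# The column names "review" and "sentiment" are assumptions.
sample_csv = io.StringIO(
    "review,sentiment\n"
    "A wonderful little film,positive\n"
    "Boring and far too long,negative\n"
    "Great acting and a great story,positive\n"
    "I hated every minute of it,negative\n"
    "One of the best movies this year,positive\n"
    "The plot makes no sense,negative\n"
)

# With the real data, pass the path of the downloaded CSV file instead.
df = pd.read_csv(sample_csv)

# Show the top 5 rows.
print(df.head())
```

With the real dataset, only the argument to `pd.read_csv` changes.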
Now perform the exploratory data analysis. In EDA, first check the shape of the dataset; the output is 50000 rows and 2 columns. Then check whether there are any null values. Describe the dataset, check the information of the dataset, and check the unique values of the sentiment (labels), which are positive and negative. Also visualize the counts of positive and negative sentiment. Now apply label encoding on the sentiment column to convert the categorical data into numerical data (positive becomes 1 and negative becomes 0), and show the top 5 rows of data again.
Now divide the data into independent and dependent features. X stores our independent feature (the review text) and y stores our dependent feature (the sentiment label).
Now remove all special and numeric characters from the data, remove the stopwords (is, am, this, that, etc.), apply stemming, append each cleaned review to the Corpus list, and print the Corpus list.
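The EDA checks and the label encoding described above might look roughly like this. It is a minimal sketch on a four-row stand-in DataFrame; the column names `review` and `sentiment` are assumptions.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Small stand-in for the real 50,000-row dataset (column names are assumed).
df = pd.DataFrame({
    "review": [
        "A wonderful little film",
        "Boring and far too long",
        "Great acting and a great story",
        "I hated every minute of it",
    ],
    "sentiment": ["positive", "negative", "positive", "negative"],
})

print(df.shape)                         # (rows, columns)
print(df.isnull().sum())                # null values per column
print(df.describe())                    # summary of the columns
print(df["sentiment"].unique())         # the distinct labels
print(df["sentiment"].value_counts())   # how many reviews per label

# LabelEncoder sorts the classes alphabetically,
# so "negative" becomes 0 and "positive" becomes 1.
encoder = LabelEncoder()
df["sentiment"] = encoder.fit_transform(df["sentiment"])
print(df.head())
```

The counts from `value_counts()` are also what a bar chart of the two classes would show.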
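A minimal sketch of this cleaning step is shown below. It uses NLTK's `PorterStemmer`, which is one possible stemmer and not necessarily the one used in the original code. The helper name `clean_review` and the short hand-written stop word set are assumptions made so the example runs without downloading anything; the full pipeline would typically use NLTK's English stop word list.

```python
import re

from nltk.stem import PorterStemmer

# Tiny hand-written stop word set (an assumption for this sketch).
# NLTK's stopwords corpus would be the usual choice but needs a download.
STOP_WORDS = {"is", "am", "this", "that", "a", "an", "the", "of", "was", "it", "i"}

stemmer = PorterStemmer()


def clean_review(text):
    """Keep letters only, lowercase, drop stop words, and stem each word."""
    text = re.sub(r"[^a-zA-Z]", " ", text)  # remove special and numeric characters
    words = text.lower().split()
    words = [stemmer.stem(word) for word in words if word not in STOP_WORDS]
    return " ".join(words)


reviews = [
    "This movie was AMAZING!!! I loved it 100%.",
    "That is the worst film of 2020...",
]

# Append every cleaned review to the corpus list.
corpus = []
for review in reviews:
    corpus.append(clean_review(review))

print(corpus)
```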
Next, convert the text data into vectors with TF-IDF. TF-IDF scores are word frequency scores that try to highlight words that are more interesting, e.g. frequent in a document but not across documents. The TfidfVectorizer tokenizes documents, learns the vocabulary and inverse document frequency weightings, and lets you encode new documents.
Now split the data into training and testing sets, with 80% of the data for training and 20% for testing. X_train stores the independent feature for training and Y_train stores the dependent feature for training. X_test stores the independent feature for testing and Y_test stores the dependent feature for testing. Then show the shapes of X_train, X_test, Y_train, and Y_test.
Define the naive Bayes model, train it on the training data, predict on the testing data, and check the accuracy. Then show a table comparing the actual and predicted labels.
Save the naive Bayes model and the TfidfVectorizer object for further use (by mistake, I saved the TfidfVectorizer object under the name count-Vectorizer.pkl, so ignore the name). Then load the model and the TfidfVectorizer to test on custom data.
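To make the vectorizing step concrete, here is a small sketch with scikit-learn's `TfidfVectorizer` on a three-document toy corpus. The corpus is made up for illustration and stands in for the cleaned reviews.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus standing in for the cleaned, stemmed reviews.
corpus = [
    "movi amaz love",
    "worst film",
    "love film amaz act",
]

vectorizer = TfidfVectorizer()

# Learn the vocabulary and IDF weights, then encode the corpus.
X = vectorizer.fit_transform(corpus)

print(sorted(vectorizer.vocabulary_))  # the learned vocabulary
print(X.shape)                         # (documents, vocabulary size)

# A new document is encoded with the already learned vocabulary.
new_vector = vectorizer.transform(["love movi"])
print(new_vector.toarray())
```

Words in a new document that were not seen during fitting are simply ignored by `transform`.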
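The 80/20 split can be sketched with scikit-learn's `train_test_split`. The arrays here are placeholder data, and `random_state=0` is an assumption added only to make the split repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features and labels: 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# test_size=0.2 keeps 80% of the rows for training and 20% for testing.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
```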
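The steps of defining the model, testing it, checking the accuracy, and building the actual-versus-predicted table might be sketched as follows. The tutorial only says naive Bayes, so `MultinomialNB` is an assumption (it is a common choice for TF-IDF features), and the ten toy reviews are made up for illustration.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Made-up toy data: 1 = positive, 0 = negative.
texts = [
    "great wonderful film", "loved this great movie", "amazing story and acting",
    "best movie ever", "wonderful and amazing", "terrible boring film",
    "hated this awful movie", "worst story and acting", "boring and awful",
    "terrible waste of time",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

X = TfidfVectorizer().fit_transform(texts)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, labels, test_size=0.2, random_state=0, stratify=labels
)

# Define and train the model (MultinomialNB is an assumed variant).
model = MultinomialNB()
model.fit(X_train, Y_train)

# Test on the held-out data and check the accuracy.
Y_pred = model.predict(X_test)
accuracy = accuracy_score(Y_test, Y_pred)
print("Accuracy:", accuracy)

# Table of actual vs. predicted labels.
comparison = pd.DataFrame({"Actual": Y_test, "Predicted": Y_pred})
print(comparison)
```

On data this small the accuracy figure is not meaningful; the point is only the order of the calls.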
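Saving and reloading both objects with `pickle` could look like the sketch below. The file names `model.pkl` and `vectorizer.pkl`, the temporary folder, and the toy training data are all assumptions made for illustration; they are not the names used in the original project.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy model standing in for the one trained on the full dataset.
texts = ["great wonderful film", "loved this great movie",
         "terrible boring film", "hated this awful movie"]
labels = [1, 1, 0, 0]
vectorizer = TfidfVectorizer()
model = MultinomialNB().fit(vectorizer.fit_transform(texts), labels)

# Hypothetical file names inside a temporary folder.
out_dir = tempfile.mkdtemp()
model_path = os.path.join(out_dir, "model.pkl")
vectorizer_path = os.path.join(out_dir, "vectorizer.pkl")

# Save both objects: the vectorizer is needed to encode new reviews.
with open(model_path, "wb") as f:
    pickle.dump(model, f)
with open(vectorizer_path, "wb") as f:
    pickle.dump(vectorizer, f)

# Load them back and test on a custom review.
with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)
with open(vectorizer_path, "rb") as f:
    loaded_vectorizer = pickle.load(f)

custom_review = ["what a great movie"]
prediction = loaded_model.predict(loaded_vectorizer.transform(custom_review))[0]
print("Positive Review" if prediction == 1 else "Negative Review")
```

In the full pipeline, a custom review would go through the same cleaning and stemming as the training data before being vectorized.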
I have also deployed this model using the Python framework Flask, which provides some tools for web development.
This is the simple GUI of the application. Now let's test the application.
I gave my application a positive review and clicked the Predict button, and it predicted a positive review. This means my application is working correctly.
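A minimal sketch of such a Flask app is shown below. It is not the author's app.py: the route names, the form field `review`, and the inline HTML template are assumptions, and a toy model trained in place stands in for the pickled model and vectorizer. Flask's test client is used here so the example runs without starting a server.

```python
from flask import Flask, render_template_string, request
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Stand-in for the objects that would be loaded from the pickle files.
texts = ["great wonderful film", "loved this great movie",
         "terrible boring film", "hated this awful movie"]
labels = [1, 1, 0, 0]
vectorizer = TfidfVectorizer()
model = MultinomialNB().fit(vectorizer.fit_transform(texts), labels)

# Hypothetical single-page GUI: a text box and a predict button.
PAGE = """
<form action="/predict" method="post">
  <textarea name="review"></textarea>
  <button type="submit">Predict</button>
</form>
{% if prediction %}<p>Prediction : {{ prediction }}</p>{% endif %}
"""

app = Flask(__name__)


@app.route("/")
def home():
    return render_template_string(PAGE)


@app.route("/predict", methods=["POST"])
def predict():
    review = request.form.get("review", "")
    label = model.predict(vectorizer.transform([review]))[0]
    prediction = "Positive Review" if label == 1 else "Negative Review"
    return render_template_string(PAGE, prediction=prediction)


# Exercise the route without starting a server.
client = app.test_client()
response = client.post("/predict", data={"review": "great wonderful film"})
print(response.get_data(as_text=True))
```

Calling `app.run()` at the end of the script would instead serve the page locally, by default at http://127.0.0.1:5000/.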
Prediction : Positive Review
Next, I gave my application a negative review and clicked the Predict button, and it predicted a negative review. This means my application is working correctly.
Prediction : Negative Review
How to deploy Movies Review Sentiment Analysis on localhost : Deploy on localhost
How to deploy Movies Review Sentiment Analysis on Heroku : Deploy on heroku platform
Heroku deployment : Heroku deployment
Source code and how to use:
1. Download the dataset : Movies Review Dataset
2. Go to my GitHub and download the code : Movies Review Sentiment Analysis
3. After downloading, extract the folder.
4. Go into the folder.
5. Open the command prompt and go to the project folder with the cd command.
6. Run the app from your prompt (python app.py).
7. You get a link like (http://127.0.0.1:5000/).
8. Paste it into Chrome or any other browser.
9. Now you can use the model.














