Stock Sentiment Analysis using News Headlines
In this tutorial, we train a stock sentiment analysis model using news headlines. The model predicts from a day's news headlines whether the stock price increased or decreased. This is a Natural Language Processing (NLP) project, and the data set stores a collection of news headlines. Sentiment analysis is the process of classifying whether a sentence or headline is positive, negative or neutral. A sentiment analysis system combines NLP and machine learning (ML) techniques to assign sentiment scores to the categories within a sentence. We train the NLP model with the following steps :-
- Import all required libraries
- Load data
- Split data into training and testing
- Apply a regular expression to remove all characters and symbols except English letters
- Apply CountVectorizer
- Define model
- Prepare test set
- Apply predictions
- Check the overall accuracy of model
About the problem and the dataset used.
The data set under consideration combines world news headlines with stock price movements.
The data ranges from 2008 to 2016, and the data from 2000 to 2008 was scraped from Yahoo Finance.
There are 25 columns of top news headlines for each day in the data frame.
Class 0: the stock price stayed the same or decreased.
Class 1: the stock price increased.
Let’s start :-
First, import all the required libraries, such as pandas, NLTK, re (regular expressions), scikit-learn and NumPy.
Now check the minimum and maximum dates using the min() and max() functions. These values show the period the data covers and help choose the date at which to split it.
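As a minimal sketch of this check, the snippet below builds a tiny stand-in DataFrame instead of loading the real file. The column names "Date", "Label" and "Top1" and all the rows are assumptions made up for illustration, so adjust them to match the actual data set.

```python
import pandas as pd

# Toy stand-in for the real data set. The column names "Date", "Label"
# and "Top1" and every row below are assumptions for illustration only.
df = pd.DataFrame({
    "Date": ["2008-08-08", "2014-12-31", "2015-01-02", "2016-07-01"],
    "Label": [0, 1, 1, 0],
    "Top1": ["Markets fall", "Stocks rally", "Oil prices drop", "Banks recover"],
})

# ISO-formatted date strings (YYYY-MM-DD) sort chronologically, so min()
# and max() on the column give the earliest and latest dates.
earliest = df["Date"].min()
latest = df["Date"].max()
print(earliest, latest)
```

If the dates are stored in another format, converting the column with pd.to_datetime first is the safer option.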
Split the data into training and testing sets. The training data contains all the news headlines dated before ‘2015-01-01’, and the testing data contains all the news headlines dated after ‘2014-12-31’. Then extract the y_train and y_test labels from the training and testing sets, and check the shapes of train, test, y_train and y_test.
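The date-based split could look roughly like the sketch below. It again uses a made-up toy DataFrame, and the column names "Date" and "Label" are assumptions rather than names confirmed by the data set.

```python
import pandas as pd

# Toy stand-in data; column names and rows are assumptions for illustration.
df = pd.DataFrame({
    "Date": ["2008-08-08", "2014-12-31", "2015-01-02", "2016-07-01"],
    "Label": [0, 1, 1, 0],
    "Top1": ["Markets fall", "Stocks rally", "Oil prices drop", "Banks recover"],
})

# Everything before 2015-01-01 is used for training, the rest for testing.
# Comparing ISO date strings works because they sort chronologically.
train = df[df["Date"] < "2015-01-01"]
test = df[df["Date"] > "2014-12-31"]

# The labels are the 0/1 class values described earlier.
y_train = train["Label"]
y_test = test["Label"]

print(train.shape, test.shape, y_train.shape, y_test.shape)
```

Splitting by date rather than at random keeps the test period strictly after the training period, so the model is evaluated on headlines it could not have seen.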
Now convert all headlines to lowercase and display the data.
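The regular expression and lowercasing steps from the outline can be sketched as follows. The headline columns "Top1" and "Top2" and their text are invented for this example, and joining each day's headlines into one string is an assumption about how the text is prepared for the vectorizer.

```python
import pandas as pd

# Made-up headline columns; the real data has 25 of them per day.
headlines = pd.DataFrame({
    "Top1": ["Georgia 'downs two Russian warplanes'", "Stocks RALLY 3% on jobs data!"],
    "Top2": ["Oil prices: up again?", "Banks recover"],
})

# Replace every character that is not an English letter with a space.
cleaned = headlines.replace("[^a-zA-Z]", " ", regex=True)

# Convert every headline to lowercase.
cleaned = cleaned.apply(lambda col: col.str.lower())

# Join each day's headlines into one string; split() followed by join()
# collapses the runs of spaces left behind by the replacement.
documents = [" ".join(" ".join(row).split()) for row in cleaned.to_numpy()]

print(documents)
```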
Finally, run the model on the test data to get predictions, and display how the predicted labels differ from the actual labels for the news headlines.
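The CountVectorizer, model, test-set and prediction steps from the outline might be sketched like this. The headlines and labels are made up, and the settings (default CountVectorizer, n_estimators=100, random_state=0) are assumptions for illustration, not the values used in the original notebook.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Made-up, already cleaned headlines (one string per day) with their labels.
train_docs = [
    "markets fall as banks report losses",
    "stocks rally on strong jobs data",
    "oil prices drop amid recession fears",
    "tech shares surge after record profits",
]
y_train = [0, 1, 0, 1]
test_docs = ["banks report record profits", "recession fears hit markets"]
y_test = [1, 0]

# Learn the vocabulary on the training text only and build word counts.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_docs)

# Hyperparameters here are assumptions, not the tutorial's exact settings.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Prepare the test set with the same fitted vocabulary, then predict.
X_test = vectorizer.transform(test_docs)
predictions = model.predict(X_test)

# Show actual and predicted labels side by side.
for actual, predicted in zip(y_test, predictions):
    print(f"actual={actual} predicted={predicted}")
```

Calling transform (not fit_transform) on the test text keeps the feature columns identical to those the model was trained on.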

Now use the classification report, confusion matrix and accuracy score to evaluate the overall performance of the random forest classifier model.
- A confusion matrix is a summary of prediction results on a classification problem. The numbers of correct and incorrect predictions are summarized as counts and broken down by class.
- A classification report is used to measure the quality of predictions on a classification problem. It shows the precision, recall and f1-score on a per-class basis.
- The accuracy score is the fraction of predictions in y_pred that match the actual labels in y_actual.
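The three metrics above can be tried on a small hand-made example. The y_actual and y_pred lists below are invented values chosen to be easy to check by hand, not results from the tutorial's model.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Invented labels for illustration: 6 of the 8 predictions are correct.
y_actual = [0, 0, 1, 1, 1, 0, 1, 1]
y_pred = [0, 1, 1, 1, 0, 0, 1, 1]

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_actual, y_pred)
print(cm)

# Fraction of predictions that match the actual labels.
accuracy = accuracy_score(y_actual, y_pred)
print(accuracy)

# Per-class precision, recall and f1-score as a formatted string.
report = classification_report(y_actual, y_pred)
print(report)
```

Here the confusion matrix is [[2, 1], [1, 4]]: two class 0 days and four class 1 days are predicted correctly, with one error in each direction, giving an accuracy of 0.75.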
Source code and how to use :-
- Go to my GitHub account and download or fork the repository: Stock Sentiment Analysis
- Open Jupyter Notebook in the project path.
- Then open the project, and it is ready to use.











