In this tutorial, we will train an LSTM-based deep learning model to detect fake news in a given news corpus. A media company could use this approach to predict automatically whether a circulating news item is fake, without having humans manually review thousands of articles.
I have implemented a fake news classifier. It is a Natural Language Processing project that classifies a news item as fake or genuine, and my model reached 93% accuracy on the test data.
Let's start:
First of all, import all the libraries we need to implement our fake news classifier.
Sometimes code produces warnings, and the warnings differ depending on the code. They look like this:
To ignore the warnings, we write the following code:
Now it's time to load our fake news data and show the first 5 news items. Our data has 5 features, which you can see in the screenshot below: id, title, author, text and label. Since I am building a classifier on headlines, the title is our news input and label is our target feature.
1 stands for Real news or genuine news
0 stands for fake news
Now it's time for EDA (Exploratory Data Analysis). EDA is a way of visualizing, summarizing and interpreting the information hidden in the rows and columns of a dataset before we build a Machine Learning model on it.
Now it's time to separate the independent and dependent features. Our independent features are id, title, author and text, but we use only title as the independent feature, and our dependent feature is label. We also reset the index, because after removing the rows with null values the index is no longer sequential (0, 1, 3, 4, 6, etc.). Resetting it makes the index sequential again (0, 1, 2, 3, 4, 5, etc.).
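The steps in this paragraph can be sketched with pandas. The tiny DataFrame below is made-up stand-in data with the same five columns as the real file, and the variable names (`df`, `X`, `y`, `messages`) are my own assumptions.

```python
import pandas as pd

# Made-up stand-in rows; the real data is loaded from the training CSV.
df = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "title": ["Headline one", None, "Headline three", "Headline four"],
    "author": ["A", "B", None, "D"],
    "text": ["body", "body", "body", "body"],
    "label": [1, 0, 0, 1],
})

# Drop every row that contains a null value (rows 1 and 2 here).
df = df.dropna()

# Independent features (everything except label) and the dependent feature.
X = df.drop("label", axis=1)
y = df["label"]

# After dropna the index is 0, 3; reset it so it runs 0, 1 again.
messages = X.reset_index(drop=True)
y = y.reset_index(drop=True)

print(messages["title"])
```

Passing `drop=True` discards the old index instead of keeping it as an extra column.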
Now we remove the stopwords (is, am, this, etc.) from our news and apply stemming using the PorterStemmer. Stemming is the process of reducing the morphological variants of a word to a common root or base form. Stemming programs are commonly referred to as stemming algorithms or stemmers. For example, likes, likely, liking and liked are all converted into the root word like.
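Here is a minimal sketch of this cleaning step using NLTK's `PorterStemmer`. To keep it self-contained, it uses a small hand-written stopword set instead of the full NLTK stopword list, and the helper name `clean_title` is hypothetical.

```python
import re
from nltk.stem.porter import PorterStemmer

# Small illustrative stopword set (an assumption); the full project would
# more likely use NLTK's English stopword corpus.
STOPWORDS = {"is", "am", "are", "this", "that", "the", "a", "an", "and", "of", "to", "in"}

stemmer = PorterStemmer()

def clean_title(title):
    """Keep letters only, lowercase, drop stopwords, and stem each word."""
    words = re.sub("[^a-zA-Z]", " ", title).lower().split()
    return " ".join(stemmer.stem(word) for word in words if word not in STOPWORDS)

print(clean_title("Likes, liking and liked"))  # -> like like like
```

Punctuation and digits are replaced by spaces before splitting, so only alphabetic tokens reach the stemmer.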
Now apply a one-hot representation, which converts each word of an English sentence into an integer index. For example, suppose we have a vocabulary of 10,000 words and the word "go" maps to index 5033. Every occurrence of "go" is then replaced by 5033, and the same is done for all the other words in the sentences. If "go" appears in two different sentences, it gets the same index in both.
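To show the idea without any deep learning library, here is a pure-Python sketch. Keras' `one_hot` helper assigns indexes by hashing each word into the range 1 to vocabulary size minus 1, and this stand-in does the same with an MD5 hash. The names `VOCAB_SIZE`, `word_index` and `encode` are assumptions, and the actual index values will differ from Keras.

```python
import hashlib

VOCAB_SIZE = 10_000  # assumed vocabulary size, matching the example above

def word_index(word, vocab_size=VOCAB_SIZE):
    """Map a word to a stable index in the range [1, vocab_size - 1]."""
    digest = hashlib.md5(word.encode("utf-8")).hexdigest()
    return int(digest, 16) % (vocab_size - 1) + 1

def encode(sentence, vocab_size=VOCAB_SIZE):
    """Replace every word of a sentence by its index."""
    return [word_index(word, vocab_size) for word in sentence.lower().split()]

first = encode("go home now")
second = encode("let us go")
print(first, second)
```

Index 0 is never produced, which leaves it free to be used as the padding value. Because the mapping is a hash, two different words can occasionally share an index.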
Now apply pad_sequences, which makes all sentences the same length. For example, suppose we have 10 news items: some are 5 words long, some are 10 words long, some are 15 words long, and so on. pad_sequences adds zeroes at the start of the shorter sentences so that all sentences have equal length, and returns a 2D NumPy array, which you can see in the screenshot below.
Now convert the padded sequences into a NumPy array that can be passed to the model.
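The effect of pre-padding can be sketched in a few lines of NumPy. This is only a stand-in for Keras' `pad_sequences` with its default "pre" padding and truncation, and the function name `pad_pre` is hypothetical.

```python
import numpy as np

def pad_pre(sequences, maxlen):
    """Left-pad with zeros (or keep the last maxlen items) so every row has maxlen entries."""
    padded = np.zeros((len(sequences), maxlen), dtype="int32")
    for row, seq in enumerate(sequences):
        if not seq:
            continue  # an empty sequence stays all zeros
        trimmed = seq[-maxlen:]  # keep the last maxlen indexes if too long
        padded[row, -len(trimmed):] = trimmed
    return padded

padded = pad_pre([[5, 2], [7, 8, 9, 4], [1, 2, 3, 4, 5]], maxlen=4)
print(padded)
# [[0 0 5 2]
#  [7 8 9 4]
#  [2 3 4 5]]
```

Padding at the start keeps the real words next to the end of the sequence, which is the part the LSTM reads last.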
Now split the data into training and testing sets, and check the shape of each.
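As an illustration of what the split does, here is a NumPy-only sketch. In practice scikit-learn's `train_test_split` is the usual tool. The function name, the 30% test size and the seed below are assumptions, not values from the original project.

```python
import numpy as np

def split_train_test(X, y, test_size=0.3, seed=42):
    """Shuffle the row order once, then cut it into train and test parts."""
    X, y = np.asarray(X), np.asarray(y)
    order = np.random.default_rng(seed).permutation(len(X))
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Dummy data: 10 padded "sentences" of length 4 with alternating labels.
X = np.arange(40).reshape(10, 4)
y = np.array([0, 1] * 5)

X_train, X_test, y_train, y_test = split_train_test(X, y)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

The same shuffled order is applied to both `X` and `y`, so every sentence stays paired with its own label.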
Now define our own neural network architecture and train the model for 30 epochs.
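The exact architecture is not spelled out here, so the following is only one plausible sketch of an LSTM classifier in Keras. The layer sizes, `VOCAB_SIZE`, `EMBEDDING_DIM` and `SENT_LENGTH` are all assumptions.

```python
import numpy as np
from tensorflow.keras.layers import LSTM, Dense, Embedding
from tensorflow.keras.models import Sequential

VOCAB_SIZE = 10_000   # assumed size of the word-index range
EMBEDDING_DIM = 40    # assumed embedding vector length
SENT_LENGTH = 20      # assumed padded title length

model = Sequential([
    Embedding(VOCAB_SIZE, EMBEDDING_DIM),  # word index -> dense vector
    LSTM(100),                             # reads the title word by word
    Dense(1, activation="sigmoid"),        # probability that the label is 1
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# Sanity check on a dummy batch of two all-padding titles.
dummy_batch = np.zeros((2, SENT_LENGTH), dtype="int32")
probs = model.predict(dummy_batch, verbose=0)
print(probs.shape)
```

Training would then be a call such as `model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=30)`, with the training and test arrays coming from the split step.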
Now test the model on the test data and evaluate it. Evaluation is how we check the quality of the model. We can check the accuracy score, which tells us the fraction of correct predictions; the confusion matrix, which tells us how many values were predicted wrongly for each class; and the classification report.
Now save the model, load it back, and load the test.csv file to test our model.
Now follow the same steps we performed above: remove stopwords, apply stemming, apply the one-hot representation to convert the words into indexes (numerical values), apply pad_sequences to make all sentences the same length as a 2D NumPy array, convert the result into an array, and run our trained model on the testing data.
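To make these metrics concrete, here is a small NumPy sketch that turns predicted probabilities into labels and computes the accuracy and confusion matrix by hand. scikit-learn's `accuracy_score`, `confusion_matrix` and `classification_report` are the usual tools. The 0.5 threshold and the sample numbers are assumptions for illustration.

```python
import numpy as np

def evaluate(y_true, y_prob, threshold=0.5):
    """Return (accuracy, 2x2 confusion matrix) for binary labels."""
    y_true = np.asarray(y_true).astype(int).ravel()
    y_pred = (np.asarray(y_prob).ravel() > threshold).astype(int)
    matrix = np.zeros((2, 2), dtype=int)  # rows: true label, columns: predicted
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[true_label, pred_label] += 1
    accuracy = float(np.trace(matrix)) / len(y_true)
    return accuracy, matrix

# Made-up labels and sigmoid outputs, not results from the real model.
accuracy, matrix = evaluate([1, 0, 1, 1, 0], [0.9, 0.2, 0.4, 0.8, 0.7])
print(accuracy)  # 0.6
print(matrix)
# [[1 1]
#  [1 2]]
```

The diagonal of the matrix counts correct predictions, and the off-diagonal cells count the two kinds of mistake.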
Now convert 0 into Fake and 1 into Real, and show each news item together with the result that says whether it is fake or real.
Now it's time to test the model on custom data.
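The label conversion is a simple lookup. In this sketch the helper name `label_predictions` and the sample headlines are hypothetical, and the predictions are assumed to be already thresholded to 0 or 1.

```python
LABEL_NAMES = {0: "Fake", 1: "Real"}

def label_predictions(titles, predictions):
    """Pair every title with the readable name of its predicted class."""
    return [(title, LABEL_NAMES[int(pred)]) for title, pred in zip(titles, predictions)]

# Made-up headlines and predictions for illustration only.
results = label_predictions(
    ["Sample headline A", "Sample headline B"],
    [0, 1],
)
for title, verdict in results:
    print(f"{verdict}: {title}")
```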
So this is my complete Fake News Classifier project.