Introduction
Read on for a walkthrough of an NLP model for “Language Identification”.
In this tutorial, we are going to implement a language identification model using natural language processing (NLP). The model identifies the language of a given sentence. For example, given the English sentence “what are you doing now-a-days?”, it returns the result “English”.
I trained the NLP model on 22 different languages, including English, Dutch, French, Latin, Hindi, Romanian, and Tamil. After training on a large data set covering these 22 languages, the model reached 92% accuracy. It is a useful project that can serve as a minor project for your college. You can also deploy the model on localhost or on a hosting platform such as Heroku or PythonAnywhere.
Let’s start.
The first steps are to load all the required libraries and then load the language data. The data contains a “Text” column, which holds the sentence, and the target feature “Language”, which covers 22 different languages. After that, check the shape of the data: it has 22000 rows and 2 columns. Then check the count of each language. Each language has 1000 different sentences. For example, English has 1000 different sentences.
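The loading and inspection steps above can be sketched as follows. This is a minimal sketch that uses a tiny in-memory stand-in for the data, since the real file is not shown here; with the actual data you would call something like `pd.read_csv("dataset.csv")`, where the file name is an assumption.

```python
import pandas as pd

# Tiny stand-in for the real data set (the article's data has 22000 rows).
# The sentences below are illustrative, not taken from the original data.
data = pd.DataFrame({
    "Text": [
        "what are you doing now-a-days?",
        "how are you?",
        "que fais-tu maintenant ?",
        "comment allez-vous ?",
    ],
    "Language": ["English", "English", "French", "French"],
})

# Shape is (rows, columns); the real data set would report (22000, 2).
print(data.shape)

# Number of sentences per language; the real data set has 1000 for each.
print(data["Language"].value_counts())
```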
Next, perform label encoding on the target feature “Language”. This converts the language column into integers assigned in sorted order of the language names, such as 0, 1, 2, 3, 4, 5.
Then inspect the target feature after label encoding and check its length, which is 22000. You can list all the labels with the encoder's classes_ attribute, which holds the original language names.
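A minimal sketch of the label encoding step, assuming scikit-learn's `LabelEncoder` (the `classes_` attribute mentioned above belongs to it). The short list of languages is an illustrative stand-in for the real “Language” column.

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative stand-in for the "Language" column.
languages = ["English", "French", "Hindi", "English", "Dutch"]

encoder = LabelEncoder()
y = encoder.fit_transform(languages)

print(y)                 # integer code for each row
print(len(y))            # the real data set would give 22000
print(encoder.classes_)  # original language names, in sorted order

# Codes can be mapped back to language names when needed.
print(encoder.inverse_transform([0]))
```

Note that the codes start at 0 and follow the sorted order of the class names, not the order in which the languages first appear in the data.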
Now create a new data frame with pandas that contains the preprocessed sentences and the target column in numeric form.
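The preprocessing itself is not spelled out here, so the sketch below uses a hypothetical `clean_text` helper (lowercasing, then replacing ASCII punctuation and digits with spaces) purely for illustration. It may differ from the cleaning used in the original notebook.

```python
import re
import string

import pandas as pd
from sklearn.preprocessing import LabelEncoder


def clean_text(text):
    """Hypothetical cleaning step: lowercase, drop ASCII punctuation and digits."""
    text = text.lower()
    text = re.sub(f"[{re.escape(string.punctuation)}0-9]", " ", text)
    # Collapse the runs of spaces left behind by the substitution.
    return re.sub(r"\s+", " ", text).strip()


# Illustrative stand-ins for the "Text" and "Language" columns.
texts = ["What are you doing now-a-days?", "Comment allez-vous ?"]
languages = ["English", "French"]

encoder = LabelEncoder()

# New data frame: cleaned sentences plus the numeric target.
processed = pd.DataFrame({
    "Text": [clean_text(t) for t in texts],
    "Language": encoder.fit_transform(languages),
})
print(processed)
```

Only ASCII punctuation and digits are removed, so characters from non-Latin scripts such as Hindi or Tamil are left untouched.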
Now create another data frame to compare the actual labels with the predicted output side by side.
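One way to build such a comparison is sketched below. The classifier is not named in this section, so the pipeline of character n-gram counts and multinomial Naive Bayes is an assumption chosen for illustration, and the sentences are made-up stand-ins for the real train and test splits.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder

# Illustrative train and test data, not the article's data set.
train_texts = ["what are you doing", "how are you today",
               "que fais tu maintenant", "comment allez vous"]
train_langs = ["English", "English", "French", "French"]
test_texts = ["where are you going", "où allez vous"]
test_langs = ["English", "French"]

encoder = LabelEncoder()
y_train = encoder.fit_transform(train_langs)
y_test = encoder.transform(test_langs)

# Assumed model: character 1- and 2-gram counts fed to Naive Bayes.
model = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(1, 2)),
    MultinomialNB(),
)
model.fit(train_texts, y_train)
y_pred = model.predict(test_texts)

# Map the numeric codes back to language names for readability.
results = pd.DataFrame({
    "Actual": encoder.inverse_transform(y_test),
    "Predicted": encoder.inverse_transform(y_pred),
})
print(results)
```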
Now save the model using the joblib library. Its dump function takes two arguments: the classifier and the file name for the saved model.
Then load the model using joblib's load function, which takes the path to the saved model.
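Saving and loading can be sketched as follows. The small pipeline stands in for the trained classifier, and the file name `language_model.pkl` is a hypothetical choice; a temporary directory is used so the example does not leave files behind in your project.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Stand-in classifier trained on two illustrative sentences.
texts = ["what are you doing", "que fais tu maintenant"]
labels = ["English", "French"]
classifier = make_pipeline(CountVectorizer(), MultinomialNB())
classifier.fit(texts, labels)

# dump takes the object to save and the target file name.
model_path = os.path.join(tempfile.mkdtemp(), "language_model.pkl")
joblib.dump(classifier, model_path)

# load takes the path and returns the restored object.
loaded = joblib.load(model_path)
print(loaded.predict(["what are you doing"]))
```

Saving the vectorizer and the classifier together as one pipeline means the loaded object can accept raw sentences directly.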
- Go to my GitHub account and fork the repo: Language Identification
- Use Jupyter Notebook to open the .ipynb file.














