Stemming (Rule-based approach)
- Stemming reduces a word to its stem form by removing suffixes such as “ing”, “ly”, and “s”. The result is often not an actual word, e.g. Entitling, Entitled -> Entitl
- Stemming is faster because it simply chops off the end of the word without considering the word's context.
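To make the "chop the end off" idea concrete, here is a minimal sketch of a suffix stripper. The suffix list, the `toy_stem` name, and the minimum stem length are assumptions made for illustration; real stemmers apply far more rules and conditions than this.

```python
# Toy suffix-stripping stemmer (illustrative only, not a real algorithm).
# The rule list below is an assumption; suffixes are tried in this order.
SUFFIXES = ("ing", "ed", "ly", "s")


def toy_stem(word: str, min_stem_len: int = 3) -> str:
    """Lowercase the word and chop off the first matching suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        stem = word[: len(word) - len(suffix)]
        # Only strip when enough of the word is left over.
        if word.endswith(suffix) and len(stem) >= min_stem_len:
            return stem
    return word


for w in ["Entitling", "Entitled", "quickly", "cats"]:
    print(w, "->", toy_stem(w))
```

Both Entitling and Entitled come out as entitl, which is not a dictionary word. No lookup or context is involved, which is why this kind of approach is fast.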
Lemmatizing (Dictionary-based approach)
- Lemmatizing derives the canonical form (‘lemma’) of a word by using morphological analysis to find its root form, e.g. Entitling, Entitled -> Entitle
- Lemmatizing is slower but more accurate because it takes the context of the word into account.
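The dictionary-based idea can be sketched in the same spirit. The lookup table below is a tiny hand-written stand-in for a real lexicon, and `toy_lemmatize` and the "meeting" entries are hypothetical additions for illustration. Here the part-of-speech tag plays the role of context.

```python
# Toy dictionary-based lemmatizer (illustrative only).
# LEMMA_TABLE is a hypothetical, hand-written stand-in for a real lexicon.
LEMMA_TABLE = {
    ("entitling", "VERB"): "entitle",
    ("entitled", "VERB"): "entitle",
    ("meeting", "VERB"): "meet",
    ("meeting", "NOUN"): "meeting",
}


def toy_lemmatize(word: str, pos: str) -> str:
    """Return the dictionary form for (word, part of speech).

    Falls back to the lowercased word when the pair is not in the table.
    """
    key = (word.lower(), pos.upper())
    return LEMMA_TABLE.get(key, word.lower())


print(toy_lemmatize("Entitling", "verb"))
print(toy_lemmatize("meeting", "verb"))
print(toy_lemmatize("meeting", "noun"))
```

The same surface form ("meeting") maps to different lemmas depending on its part of speech. A suffix stripper cannot make that distinction, and the extra lookup is part of why lemmatizing is slower.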
What is Stemming?
Stemming is the process of reducing each word in a sentence to its non-changing portion. For amusing, amusement, and amused, the stem would be amus.
Types of Stemmers
You’re probably wondering how to convert a series of words to their stems. Luckily, NLTK has a few established built-in stemmers available for you to use! They work slightly differently because they follow different rules, so which one you use depends on what you are working on.
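As a quick way to compare them, the sketch below runs three of the stemmers in `nltk.stem` (Porter, Snowball, and Lancaster) over the same words. It assumes NLTK is installed; the `stem_all` helper and the word list are illustrative choices, not part of NLTK.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

# Three stemmers shipped with NLTK; each follows its own rule set.
stemmers = {
    "porter": PorterStemmer(),
    "snowball": SnowballStemmer("english"),
    "lancaster": LancasterStemmer(),
}

words = ["amusing", "amusement", "amused"]


def stem_all(stemmer, tokens):
    """Apply one stemmer to every token (hypothetical helper)."""
    return [stemmer.stem(token) for token in tokens]


results = {name: stem_all(s, words) for name, s in stemmers.items()}
for name, stems in results.items():
    print(f"{name:10s} {stems}")
```

Printing the results side by side shows where the rule sets agree and where they diverge, which can help when deciding which stemmer suits a given task.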