yohasebe/lemmatizer

Lemmatizer for text in English. Inspired by Python's nltk.corpus.reader.wordnet.morphy

/ 100

Emerging

This tool helps you standardize English words by reducing them to their base or dictionary form, like changing "dogs" to "dog" or "hired" to "hire." You input a list of words, and it outputs their root forms, which is useful for analyzing text data. Anyone working with large volumes of English text, such as researchers, data analysts, or linguists, will find this beneficial.
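The reduction described above can be illustrated with a toy sketch in the spirit of morphy-style lemmatization: check an exception list for irregular forms, then try suffix rules and accept a candidate only if it appears in a dictionary. This is not the gem's actual code or API; the names `LEXICON`, `EXCEPTIONS`, `SUFFIX_RULES`, `toy_lemma`, and `toy_lemmas` are hypothetical, and the word lists are tiny samples chosen for illustration.

```ruby
# Toy morphy-style lemmatizer: illustrative only, NOT the gem's implementation.
# LEXICON, EXCEPTIONS and SUFFIX_RULES are small hypothetical samples.
LEXICON = %w[dog hire church study].freeze

# Irregular forms that suffix rules cannot recover.
EXCEPTIONS = { "geese" => "goose", "went" => "go" }.freeze

# [suffix, replacement] pairs, tried in order.
SUFFIX_RULES = [
  ["ies", "y"],
  ["es", "e"],
  ["es", ""],
  ["ed", "e"],
  ["ed", ""],
  ["s", ""]
].freeze

# Returns the base form of a single word, or the (lowercased) word itself
# when no rule produces a candidate found in LEXICON.
def toy_lemma(word)
  form = word.downcase
  return EXCEPTIONS[form] if EXCEPTIONS.key?(form)
  return form if LEXICON.include?(form)

  SUFFIX_RULES.each do |suffix, replacement|
    next unless form.end_with?(suffix)

    candidate = form.delete_suffix(suffix) + replacement
    return candidate if LEXICON.include?(candidate)
  end
  form
end

# Maps a list of words to their base forms.
def toy_lemmas(words)
  words.map { |w| toy_lemma(w) }
end

p toy_lemmas(%w[dogs hired churches geese])
```

The dictionary check is what separates this approach from plain suffix stripping: "hired" becomes "hire" rather than "hir" because only "hire" is a known word in the sample list.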

112 stars. No commits in the last 6 months.

Use this if you need to prepare text data for analysis by normalizing words to their lemmas, ensuring different inflections of a word are treated as the same concept.

Not ideal if your text analysis requires preserving all inflected forms or if you need highly nuanced morphological analysis beyond simple lemmatization.
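To show why treating inflections as the same concept matters for analysis, here is a minimal sketch of frequency counting keyed by lemma. The lookup table `LEMMA_OF` and the helper `lemma_counts` are hypothetical stand-ins for a real lemmatizer's output, not part of this project.

```ruby
# Hypothetical lookup table standing in for a real lemmatizer's output.
LEMMA_OF = {
  "dogs"   => "dog",
  "hired"  => "hire",
  "hires"  => "hire",
  "hiring" => "hire"
}.freeze

# Counts tokens by lemma so that inflected forms share one bucket.
# Tokens missing from LEMMA_OF are counted under their lowercased form.
def lemma_counts(tokens)
  tokens.each_with_object(Hash.new(0)) do |token, counts|
    key = token.downcase
    counts[LEMMA_OF.fetch(key, key)] += 1
  end
end

p lemma_counts(%w[Dogs dog hired hires hiring])
```

Without normalization the same input would yield five separate entries, each with a count of one. The flip side is the caveat noted above: once forms are merged, distinctions such as tense or number are no longer recoverable from the counts.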

text-analysis natural-language-processing linguistics information-retrieval data-preparation

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 9 / 25

Maturity 16 / 25

Community 15 / 25

Stars 112

Forks

Language Ruby

License MIT

Higher-rated alternatives

hplt-project/sacremoses

Python port of Moses tokenizer, truecaser and normalizer

Blake-Madden/OleanderStemmingLibrary

Porter stemming library (C++)

adbar/simplemma

Simple multilingual lemmatizer for Python, especially useful for speed and efficiency

htaghizadeh/PersianStemmer-Python

PersianStemmer-Python

michmech/lemmatization-lists

Machine-readable lists of lemma-token pairs in 23 languages.
