yohasebe/lemmatizer
Lemmatizer for text in English. Inspired by Python's nltk.corpus.reader.wordnet.morphy
This tool standardizes English words by reducing them to their base (dictionary) form, for example "dogs" to "dog" or "hired" to "hire". You pass in words and get back their lemmas, so that inflected variants can be grouped when analyzing text data. It suits anyone working with large volumes of English text, such as researchers, data analysts, or linguists.
112 stars. No commits in the last 6 months.
Use this if you need to prepare text data for analysis by normalizing words to their lemmas, ensuring different inflections of a word are treated as the same concept.
Not ideal if your text analysis requires preserving all inflected forms or if you need highly nuanced morphological analysis beyond simple lemmatization.
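To make the idea of morphy-style lemmatization concrete, here is a minimal, self-contained sketch in Ruby. It is not the gem's implementation or API: the method names (`lemma`, `lemma_counts`), the suffix rules, and the tiny word lists are all hypothetical stand-ins for a real lexicon such as WordNet. It shows the general technique only: check an exception list, then strip known suffixes and accept the first candidate found in the lexicon.

```ruby
# Illustrative morphy-style lemmatizer sketch (assumption: NOT the gem's code).
# The word lists are tiny, hypothetical stand-ins for a real lexicon.
require "set"

# Suffix detachment rules per part of speech: [suffix, replacement].
RULES = {
  noun: [["ies", "y"], ["ves", "f"], ["ches", "ch"], ["shes", "sh"],
         ["ses", "s"], ["xes", "x"], ["zes", "z"], ["men", "man"], ["s", ""]],
  verb: [["ies", "y"], ["es", "e"], ["es", ""], ["ed", "e"], ["ed", ""],
         ["ing", "e"], ["ing", ""], ["s", ""]],
  adj:  [["er", ""], ["est", ""], ["er", "e"], ["est", "e"]]
}.freeze

# Irregular forms that no suffix rule can recover.
EXCEPTIONS = {
  noun: { "mice" => "mouse", "children" => "child" },
  verb: { "went" => "go", "was" => "be" },
  adj:  { "better" => "good" }
}.freeze

# Known base forms; a candidate is accepted only if it appears here.
LEXICON = {
  noun: Set["dog", "box", "city", "mouse", "child"],
  verb: Set["hire", "walk", "go", "be", "fire", "try"],
  adj:  Set["hot", "large", "good", "slow"]
}.freeze

# Return the lemma of `word` for the given part of speech (:noun, :verb, :adj).
# Unknown words are returned unchanged (lowercased).
def lemma(word, pos)
  form = word.downcase
  return EXCEPTIONS[pos][form] if EXCEPTIONS[pos].key?(form)
  return form if LEXICON[pos].include?(form)

  RULES[pos].each do |suffix, replacement|
    next unless form.end_with?(suffix)

    candidate = form.delete_suffix(suffix) + replacement
    return candidate if LEXICON[pos].include?(candidate)
  end
  form
end

# Count occurrences per lemma, so inflected variants share one entry.
def lemma_counts(words, pos)
  words.each_with_object(Hash.new(0)) { |w, counts| counts[lemma(w, pos)] += 1 }
end

p lemma("dogs", :noun)                         # => "dog"
p lemma("hired", :verb)                        # => "hire"
p lemma_counts(%w[dog dogs Dogs mice], :noun)  # => {"dog"=>3, "mouse"=>1}
```

Validating each candidate against a lexicon is what separates this approach from plain stemming: "hired" becomes "hire" rather than "hir" because only the former is a known verb.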
Stars: 112
Forks: 15
Language: Ruby
License: MIT
Category:
Last pushed: Oct 14, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/yohasebe/lemmatizer"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
Higher-rated alternatives
hplt-project/sacremoses: Python port of Moses tokenizer, truecaser and normalizer
Blake-Madden/OleanderStemmingLibrary: Porter stemming library (C++)
adbar/simplemma: Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
htaghizadeh/PersianStemmer-Python: PersianStemmer-Python
michmech/lemmatization-lists: Machine-readable lists of lemma-token pairs in 23 languages.