dzieciou/pystempel
Python port of Stempel, an algorithmic stemmer for the Polish language.
When analyzing Polish text, you often need to reduce different inflected forms of a word (like "książka," "książki," "książkami") to a common base or "stem" ("książek"). This tool takes individual Polish words and returns their stems, so that related word forms can be grouped together for more accurate text analysis. It is aimed at people who work with Polish language data, such as linguists, data scientists, and search engine developers.
No commits in the last 6 months.
Use this if you need to process Polish text and group related words together for tasks like search, information retrieval, or linguistic analysis.
Not ideal if you primarily work with languages other than Polish, or if you need to compile your own custom stemming tables from scratch.
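The grouping step described above can be sketched as follows. This is a minimal illustration, not pystempel itself: `toy_stem` is a hypothetical stand-in that strips a few noun endings, and `group_by_stem` is an assumed helper name. The toy stand-in produces "książk" rather than the "książek" stem quoted above, because it does not implement the Stempel algorithm.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def toy_stem(word: str) -> str:
    """Hypothetical stand-in stemmer (NOT the Stempel algorithm).

    Strips one of a few Polish noun endings so the example runs
    without any third-party package installed.
    """
    for suffix in ("ami", "a", "i"):
        # Only strip when a reasonably long base would remain.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def group_by_stem(
    words: Iterable[str], stem: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Group word forms under the stem returned by `stem`."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        # Fall back to the word itself if the stemmer returns nothing.
        key = stem(word) or word
        groups[key].append(word)
    return dict(groups)


groups = group_by_stem(["książka", "książki", "książkami", "dom"], toy_stem)
print(groups)
```

To use a real stemmer, pass the library's stemming callable in place of `toy_stem`; the project's README documents the exact API.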
Stars: 39
Forks: 5
Language: HTML
License: none listed
Category:
Last pushed: Aug 29, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/dzieciou/pystempel"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
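The same request can be made from Python with only the standard library. This is a sketch under assumptions: the path layout (category, owner, repository) is inferred from the single URL shown above, the names `quality_url` and `fetch_quality` are hypothetical, and the response format is not documented here, so the body is returned as raw text.

```python
import urllib.request

# Base path taken from the curl example; the segment layout
# (category/owner/repo) is an assumption inferred from that one URL.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the endpoint and return the response body as text.

    The response schema is not described in the source, so no parsing
    is attempted here.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


url = quality_url("nlp", "dzieciou", "pystempel")
print(url)
# Calling fetch_quality("nlp", "dzieciou", "pystempel") performs the
# network request and counts against the daily limit.
```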
Higher-rated alternatives
hplt-project/sacremoses: Python port of Moses tokenizer, truecaser and normalizer
Blake-Madden/OleanderStemmingLibrary: Porter stemming library (C++)
adbar/simplemma: Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
htaghizadeh/PersianStemmer-Python: PersianStemmer-Python
michmech/lemmatization-lists: Machine-readable lists of lemma-token pairs in 23 languages.