faisaltareque/Multilingual-Sentence-Tokenizer
This Python package is designed for tokenizing sentences in over 40 languages. It serves as a wrapper around various open-source libraries. The package was created to support our work XL-HeadTags. To use it, simply provide the word and its corresponding language to the stemmer, and it will return the stemmed version of the word.
No commits in the last 6 months.
Stars
3
Forks
—
Language
Python
License
—
Category
Last pushed
Aug 11, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/faisaltareque/Multilingual-Sentence-Tokenizer"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
hplt-project/sacremoses
Python port of Moses tokenizer, truecaser and normalizer
Blake-Madden/OleanderStemmingLibrary
Porter stemming library (C++)
adbar/simplemma
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
htaghizadeh/PersianStemmer-Python
PersianStemmer-Python
michmech/lemmatization-lists
Machine-readable lists of lemma-token pairs in 23 languages.