eklem/stopword-trainer
A module for creating stopword lists for any language, based on a set of documents.
This tool helps anyone working with text data create customized lists of stopwords: common words such as 'the', 'a', or 'is' that are often removed before analysis. You provide a collection of documents, and it generates a stopword list tailored to that specific content or language. This is useful for data scientists, linguists, or anyone preparing text for machine learning, search, or content analysis.
Available on npm.
Use this if you need to build highly relevant stopword lists for a specific domain, language, or evolving content, rather than relying on generic, pre-defined lists.
Not ideal if you simply need a standard, off-the-shelf stopword list for a common language without any customization.
Stars: 15
Forks: not listed
Language: JavaScript
License: MIT
Category: not listed
Last pushed: Feb 28, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/eklem/stopword-trainer"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
Higher-rated alternatives
Alir3z4/python-stop-words
Get a list of common stop words in various languages in Python
hklemp/dotnet-stop-words
Get a list of common stop words in various languages in .NET
igorbrigadir/stopwords
Default English stopword lists from many different sources
skupriienko/Ukrainian-Stopwords
A list of ~2,000 Ukrainian stopwords (including numbers)
stdlib-js/datasets-savoy-stopwords-fr
A list of French stop words.