cmccomb/rust-stop-words
Common stop words in a variety of languages
This tool helps text analysts, data scientists, and researchers clean up written text by identifying and removing common 'stop words' such as 'the', 'a', or 'is'. You provide text in any of the supported languages, and it returns a version without those words, so the terms that carry meaning are easier to find and analyze. Stop word removal is a common preprocessing step for tasks like sentiment analysis, topic modeling, and keyword extraction.
Use this if you need to preprocess text data in multiple languages to focus on important keywords and phrases by eliminating common, less meaningful words.
Not ideal if your analysis relies on the presence of common articles or prepositions, or if you only work with highly structured, non-textual data.
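The filtering step described above can be sketched in a few lines of Rust. This is a minimal illustration only: it uses the standard library and a tiny hand-written word list, and does not call this crate's API. The names `SAMPLE_STOP_WORDS` and `remove_stop_words` are hypothetical, and a real list would be far longer and language-specific.

```rust
use std::collections::HashSet;

/// Tiny illustrative English stop word list (hypothetical, not the crate's data).
const SAMPLE_STOP_WORDS: [&str; 8] = ["the", "a", "an", "is", "of", "and", "on", "in"];

/// Returns the words of `text` that are not in `stop_words`.
/// Each token is lowercased and stripped of leading/trailing punctuation first.
fn remove_stop_words(text: &str, stop_words: &HashSet<&str>) -> Vec<String> {
    text.split_whitespace()
        .map(|token| {
            token
                .trim_matches(|c: char| !c.is_alphanumeric())
                .to_lowercase()
        })
        .filter(|word| !word.is_empty() && !stop_words.contains(word.as_str()))
        .collect()
}

fn main() {
    // Build a set once so each membership check is a hash lookup.
    let stop_words: HashSet<&str> = SAMPLE_STOP_WORDS.iter().copied().collect();
    let kept = remove_stop_words("The cat is on a mat.", &stop_words);
    println!("{:?}", kept); // prints ["cat", "mat"]
}
```

Splitting on whitespace is a simplification that suits space-delimited languages; text in languages without word spacing would need a proper tokenizer before filtering.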
Stars: 25
Forks: 5
Language: Rust
License: MIT
Category:
Last pushed: Feb 21, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/cmccomb/rust-stop-words"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
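If you query several repositories, it may help to build the request URL programmatically. The sketch below assumes the path layout is `<segment>/<owner>/<repo>` after the base, which is inferred from the single example URL above and is not a documented contract. The function name `quality_url` and its parameters are hypothetical.

```rust
/// Base of the quality endpoint, copied from the curl example.
const BASE: &str = "https://pt-edge.onrender.com/api/v1/quality";

/// Builds a request URL for one repository.
/// ASSUMPTION: the path is `<segment>/<owner>/<repo>`, inferred from one example URL.
fn quality_url(segment: &str, owner: &str, repo: &str) -> String {
    format!("{}/{}/{}/{}", BASE, segment, owner, repo)
}

fn main() {
    // Reproduces the URL used in the curl example.
    println!("{}", quality_url("nlp", "cmccomb", "rust-stop-words"));
}
```

The resulting string can be passed to curl or to any HTTP client of your choice.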
Higher-rated alternatives
PyThaiNLP/nlpo3
Thai natural language processing library in Rust, with Python and Node bindings.
forzagreen/n2words
Convert numerical numbers to written numbers, in 52+ languages.
greyblake/whatlang-rs
Natural language detection library for Rust. Try demo online: https://whatlang.org/
wikimedia/sentencex
A sentence segmentation library with wide language support optimized for speed and utility.
pemistahl/lingua-rs
The most accurate natural language detection library for Rust, suitable for short text and...