eklem/words-n-numbers
Tokenizes strings of text. Uses regular expressions to extract arrays of words and, optionally, numbers, emojis, tags, usernames and email addresses from strings. Works in Node.js and the browser. For when you need more than just [a-z] regular expressions.
When you process large amounts of text, such as social media feeds, customer reviews, or research papers, you often need to pull out specific pieces of information. This tool takes a string of text and returns lists of words, numbers, emojis, hashtags, usernames, or email addresses. It suits anyone building applications that parse and categorize textual data, for example content analysts, data scientists, or social media managers.
Used by 1 other package. Available on npm.
Use this if you need a flexible way to extract specific types of tokens from text strings, beyond simple alphabetical words, including support for various languages and special characters.
Not ideal if you need to perform complex natural language understanding tasks like sentiment analysis, topic modeling, or grammar checking, as it focuses purely on extraction.
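To make the idea of regex-based token extraction concrete, here is a minimal sketch in plain JavaScript. It does not use the words-n-numbers package and does not reproduce its API or its regular expressions: `PATTERNS` and `extractTokens` are hypothetical names, and the patterns are simplified assumptions built on Unicode property escapes so that they match more than [a-z].

```javascript
// Illustrative patterns only; NOT the regexes shipped by words-n-numbers.
// Key order matters: more specific token types are tried first.
const PATTERNS = {
  email: /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/u,
  tag: /#[\p{L}\p{N}_]+/u,
  username: /@[\p{L}\p{N}_]+/u,
  number: /\p{N}+(?:[.,]\p{N}+)*/u,
  word: /\p{L}+(?:['’-]\p{L}+)*/u,
  emoji: /\p{Extended_Pictographic}/u,
};

// Hypothetical helper: returns every token of the requested types,
// in the order they appear in the text.
function extractTokens(text, types = ['word']) {
  const source = Object.keys(PATTERNS)
    .filter((type) => types.includes(type))
    .map((type) => `(?:${PATTERNS[type].source})`)
    .join('|');
  if (source === '') return [];
  // One combined global regex; match() returns null when nothing is found.
  return text.match(new RegExp(source, 'gu')) || [];
}

const sample = 'Email me at jane.doe@example.com or ping @jane about #nlp 42 times 🎉';
console.log(extractTokens(sample, ['email', 'username', 'tag', 'number', 'word', 'emoji']));
```

Combining the selected patterns into one alternation keeps tokens in document order, and trying the email pattern before the word pattern stops an address from being split into separate words. Real-world input (for example multi-code-point emoji sequences) needs more careful patterns than this sketch provides.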
Stars: 12
Forks: —
Language: JavaScript
License: MIT
Category:
Last pushed: Feb 28, 2026
Commits (30d): 0
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/eklem/words-n-numbers"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
PyThaiNLP/nlpo3
Thai natural language processing library in Rust, with Python and Node bindings.
forzagreen/n2words
Convert numerical numbers to written numbers, in 52+ languages.
greyblake/whatlang-rs
Natural language detection library for Rust. Try demo online: https://whatlang.org/
wikimedia/sentencex
A sentence segmentation library with wide language support optimized for speed and utility.
pemistahl/lingua-rs
The most accurate natural language detection library for Rust, suitable for short text and...