unfoldingWord/string-punctuation-tokenizer

Small library that provides functions to tokenize a string into an array of words with or without punctuation

Emerging

This tool helps you break down any written text into individual words, optionally keeping punctuation marks separate. You provide a sentence or passage, and it gives you back a list of each word. It's ideal for text analysts, linguists, or anyone preparing text for further processing where word separation is critical.

Used by 1 other package. No commits in the last 6 months. Available on npm.

Use this if you need to accurately separate words from a string of text, with fine-grained control over whether punctuation should be treated as part of a word or as its own distinct item.

Not ideal if you need advanced natural language processing features like stemming, lemmatization, or sentiment analysis, as this tool focuses solely on basic tokenization.
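To make the two modes described above concrete, here is a minimal sketch of word tokenization with and without punctuation. It does not call the package: `tokenizeSketch` and its `includePunctuation` option are hypothetical names chosen for illustration, and the regular expressions are an assumption about what "word" and "punctuation" mean, not the library's actual rules.

```javascript
// Hypothetical helper for illustration only; this is NOT the package's API.
// Splits text into word tokens and, optionally, emits each punctuation
// character as its own token.
function tokenizeSketch(text, { includePunctuation = false } = {}) {
  // Assumption: a word is a run of Unicode letters, combining marks, or digits,
  // optionally joined by an internal apostrophe or hyphen (e.g. "It's").
  const word = "[\\p{L}\\p{M}\\p{N}]+(?:['’-][\\p{L}\\p{M}\\p{N}]+)*";
  // Assumption: punctuation is any single character that is neither
  // whitespace nor part of a word.
  const punctuation = "[^\\s\\p{L}\\p{M}\\p{N}]";

  const pattern = includePunctuation
    ? new RegExp(`${word}|${punctuation}`, "gu")
    : new RegExp(word, "gu");

  // match() returns null when nothing matches, so fall back to an empty array.
  return text.match(pattern) ?? [];
}

const sample = "Hello, world! It's fine.";
const wordsOnly = tokenizeSketch(sample);
const withPunctuation = tokenizeSketch(sample, { includePunctuation: true });
console.log(wordsOnly);
console.log(withPunctuation);
```

In this sketch, punctuation inside a word (the apostrophe in "It's") stays attached, while surrounding marks become separate tokens only when `includePunctuation` is set. Check the package's own README on npm for its real function names and options.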

text-analysis linguistics content-preparation data-preprocessing string-manipulation

Stale 6m

Maintenance 0 / 25

Adoption 5 / 25

Maturity 25 / 25

Community 8 / 25

Language: JavaScript

License: MIT

Higher-rated alternatives

PyThaiNLP/nlpo3

Thai natural language processing library in Rust, with Python and Node bindings.

forzagreen/n2words

Convert numerical numbers to written numbers, in 52+ languages.

greyblake/whatlang-rs

Natural language detection library for Rust. Try demo online: https://whatlang.org/

wikimedia/sentencex

A sentence segmentation library with wide language support optimized for speed and utility.

pemistahl/lingua-rs

The most accurate natural language detection library for Rust, suitable for short text and...
