unfoldingWord/string-punctuation-tokenizer
A small library that provides functions to tokenize a string into an array of words, with or without punctuation.
This tool breaks written text into individual words, optionally keeping punctuation marks as separate tokens. You provide a sentence or passage, and it returns a list of the words it contains. It suits text analysts, linguists, or anyone preparing text for further processing where word separation is critical.
Used by 1 other package. No commits in the last 6 months. Available on npm.
Use this if you need to accurately separate words from a string of text, with fine-grained control over whether punctuation should be treated as part of a word or as its own distinct item.
Not ideal if you need advanced natural language processing features like stemming, lemmatization, or sentiment analysis, as this tool focuses solely on basic tokenization.
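To make the "with or without punctuation" distinction concrete, here is a minimal regex-based sketch of the idea. It is not this library's API or implementation: the function name `tokenizeSketch` and the `includePunctuation` option are hypothetical names chosen for illustration, and the rules for what counts as a word (letters, digits, and combining marks, optionally joined by an apostrophe or hyphen) are assumptions.

```javascript
// Illustrative sketch only; not the string-punctuation-tokenizer API.
// A "word" here is a run of letters, digits, or combining marks, optionally
// joined internally by an apostrophe or hyphen (e.g. "It's", "well-known").
const WORD_PART = "[\\p{L}\\p{N}\\p{M}]+";
const WORD = `${WORD_PART}(?:['’-]${WORD_PART})*`;
// Punctuation: any single character that is neither whitespace nor a word character.
const PUNCTUATION = "[^\\s\\p{L}\\p{N}\\p{M}]";

function tokenizeSketch(text, { includePunctuation = false } = {}) {
  const pattern = includePunctuation
    ? new RegExp(`${WORD}|${PUNCTUATION}`, "gu")
    : new RegExp(WORD, "gu");
  // String.prototype.match returns null when nothing matches.
  return text.match(pattern) ?? [];
}

const sample = "Hello, world! It's fine.";
console.log(tokenizeSketch(sample));
// [ 'Hello', 'world', "It's", 'fine' ]
console.log(tokenizeSketch(sample, { includePunctuation: true }));
// [ 'Hello', ',', 'world', '!', "It's", 'fine', '.' ]
```

In this sketch each punctuation mark becomes its own token when requested, while apostrophes and hyphens inside a word stay attached to it. The real package may draw these boundaries differently, so check its README before relying on any particular behavior.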
Stars: 8
Forks: 1
Language: JavaScript
License: MIT
Category: (not specified)
Last pushed: Aug 09, 2023
Commits (30d): 0
Dependencies: 1
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/unfoldingWord/string-punctuation-tokenizer"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
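The same request can be made from JavaScript. The sketch below assumes Node 18+ (or a browser) for the global `fetch`, and assumes the path segments after `/quality/` are category, owner, and repository name, which is inferred from the curl URL rather than documented here. The helper names `buildQualityUrl` and `fetchQuality` are hypothetical. Because the response format is not described on this page, the body is returned as raw text.

```javascript
// Base URL taken from the curl example above.
const API_BASE = "https://pt-edge.onrender.com/api/v1/quality";

// Hypothetical helper: builds the endpoint URL for one repository.
// Assumes the path layout is /quality/<category>/<owner>/<repo>.
function buildQualityUrl(category, owner, repo) {
  const segments = [category, owner, repo].map(encodeURIComponent);
  return `${API_BASE}/${segments.join("/")}`;
}

// Hypothetical helper: fetches the data and returns the raw response body.
async function fetchQuality(category, owner, repo) {
  const response = await fetch(buildQualityUrl(category, owner, repo));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.text();
}
```

Calling `fetchQuality("nlp", "unfoldingWord", "string-punctuation-tokenizer")` requests the same URL as the curl command. Unauthenticated calls count against the 100 requests/day limit, so cache results if you poll regularly.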
Higher-rated alternatives
PyThaiNLP/nlpo3
Thai natural language processing library in Rust, with Python and Node bindings.
forzagreen/n2words
Convert numerical numbers to written numbers, in 52+ languages.
greyblake/whatlang-rs
Natural language detection library for Rust. Try demo online: https://whatlang.org/
wikimedia/sentencex
A sentence segmentation library with wide language support optimized for speed and utility.
pemistahl/lingua-rs
The most accurate natural language detection library for Rust, suitable for short text and...