unfoldingWord/string-punctuation-tokenizer

Small library that provides functions to tokenize a string into an array of words with or without punctuation

Score: 38 / 100 (Emerging)

This tool breaks written text into individual words, optionally returning punctuation marks as separate tokens. You provide a sentence or passage, and it returns a list of the words it contains. It suits text analysts, linguists, or anyone preparing text for further processing where word separation is critical.

Used by 1 other package. No commits in the last 6 months. Available on npm.

Use this if you need to accurately separate words from a string of text, with fine-grained control over whether punctuation should be treated as part of a word or as its own distinct item.
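
To make the two modes concrete, here is a minimal, self-contained sketch of the idea in plain JavaScript. It does not call the package; `tokenizeSketch` and its `includePunctuation` option are hypothetical names chosen for illustration, and the word rule (letters, marks, and digits, with internal apostrophes or hyphens kept attached) is an assumption that may differ from the library's actual behavior.

```javascript
// Illustrative sketch only: a hypothetical helper, NOT the package's API.
function tokenizeSketch(text, { includePunctuation = false } = {}) {
  // A word is a run of letters, combining marks, or digits in any script.
  // An apostrophe or hyphen between two such runs stays inside the word.
  const word = "[\\p{L}\\p{M}\\p{N}]+(?:['’-][\\p{L}\\p{M}\\p{N}]+)*";
  // Any other non-whitespace character counts as one punctuation token.
  const punctuation = "[^\\s\\p{L}\\p{M}\\p{N}]";
  const pattern = includePunctuation ? `${word}|${punctuation}` : word;
  // The "u" flag enables the \p{...} Unicode property escapes.
  return text.match(new RegExp(pattern, "gu")) ?? [];
}

const sample = "Hello, world! It's fine.";

// Words only: punctuation is dropped.
console.log(tokenizeSketch(sample));
// [ 'Hello', 'world', "It's", 'fine' ]

// Words plus punctuation, each mark as its own token.
console.log(tokenizeSketch(sample, { includePunctuation: true }));
// [ 'Hello', ',', 'world', '!', "It's", 'fine', '.' ]
```

A single regular expression keeps the sketch short. For the package's real function names and options, consult its README on npm.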

Not ideal if you need advanced natural language processing features like stemming, lemmatization, or sentiment analysis, as this tool focuses solely on basic tokenization.

Tags: text-analysis, linguistics, content-preparation, data-preprocessing, string-manipulation
Status: Stale (6 months)
Maintenance: 0 / 25
Adoption: 5 / 25
Maturity: 25 / 25
Community: 8 / 25

Stars: 8
Forks: 1
Language: JavaScript
License: MIT
Last pushed: Aug 09, 2023
Commits (30d): 0
Dependencies: 1
Reverse dependents: 1

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/unfoldingWord/string-punctuation-tokenizer"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.