arbox/tokenizer

A simple tokenizer in Ruby for NLP tasks.

Score: 41 / 100 (Emerging)

This tool helps linguists and language technology practitioners break down written text into individual words and sentences for analysis. It takes raw German, English, or Dutch text and outputs a structured list of tokens (words, punctuation) that can be used for further linguistic processing. Anyone involved in natural language processing or computational linguistics can use this for text preparation.
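
To make the idea of "text in, token list out" concrete, here is a minimal sketch in plain Ruby. It is not the arbox/tokenizer implementation or its API; the method name `naive_tokenize` and the regular expression are assumptions chosen for illustration, and a real tokenizer handles many more cases (abbreviations, numbers with separators, URLs).

```ruby
# Illustrative only: a naive regex tokenizer, NOT the arbox/tokenizer code.
# `naive_tokenize` is a hypothetical name introduced for this sketch.
def naive_tokenize(text)
  # First alternative: a run of Unicode letters/digits, optionally joined by
  # internal apostrophes or hyphens (so "don't" and "well-known" stay whole).
  # Second alternative: any single character that is not a letter, digit, or
  # whitespace, emitted as its own punctuation token.
  text.scan(/[\p{L}\p{N}]+(?:['-][\p{L}\p{N}]+)*|[^\p{L}\p{N}\s]/)
end

puts naive_tokenize("Ich gehe in die Schule!").inspect
# prints ["Ich", "gehe", "in", "die", "Schule", "!"]
```

The `\p{L}` property class is used instead of `\w` so that letters such as German umlauts are treated as part of a word rather than as separators.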

No commits in the last 6 months.

Use this if you need to precisely segment text into its constituent linguistic units (sentences and words) for tasks like sentiment analysis, machine translation, or text classification.

Not ideal if you need advanced linguistic features beyond basic tokenization, as some features are still under development.

linguistics natural-language-processing text-analysis computational-linguistics language-technology
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 8 / 25
Maturity 16 / 25
Community 17 / 25


Stars: 46
Forks: 11
Language: Ruby
License: (not shown)
Last pushed: Apr 03, 2017
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/arbox/tokenizer"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
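
The same request can be sketched in Ruby with the standard library. The helper names `quality_uri` and `fetch_quality` are hypothetical, and treating `nlp` as a category segment of the path is an assumption based only on the URL above. The response format is not described here, so the sketch returns the raw body without parsing it.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: rebuilds the endpoint URL shown in the curl example.
# Treating "nlp" as a category path segment is an assumption.
def quality_uri(category, repo)
  URI("https://pt-edge.onrender.com/api/v1/quality/#{category}/#{repo}")
end

# Hypothetical helper: returns the raw response body as a String,
# or nil when the server answers with a non-2xx status.
def fetch_quality(category, repo)
  response = Net::HTTP.get_response(quality_uri(category, repo))
  response.is_a?(Net::HTTPSuccess) ? response.body : nil
end

# Example call (performs a network request and counts against the daily limit):
#   puts fetch_quality("nlp", "arbox/tokenizer")
```

Returning `nil` on failure keeps the sketch short; a caller that needs to distinguish rate limiting from other errors would inspect the response status instead.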