zencephalon/Tactful_Tokenizer

Accurate Bayesian sentence tokenizer in Ruby.

33 / 100 (Emerging)

This tool helps developers accurately split raw text into individual sentences, even around tricky punctuation such as question marks, exclamation points, and abbreviations. It takes a block of text, which may contain simple HTML formatting, and returns a list of separate sentences. Ruby developers working on natural language processing tasks can use it for text preparation.
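To show why abbreviations make sentence splitting hard, here is a minimal Ruby sketch of a rule-based baseline. It is neither the gem's Bayesian model nor its API: `naive_split`, `abbrev_aware_split`, and the `ABBREVIATIONS` list are hypothetical names introduced only for illustration. A plain split on terminal punctuation cuts "U.S. Senate" in two, while rejoining pieces that end in a known abbreviation repairs that case.

```ruby
# Hypothetical rule-based baseline, NOT Tactful_Tokenizer's Bayesian model or API.

# Assumed, illustrative abbreviation list (lowercased, with trailing period).
ABBREVIATIONS = %w[u.s. mr. mrs. dr. e.g. i.e. etc.].freeze

# Naive splitter: breaks after every ., ! or ? that is followed by whitespace.
def naive_split(text)
  text.split(/(?<=[.!?])\s+/)
end

# Abbreviation-aware splitter: keeps appending pieces to a buffer while the
# buffer's last word is a known abbreviation, then emits it as one sentence.
# Whitespace between rejoined pieces is normalized to a single space.
def abbrev_aware_split(text)
  sentences = []
  buffer = nil
  naive_split(text).each do |piece|
    buffer = buffer ? "#{buffer} #{piece}" : piece
    last_word = buffer.split.last.to_s.downcase
    next if ABBREVIATIONS.include?(last_word)

    sentences << buffer
    buffer = nil
  end
  sentences << buffer if buffer
  sentences
end

text = "Here in the U.S. Senate we vote. Is it easier that way? Yes!"
p naive_split(text)
# => ["Here in the U.S.", "Senate we vote.", "Is it easier that way?", "Yes!"]
p abbrev_aware_split(text)
# => ["Here in the U.S. Senate we vote.", "Is it easier that way?", "Yes!"]
```

The fixed list fails in the opposite direction when a sentence genuinely ends with an abbreviation (for example "I moved to the U.S. Then I..." would be merged into one sentence). Deciding such cases from context rather than from a hard-coded list is the kind of ambiguity a trained probabilistic tokenizer is aimed at.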

No commits in the last 6 months.

Use this if you are a Ruby developer who needs to break unstructured text, including Unicode text or text with simple HTML tags, into discrete sentences for further analysis.

Not ideal if you are not a Ruby developer or need very robust HTML parsing beyond basic tag recognition.

natural-language-processing text-analysis data-preprocessing information-extraction ruby-development
No License · Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 8 / 25
Community 16 / 25


Stars: 80
Forks: 13
Language: Ruby
License: None
Last pushed: Apr 30, 2014
Commits (30d): 0
