fnl/syntok

Text tokenization and sentence segmentation (segtok v2)

45 / 100 (Emerging)

This tool splits long pieces of text into individual sentences and words. It takes raw text documents, such as reports or articles, and outputs a structured list of sentences, with each sentence further split into its words and punctuation marks. It is useful for anyone who needs to prepare text for further analysis, such as researchers, data analysts, or content strategists.
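To make the output shape concrete, here is a minimal sketch of the "list of sentences, each a list of tokens" structure described above. It uses only Python's standard `re` module with two deliberately naive patterns; the names `split_sentences`, `tokenize`, and `segment` are hypothetical and this is not syntok's API or its segmentation rules.

```python
import re

# Naive stand-in patterns, chosen for illustration only.
# They are NOT the rules syntok itself applies.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")  # whitespace after ., ! or ?
TOKEN = re.compile(r"\w+|[^\w\s]")           # a word, or one punctuation mark


def split_sentences(text):
    """Split text on whitespace that follows '.', '!' or '?'."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def tokenize(sentence):
    """Return the words and punctuation marks of one sentence."""
    return TOKEN.findall(sentence)


def segment(text):
    """Return a list of sentences, each given as a list of tokens."""
    return [tokenize(s) for s in split_sentences(text)]


result = segment("Reports are long. Split them first! Then analyze, ok?")
for tokens in result:
    print(tokens)
```

A rule this simple also splits "Dr. Smith arrived." after "Dr.", which is the kind of case a dedicated segmenter is meant to handle better.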

209 stars. No commits in the last 6 months.

Use this if you need to precisely split text written in English, Spanish, or German into clean sentences and individual words for natural language processing or text analysis.

Not ideal if you mainly process text in languages other than English, Spanish, or German, where its language-specific handling may not apply.

text-analysis content-preparation linguistics information-extraction document-processing
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 19 / 25

Stars 209
Forks 35
Language Python
License MIT
Last pushed Mar 12, 2022
Commits (30d) 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/fnl/syntok"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
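The same request can be sketched in Python with the standard library. Two assumptions are not confirmed by this page: that the URL path follows a `category/owner/repo` pattern (inferred from the single example URL above) and that the endpoint returns a JSON body. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL.

    Assumption: the path is category/owner/repo, as in the curl example.
    """
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the quality data for one repository.

    Assumption: the response body is JSON.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("nlp", "fnl", "syntok")
```

Without a key, the 100 requests/day limit applies, so repeated lookups are worth caching locally.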