liuzl/tokenizer

Natural Language Tokenizer

Score: 21 / 100 (Experimental)

This is a fundamental tool for anyone working with text data who needs to break down sentences into individual words or meaningful units. It takes raw text in various languages and outputs a clean list of its constituent words, correctly handling special cases like contractions and possessives. It's designed for developers building applications that process or analyze human language.

No commits in the last 6 months.

Use this if you are a developer building a search engine, text analyzer, or any application that needs to accurately segment multilingual text into individual words.

Not ideal if you need advanced natural language processing features beyond basic word segmentation, such as sentiment analysis or part-of-speech tagging.

natural-language-processing text-analysis search-engine-development multilingual-text information-retrieval
Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 16 / 25
Community 0 / 25


Stars: 10
Forks:
Language: Go
License: Apache-2.0
Category: go-nlp-libraries
Last pushed: Nov 28, 2018
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/liuzl/tokenizer"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
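The curl command above can also be issued from Go. The sketch below builds the request URL from the path layout visible in the example (`category/owner/repo`) and returns the raw JSON body; the response schema is not documented here, so no fields are assumed and the body is passed through verbatim.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// qualityURL builds the API endpoint from the path layout shown in the
// example curl command (category/owner/repo).
func qualityURL(category, owner, repo string) string {
	return fmt.Sprintf("https://pt-edge.onrender.com/api/v1/quality/%s/%s/%s",
		category, owner, repo)
}

// fetchQuality GETs the quality report and returns the raw JSON body.
// The response schema is undocumented in this listing, so it is not parsed.
func fetchQuality(category, owner, repo string) (string, error) {
	resp, err := http.Get(qualityURL(category, owner, repo))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	body, err := fetchQuality("nlp", "liuzl", "tokenizer")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(body)
}
```

Anonymous requests count against the 100/day limit noted above; how a key is attached (header or query parameter) is not specified in this listing, so check the API's own documentation before adding one.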