liuzl/tokenizer
Natural Language Tokenizer
A Go library for breaking raw text into individual words or meaningful units. It accepts text in various languages and outputs a clean list of its constituent words, correctly handling special cases such as contractions and possessives. It's designed for developers building applications that process or analyze human language.
No commits in the last 6 months.
Use this if you are a developer building a search engine, text analyzer, or any application that needs to accurately segment multilingual text into individual words.
Not ideal if you need advanced natural language processing features beyond basic word segmentation, such as sentiment analysis or part-of-speech tagging.
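To illustrate what "correctly handling contractions and possessives" means in practice, here is a minimal conceptual sketch of word segmentation in Go. It is not liuzl/tokenizer's actual API (the `tokenize` function and its rules are assumptions for illustration); it only shows the behavior a word tokenizer is expected to have, keeping apostrophes inside words so "isn't" and "John's" survive as single tokens.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenize splits text into word tokens, treating any run of letters,
// digits, and in-word apostrophes as one token. This is a conceptual
// sketch, NOT the liuzl/tokenizer API.
func tokenize(text string) []string {
	isWordRune := func(r rune) bool {
		return unicode.IsLetter(r) || unicode.IsDigit(r) || r == '\''
	}
	raw := strings.FieldsFunc(text, func(r rune) bool { return !isWordRune(r) })
	tokens := make([]string, 0, len(raw))
	for _, t := range raw {
		// Strip apostrophes used as quotation marks at token edges,
		// while leaving internal ones (contractions, possessives) intact.
		tokens = append(tokens, strings.Trim(t, "'"))
	}
	return tokens
}

func main() {
	fmt.Println(tokenize("It's John's book, isn't it?"))
	// Contractions and possessives stay whole; punctuation is dropped.
}
```

A real tokenizer layers language-specific rules (and, for languages without spaces, dictionary or statistical segmentation) on top of this kind of rune classification.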
Stars: 10
Forks: —
Language: Go
License: Apache-2.0
Category:
Last pushed: Nov 28, 2018
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/liuzl/tokenizer"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
ikawaha/kagome-dict
Dictionary Library for Kagome v2
aaaton/golem
A lemmatizer implemented in Go
habeanf/yap
Yet Another (natural language) Parser
clipperhouse/uax29
A tokenizer based on Unicode text segmentation (UAX #29), for Go. Split graphemes, words, sentences.
abadojack/whatlanggo
Natural language detection library for Go