euskadi31/go-tokenizer
A Text Tokenizer library for Golang
go-tokenizer is a Go library that splits a block of text into individual words or meaningful units, removing punctuation and separating contractions. It is aimed at developers building applications that process or analyze human language, such as search engines, chatbots, or content analysis tools.
Use this if you are a Go developer who needs to programmatically break down sentences and paragraphs into their constituent words for further processing.
Not ideal if you need advanced natural language processing features like sentiment analysis, named entity recognition, or part-of-speech tagging, as this tool focuses solely on tokenization.
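To make the idea of tokenization concrete, here is a minimal sketch using only the Go standard library. It does not use go-tokenizer and does not claim to reproduce its API or its exact rules; `tokenize` is a hypothetical helper, and the rule it applies (split on every character that is neither a letter nor a digit, so an apostrophe splits a contraction into two parts) is an assumption chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenize is a hypothetical helper for illustration only; it is not
// part of go-tokenizer. It splits text on every rune that is neither a
// letter nor a digit, which drops punctuation and whitespace and breaks
// contractions at the apostrophe.
func tokenize(text string) []string {
	return strings.FieldsFunc(text, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

func main() {
	tokens := tokenize("Don't panic: it's only text, after all!")
	for i, tok := range tokens {
		fmt.Printf("%d: %s\n", i, tok)
	}
}
```

A dedicated tokenizer library will typically handle cases this sketch ignores, such as language-specific contraction rules, so treat it as a starting point for understanding the task rather than a substitute.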
Stars: 11
Forks: 2
Language: Go
License: MIT
Category
Last pushed: Nov 24, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/euskadi31/go-tokenizer"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
guillaume-be/rust-tokenizers
Rust-tokenizer offers high-performance tokenizers for modern language models, including...
sugarme/tokenizer
NLP tokenizers written in Go language
elixir-nx/tokenizers
Elixir bindings for 🤗 Tokenizers
openscilab/tocount
ToCount: Lightweight Token Estimator
reinfer/blingfire-rs
Rust wrapper for the BlingFire tokenization library