sugarme/tokenizer
NLP tokenizers written in Go
This library helps Go developers prepare text data for Natural Language Processing (NLP) models. It breaks raw text into words or sub-word tokens, along with their offsets in the original text, making the output ready for use in machine-learning pipelines such as training or inference.
Use this if you are a Go developer building NLP applications and need to preprocess text by converting it into tokens for tasks like training or inference.
Not ideal if you are not a Go developer or if you need a pre-built NLP solution without needing to integrate a tokenizer into your Go application.
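To make the idea concrete, here is a minimal sketch of what tokenization with offsets looks like. This is not the sugarme/tokenizer API; it is a self-contained whitespace pre-tokenizer (the simplest kind of splitter, which real tokenizers refine with sub-word models) written to illustrate the token-plus-position output described above.

```go
package main

import (
	"fmt"
	"unicode"
)

// Token pairs a surface string with its byte offsets in the original
// text, mirroring the token+position output a tokenizer produces.
type Token struct {
	Text       string
	Start, End int // byte offsets, half-open [Start, End)
}

// whitespaceTokenize splits on Unicode whitespace and records each
// token's byte offsets. Illustrative only, not the library's API.
func whitespaceTokenize(s string) []Token {
	var tokens []Token
	start := -1
	for i, r := range s {
		if unicode.IsSpace(r) {
			if start >= 0 {
				tokens = append(tokens, Token{s[start:i], start, i})
				start = -1
			}
		} else if start < 0 {
			start = i
		}
	}
	if start >= 0 {
		tokens = append(tokens, Token{s[start:], start, len(s)})
	}
	return tokens
}

func main() {
	for _, t := range whitespaceTokenize("Go makes NLP fast") {
		fmt.Printf("%q [%d:%d]\n", t.Text, t.Start, t.End)
	}
}
```

A production tokenizer layers sub-word splitting (e.g. BPE or WordPiece) on top of a pre-tokenizer like this one, but the shape of the result is the same: token text plus its position in the input.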
Stars: 316
Forks: 61
Language: Go
License: Apache-2.0
Last pushed: Nov 27, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/sugarme/tokenizer"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
guillaume-be/rust-tokenizers
Rust-tokenizer offers high-performance tokenizers for modern language models, including...
elixir-nx/tokenizers
Elixir bindings for 🤗 Tokenizers
openscilab/tocount
ToCount: Lightweight Token Estimator
reinfer/blingfire-rs
Rust wrapper for the BlingFire tokenization library
Scurrra/ubpe
Universal (general sequence) Byte-Pair Encoding