Text Tokenization Libraries ML Frameworks
Language processing tools that convert text into tokens for NLP and ML models. Includes tokenizers across multiple programming languages and implementations. Does NOT include general text processing, speech tokenization, or vectorization/embedding systems.
There are 15 text tokenization libraries tracked. The highest-rated is SauravP97/hf-tokenizer-visualizer at 32/100 with 2 stars.
Get all 15 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=text-tokenization-libraries&limit=20"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
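For scripted access, the same request can be made from Python. The following is a minimal sketch using only the standard library. The endpoint and query parameters are copied from the curl command above; the helper names `build_url` and `fetch_projects` are hypothetical, and because the response layout is not described on this page, the parsed JSON is returned unchanged rather than unpacked.

```python
import json
import urllib.parse
import urllib.request

# Endpoint and query parameter names are copied from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="ml-frameworks",
              subcategory="text-tokenization-libraries",
              limit=20):
    """Return the request URL with the query string encoded."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(timeout=10.0):
    """Fetch and parse the JSON response (hypothetical helper).

    The response layout is not documented on this page, so the parsed
    value is returned as-is rather than indexed into.
    """
    with urllib.request.urlopen(build_url(), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_projects()` performs one HTTP request, which counts against the daily request limit stated above. Inspect the returned value before relying on any particular field.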
| # | Framework | Score | Tier |
|---|---|---|---|
| 1 | SauravP97/hf-tokenizer-visualizer: Visualize HuggingFace Byte-Pair Encoding (BPE) Tokenizer encoding process | 32/100 | Emerging |
| 2 | twinnydotdev/toxe: SentencePiece tokenizer for cross-encoders | | Emerging |
| 3 | jawrainey/hfta: Reference implementation: run any huggingface tokenizer in Android (rust). | | Experimental |
| 4 | andikaseptiadi/local-code-model: 🛠️ Build a pure Go GPT-style transformer from scratch to grasp the... | | Experimental |
| 5 | Example69420/splintr: 🚀 Boost text processing speed with Splintr, a high-performance BPE tokenizer... | | Experimental |
| 6 | jrajath94/bpe-tokenizer: BPE and WordPiece tokenization from scratch — clean implementations that... | | Experimental |
| 7 | DHRUVCHARNE/bpe-tokenizer-ts: From-scratch Byte Pair Encoding (BPE) tokenizer in TypeScript using Bun | | Experimental |
| 8 | C4AI/token-counter: Python library + CLI to count dataset tokens with HF tokenizers and export... | | Experimental |
| 9 | Catmono/bpe-tokenizer-ts: 🧠 Build and explore a minimal Byte Pair Encoding tokenizer in TypeScript,... | | Experimental |
| 10 | unixpickle/tweetenc: An auto-encoder for tweets | | Experimental |
| 11 | toprakdeviren/gpu-bpe: GPU-accelerated Byte Pair Encoding in the browser via WebGPU compute shaders | | Experimental |
| 12 | sumony2j/Simple-BPE-Tokenizer: A pure Python implementation of Byte Pair Encoding (BPE) tokenizer. Train on... | | Experimental |
| 13 | mridulsaklani/My_Tokenizer: It is a small model of tokenizer also used by every AI GPT's model to... | | Experimental |
| 14 | b0o/tiktoken-bench: A small Node.js benchmark suite for the tiktoken WASM port. | | Experimental |
| 15 | pointlander/txt: A natural language model based on context mixing | | Experimental |