BPE Tokenizers LLM Tools
Implementations and variants of Byte Pair Encoding (BPE) tokenizers across programming languages and scripts. Includes language-specific BPE tokenizers, optimized BPE libraries, and BPE algorithm improvements. Does NOT include other tokenization methods (SentencePiece, WordPiece) unless BPE is the primary focus, general NLP pipelines, or LLM frameworks that merely use tokenizers.
There are 8 BPE tokenizer tools tracked. The highest-rated is eliben/go-sentencepiece at 48/100 with 47 stars.
Get all 8 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=bpe-tokenizers&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
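The same request can be made from a script. The following is a minimal Python sketch using only the standard library. It reuses the endpoint and query parameters from the curl command above; the helper names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not described on this page, the parsed JSON is returned unchanged rather than unpacked into assumed fields.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="llm-tools", subcategory="bpe-tokenizers", limit=20):
    """Build the query URL with the same parameters the curl command uses."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch the URL and decode the JSON body.

    The response layout is an unknown here, so the parsed value is
    returned as-is instead of assuming specific keys.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (performs a network request and counts against the daily limit):
#     data = fetch_projects(build_url())
#     print(json.dumps(data, indent=2))
```

Keeping URL construction separate from the network call makes it easy to change `subcategory` or `limit` and to check the resulting URL without spending any of the daily request quota.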
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | eliben/go-sentencepiece: Go implementation of the SentencePiece tokenizer | 48/100 | Emerging |
| 2 | sefineh-ai/Amharic-Tokenizer: Syllable-aware BPE tokenizer for the Amharic language (አማርኛ) – fast,... |  | Emerging |
| 3 | mdabir1203/BPE_Tokenizer_Visualizer: A Visualizer to check how BPE Tokenizer in an LLM Works |  | Emerging |
| 4 | BobMcDear/minbpe-hs: Byte-level byte pair encoding (BPE) in Haskell |  | Experimental |
| 5 | franciszekparma/GBPET: GPT-style language model with Byte Pair Encoding tokenizer, built from... |  | Experimental |
| 6 | sajjadh47/bpe-encoder-php: BPE (Byte-Pair Encoding) Encoder Decoder for OpenAI's GPT-2 / GPT-3... |  | Experimental |
| 7 | anperrone/minbpe: This crate is a rust porting of Andrej Karpathy implementation of Byte Pair... |  | Experimental |
| 8 | burcgokden/SentencePiece-Tokenizer-Wrapper-for-PLDR-LLM-KVG-cache: SentencePiece Tokenizer Wrapper implementation for PLDR-LLM with KV cache and G-cache |  | Experimental |