BPE Tokenizers: LLM Tools

Implementations and variants of Byte Pair Encoding (BPE) tokenizers across programming languages and scripts. Includes language-specific BPE tokenizers, optimized BPE libraries, and BPE algorithm improvements. Excludes other tokenization methods (SentencePiece, WordPiece) unless BPE is the primary focus, as well as general NLP pipelines and LLM frameworks that merely use tokenizers.
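The core BPE training loop is short enough to sketch. The Python below is a minimal byte-level trainer written for illustration; it is not code from any project listed here, though it follows the same general approach as Andrej Karpathy's minbpe, which anperrone/minbpe ports to Rust. Text starts as UTF-8 bytes (ids 0 to 255), and each step replaces the most frequent adjacent pair with a new id starting at 256. The function names and the tie-breaking rule (first pair seen wins) are assumptions of this sketch.

```python
from collections import Counter


def get_pair_counts(ids: list[int]) -> Counter:
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(ids, ids[1:]))


def merge(ids: list[int], pair: tuple[int, int], new_id: int) -> list[int]:
    """Replace non-overlapping occurrences of `pair`, scanning left to right, with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text: str, num_merges: int) -> dict[tuple[int, int], int]:
    """Learn up to `num_merges` merge rules, starting from raw UTF-8 bytes (ids 0-255).

    Illustrative sketch only: no regex pre-splitting and no special tokens.
    """
    ids = list(text.encode("utf-8"))
    merges: dict[tuple[int, int], int] = {}
    for step in range(num_merges):
        counts = get_pair_counts(ids)
        if not counts:
            break  # fewer than two tokens left, so there is nothing to merge
        # Most frequent pair; on a tie, max() keeps the first pair in insertion order.
        pair = max(counts, key=counts.get)
        new_id = 256 + step  # new ids follow the 256 single-byte ids
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges
```

On the string "aaabdaaabac", three merges learn (97, 97) → 256 ("aa"), then (256, 97) → 257 ("aaa"), then (257, 98) → 258 ("aaab").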

There are 8 BPE tokenizer tools tracked. The highest-rated is eliben/go-sentencepiece, with a score of 48/100 and 47 stars.

Get all 8 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=bpe-tokenizers&limit=20"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
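The same request can be made from Python with only the standard library. This is a hedged sketch: the base URL and the `domain`, `subcategory`, and `limit` query parameters come from the curl command above, while the helper names are hypothetical. The page does not document the response schema or how the free API key is passed, so the sketch returns the parsed JSON as-is and sends no key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken from the curl example; helper names below are hypothetical.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_query_url(domain: str = "llm-tools",
                    subcategory: str = "bpe-tokenizers",
                    limit: int = 20) -> str:
    """Rebuild the dataset URL from its query parameters (defaults match the curl example)."""
    params = {"domain": domain, "subcategory": subcategory, "limit": limit}
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(url: str, timeout: float = 10.0):
    """GET the URL and parse the body as JSON; no assumptions are made about its structure."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_projects(build_query_url())` performs the same unauthenticated request as the curl command, subject to the 100 requests/day limit.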

| # | Tool | Description | Score (/100) | Tier |
|---|------|-------------|--------------|------|
| 1 | eliben/go-sentencepiece | Go implementation of the SentencePiece tokenizer | 48 | Emerging |
| 2 | sefineh-ai/Amharic-Tokenizer | Syllable-aware BPE tokenizer for the Amharic language (አማርኛ) – fast,... | 45 | Emerging |
| 3 | mdabir1203/BPE_Tokenizer_Visualizer | A visualizer to check how a BPE tokenizer in an LLM works | 31 | Emerging |
| 4 | BobMcDear/minbpe-hs | Byte-level byte pair encoding (BPE) in Haskell | 27 | Experimental |
| 5 | franciszekparma/GBPET | GPT-style language model with Byte Pair Encoding tokenizer, built from... | 27 | Experimental |
| 6 | sajjadh47/bpe-encoder-php | BPE (Byte-Pair Encoding) Encoder Decoder for OpenAI's GPT-2 / GPT-3... | 20 | Experimental |
| 7 | anperrone/minbpe | This crate is a Rust port of Andrej Karpathy's implementation of Byte Pair... | 17 | Experimental |
| 8 | burcgokden/SentencePiece-Tokenizer-Wrapper-for-PLDR-LLM-KVG-cache | SentencePiece tokenizer wrapper implementation for PLDR-LLM with KV cache and G-cache | 11 | Experimental |
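Once merges are learned, tokenizing new text means replaying them, which is what byte-level encoders like the ones listed above do at inference time. The sketch below is illustrative and not taken from any listed tool. It assumes the merge table maps an adjacent id pair to a new id, with lower new ids learned earlier (the minbpe-style convention), and always applies the earliest-learned merge available. Production encoders such as GPT-2's also split text with a regular expression before applying merges; this sketch skips that step.

```python
def merge(ids: list[int], pair: tuple[int, int], new_id: int) -> list[int]:
    """Replace non-overlapping occurrences of `pair`, scanning left to right, with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def encode(text: str, merges: dict[tuple[int, int], int]) -> list[int]:
    """Tokenize text by applying learned merges, earliest-learned (lowest new id) first."""
    ids = list(text.encode("utf-8"))
    while len(ids) >= 2:
        pairs = set(zip(ids, ids[1:]))
        # Pairs without a learned merge rank as infinity, so they are chosen only if nothing else applies.
        pair = min(pairs, key=lambda p: merges.get(p, float("inf")))
        if pair not in merges:
            break  # no adjacent pair has a learned merge
        ids = merge(ids, pair, merges[pair])
    return ids


def decode(ids: list[int], merges: dict[tuple[int, int], int]) -> str:
    """Map ids back to text by expanding each merged id into the bytes of its two parts."""
    vocab = {i: bytes([i]) for i in range(256)}
    # Build merged tokens in the order they were learned so both parts already exist.
    for (a, b), new_id in sorted(merges.items(), key=lambda kv: kv[1]):
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")
```

For example, with the hand-written merges `{(97, 97): 256, (256, 97): 257, (257, 98): 258}`, `encode("aaabdaaabac", merges)` returns `[258, 100, 258, 97, 99]`, and `decode` turns that back into the original string.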