BPE Tokenizers NLP Tools
12 BPE tokenizer tools are tracked. The highest-rated is georg-jung/FastBertTokenizer, which scores 47/100 and has 53 stars.
Get all 12 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=bpe-tokenizers&limit=20"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
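For scripted access, the same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not described on this page, the parsed JSON is returned unchanged rather than mapped to specific fields.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="bpe-tokenizers", limit=20):
    """Build the dataset URL with the same query parameters as the curl command."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Send a GET request and decode the JSON body.

    The structure of the response is an unknown here, so the decoded
    value is returned as-is for the caller to inspect.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects(build_url())` performs the network request and counts against the daily request limit, so it is worth caching the result locally when iterating on a script.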
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | georg-jung/FastBertTokenizer: Fast and memory-efficient library for WordPiece tokenization as it is used by BERT. | 47/100 | Emerging |
| 2 | ml-rust/splintr: A high-performance tokenizer (BPE + SentencePiece) built with Rust with... | | Emerging |
| 3 | sanderland/script_tok: Code for the paper "BPE stays on SCRIPT" | | Emerging |
| 4 | ash-01xor/bpe.c: Simple Byte pair Encoding mechanism used for tokenization process. written... | | Emerging |
| 5 | U4RASD/r-bpe: R-BPE: Improving BPE-Tokenizers with Token Reuse | | Emerging |
| 6 | jmaczan/bpe-tokenizer: Byte-Pair Encoding tokenizer for training large language models on huge datasets | | Emerging |
| 7 | vforteli/WordPieceTokenizer: WordPiece tokenizer for dotnet (eg with ML.Net) | | Experimental |
| 8 | deepanprabhu/fastbpe: Java library implementing Byte-Pair Encoding Tokenization | | Experimental |
| 9 | BlackNinjaKR/BPE_BytePairEncoding: An implementation of Byte Pair Encoding (BPE) | | Experimental |
| 10 | jmaczan/bpe.c: High performance Byte-Pair Encoding tokenizer for large language models | | Experimental |
| 11 | swanshiv/varna_marathi_tokenizer: From-scratch Marathi BPE tokenizer with Flask API and web interface for... | | Experimental |
| 12 | burcgokden/Sentencepiece-Tokenizer-Wrapper-for-PLDR-LLM: A framework for building Sentencepiece tokenizer from a dataset | | Experimental |