NLPOptimize/flash-tokenizer
EFFICIENT AND OPTIMIZED TOKENIZER ENGINE FOR LLM INFERENCE SERVING
This tool helps developers and machine learning engineers serving large language models (LLMs) process text input much faster. It converts raw text into the numerical token IDs that are the fundamental input for LLMs, delivering them with high speed and accuracy. It is ideal for anyone deploying LLMs in production where inference speed is critical.
509 stars. Available on PyPI.
Use this if you need to accelerate the tokenization step when running large language models, especially when your current tokenizer is a performance bottleneck.
Not ideal if you are only experimenting with LLMs in a development environment and performance is not a primary concern.
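This listing does not document the library's Python API, so the sketch below uses a toy lookup-based tokenizer to illustrate the text-to-token-ID step that a tokenizer engine like this accelerates. The vocabulary, function name, and matching rule here are illustrative assumptions, not flash-tokenizer's actual interface.

```python
# Illustrative sketch only: a minimal WordPiece-style lookup, NOT
# flash-tokenizer's real API. It shows the text -> token-ID mapping
# a tokenizer performs before LLM inference.

# Hypothetical toy vocabulary (assumption; real vocabularies hold ~30k entries).
VOCAB = {"[UNK]": 0, "[CLS]": 1, "[SEP]": 2,
         "fast": 3, "token": 4, "##izer": 5, "llm": 6}

def tokenize(text: str) -> list[int]:
    """Map whitespace-split words to token IDs: exact match first, then a
    greedy prefix plus one '##'-prefixed remainder (real WordPiece iterates),
    falling back to [UNK]."""
    ids = [VOCAB["[CLS]"]]
    for word in text.lower().split():
        if word in VOCAB:
            ids.append(VOCAB[word])
            continue
        matched = False
        for end in range(len(word), 0, -1):
            prefix, rest = word[:end], word[end:]
            if prefix in VOCAB and ("##" + rest) in VOCAB:
                ids.extend([VOCAB[prefix], VOCAB["##" + rest]])
                matched = True
                break
        if not matched:
            ids.append(VOCAB["[UNK]"])
    ids.append(VOCAB["[SEP]"])
    return ids

print(tokenize("fast tokenizer"))  # -> [1, 3, 4, 5, 2]
```

A production engine does the same mapping, but with a full vocabulary and optimized (here, C++) matching code, which is where the speedup over naive Python tokenizers comes from.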
Stars: 509
Forks: 9
Language: C++
License: —
Category: —
Last pushed: Feb 02, 2026
Commits (30d): 0
Dependencies: 2
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/NLPOptimize/flash-tokenizer"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.