NLPOptimize/flash-tokenizer

EFFICIENT AND OPTIMIZED TOKENIZER ENGINE FOR LLM INFERENCE SERVING

Quality score: 45 / 100 (Emerging)

This tool helps developers and machine learning engineers who serve large language models (LLMs) process text input much faster. It converts raw text into the numerical tokens that LLMs take as input, with high speed and accuracy. It is well suited to production LLM deployments where inference speed is critical.

509 stars. Available on PyPI.

Use this if you need to accelerate the tokenization step when running large language models, especially when your current tokenizer is a bottleneck for performance.

Not ideal if you are only experimenting with LLMs in a development environment and performance is not a primary concern.
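
Before swapping tokenizers, it can help to confirm that tokenization is actually the bottleneck. Below is a minimal timing sketch, assuming a Hugging Face transformers tokenizer as the current baseline; the model name, batch size, and sample text are placeholders, not taken from this page.

import time
from transformers import AutoTokenizer  # baseline tokenizer being measured

# Placeholder model and workload; substitute your own serving inputs.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
texts = ["Flash tokenization for LLM inference serving."] * 10_000

start = time.perf_counter()
encodings = tokenizer(texts, padding=True, truncation=True)
elapsed = time.perf_counter() - start

print(f"Tokenized {len(texts)} texts in {elapsed:.3f}s ({len(texts) / elapsed:.0f} texts/s)")

If this step accounts for a large share of end-to-end latency, a faster tokenizer engine such as this one is worth evaluating.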

Tags: LLM deployment, NLP inference, text processing, machine learning engineering, model serving
No License
Maintenance 10 / 25
Adoption 10 / 25
Maturity 17 / 25
Community 8 / 25

Stars: 509
Forks: 9
Language: C++
License: None
Last pushed: Feb 02, 2026
Commits (30d): 0
Dependencies: 2

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/NLPOptimize/flash-tokenizer"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
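
The same data can also be pulled from Python with only the standard library. A minimal sketch, assuming the endpoint returns JSON (the response schema is not documented on this page):

import json
import urllib.request

# Same endpoint as the curl command above; assumes a JSON response body.
URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "transformers/NLPOptimize/flash-tokenizer")

with urllib.request.urlopen(URL) as response:
    data = json.load(response)

print(json.dumps(data, indent=2))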