NLPOptimize/flash-tokenizer
EFFICIENT AND OPTIMIZED TOKENIZER ENGINE FOR LLM INFERENCE SERVING
This tool helps developers and machine learning engineers serving large language models (LLMs) process text input much faster. It converts raw text into the numerical token IDs that are the fundamental input for LLMs, delivering them with high speed and accuracy. It is ideal for anyone deploying LLMs in production where inference speed is critical.
509 stars. Available on PyPI.
Use this if you need to accelerate the tokenization step when running large language models, especially when your current tokenizer is a performance bottleneck.
Not ideal if you are only experimenting with LLMs in a development environment and performance is not a primary concern.
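This listing does not document the library's Python API, so the sketch below uses a toy lookup-based tokenizer to illustrate the text-to-token-ID step that a tokenizer engine like this accelerates. The vocabulary, function name, and matching rule here are illustrative assumptions, not flash-tokenizer's actual interface.

```python
# Illustrative sketch only: a minimal WordPiece-style lookup, NOT
# flash-tokenizer's real API. It shows the text -> token-ID mapping
# a tokenizer performs before LLM inference.

# Hypothetical toy vocabulary (assumption; real vocabularies hold ~30k entries).
VOCAB = {"[UNK]": 0, "[CLS]": 1, "[SEP]": 2,
         "fast": 3, "token": 4, "##izer": 5, "llm": 6}

def tokenize(text: str) -> list[int]:
    """Map whitespace-split words to token IDs: exact match first, then a
    greedy prefix plus one '##'-prefixed remainder (real WordPiece iterates),
    falling back to [UNK]."""
    ids = [VOCAB["[CLS]"]]
    for word in text.lower().split():
        if word in VOCAB:
            ids.append(VOCAB[word])
            continue
        matched = False
        for end in range(len(word), 0, -1):
            prefix, rest = word[:end], word[end:]
            if prefix in VOCAB and ("##" + rest) in VOCAB:
                ids.extend([VOCAB[prefix], VOCAB["##" + rest]])
                matched = True
                break
        if not matched:
            ids.append(VOCAB["[UNK]"])
    ids.append(VOCAB["[SEP]"])
    return ids

print(tokenize("fast tokenizer"))  # -> [1, 3, 4, 5, 2]
```

A production engine does the same mapping, but with a full vocabulary and optimized (here, C++) matching code, which is where the speedup over naive Python tokenizers comes from.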
Stars: 509
Forks: 9
Language: C++
License: —
Category: —
Last pushed: Feb 02, 2026
Commits (30d): 0
Dependencies: 2
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/NLPOptimize/flash-tokenizer"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.