huggingface/tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Score: 90 / 100 · Verified

When working with large volumes of text for natural language processing, this tool converts raw text into a format that machine learning models can understand. It takes raw text documents as input and produces a vocabulary and tokens: numerical representations of words or sub-word units. This is essential for AI researchers and machine learning engineers building or fine-tuning language models.
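As a concrete illustration of raw text becoming tokens and ids, here is a minimal sketch using the library's Python bindings, training a tiny byte-pair-encoding vocabulary from an in-memory corpus (the corpus and vocabulary size are illustrative):

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Build an untrained byte-pair-encoding (BPE) tokenizer.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# Train a tiny vocabulary from an in-memory corpus (illustrative data).
trainer = BpeTrainer(special_tokens=["[UNK]"], vocab_size=200)
corpus = ["tokenizers turn raw text into ids", "raw text becomes tokens"]
tokenizer.train_from_iterator(corpus, trainer)

# Encode: tokens are sub-word strings, ids are their vocabulary indices.
encoding = tokenizer.encode("raw text")
print(encoding.tokens)  # sub-word pieces
print(encoding.ids)     # numerical ids a model consumes
```

In real use you would typically load a pretrained tokenizer (e.g. with `Tokenizer.from_pretrained`) rather than train one from scratch.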

10,520 stars and 1,504,044 monthly downloads. Used by 127 other packages. Actively maintained with 45 commits in the last 30 days. Available on PyPI and npm.

Use this if you need to quickly and efficiently prepare large text datasets for training or using state-of-the-art natural language processing models.

Not ideal if you only need basic text analysis and don't need to prepare input for machine learning models.

natural-language-processing machine-learning-engineering text-pre-processing AI-model-training
Maintenance 20 / 25
Adoption 25 / 25
Maturity 25 / 25
Community 20 / 25


Stars: 10,520
Forks: 1,051
Language: Rust
License: Apache-2.0
Last pushed: Feb 28, 2026
Monthly downloads: 1,504,044
Commits (30d): 45
Dependencies: 14
Reverse dependents: 127

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/huggingface/tokenizers"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
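The same endpoint can be queried from Python. A minimal sketch using the standard library; note that the response field names (`score`, `stars`) are assumptions about the JSON shape, not documented API behavior:

```python
import json
from urllib.request import urlopen

API_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/huggingface/tokenizers"

def summarize(payload: dict) -> str:
    # The field names here ("score", "stars") are assumptions about
    # the response shape, not documented API behavior.
    return f"score={payload.get('score', '?')}, stars={payload.get('stars', '?')}"

# To actually hit the endpoint (100 requests/day without a key):
#     with urlopen(API_URL) as resp:
#         print(summarize(json.loads(resp.read())))
```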