tokenizers and tftokenizers

tokenizers is the core tokenization library; tftokenizers wraps Hugging Face tokenizers and models as TensorFlow SavedModels for serving, making the two complements rather than competitors.

                 tokenizers       tftokenizers
Score            90 (Verified)    45 (Emerging)
Maintenance      20/25            0/25
Adoption         25/25            5/25
Maturity         25/25            25/25
Community        20/25            15/25
Stars            10,520           10
Forks            1,051            4
Downloads        1,504,044        n/a
Commits (30d)    45               0
Language         Rust             Python
License          Apache-2.0       Apache-2.0
Risk flags       None             Stale 6m

About tokenizers

huggingface/tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

When working with large volumes of text for natural language processing, this library converts raw text into a format that machine learning models can consume. It takes raw text documents as input and produces a vocabulary (a mapping from words or sub-word units to integer IDs) and token IDs, the numerical representation a model actually reads. This is essential for AI researchers and machine learning engineers building or fine-tuning language models.

natural-language-processing machine-learning-engineering text-pre-processing AI-model-training
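The vocabulary-and-token-IDs step described above can be illustrated with a minimal pure-Python sketch. This is a conceptual example only, not the tokenizers library's API: the real library uses fast Rust implementations of sub-word algorithms such as BPE rather than naive whitespace splitting, and `build_vocab`/`encode` here are hypothetical helper names.

```python
# Conceptual sketch (not the tokenizers library itself): build a
# vocabulary from a small corpus and encode raw text as token IDs.

def build_vocab(corpus):
    """Map each whitespace token to an integer ID, reserving 0 for unknowns."""
    vocab = {"[UNK]": 0}
    for line in corpus:
        for token in line.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab

def encode(text, vocab):
    """Convert raw text into the ID sequence a model would consume."""
    return [vocab.get(tok, vocab["[UNK]"]) for tok in text.lower().split()]

corpus = ["Hello world", "hello there world"]
vocab = build_vocab(corpus)          # {"[UNK]": 0, "hello": 1, "world": 2, "there": 3}
ids = encode("hello brave world", vocab)  # unseen "brave" maps to [UNK]
```

The library performs the same text-to-IDs transformation, but with trained sub-word merges so that unseen words decompose into known pieces instead of collapsing to a single unknown token.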

About tftokenizers

Hugging-Face-Supporter/tftokenizers

Use Huggingface Transformer and Tokenizers as Tensorflow Reusable SavedModels

This tool helps machine learning engineers package a Hugging Face tokenizer together with its TensorFlow model into a single, self-contained SavedModel. Given a Hugging Face model and tokenizer, it produces one portable artifact, which is useful for developers deploying natural language processing models into production TensorFlow serving environments.

natural-language-processing machine-learning-deployment text-tokenization deep-learning-inference model-serving
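The bundling idea behind this can be sketched in plain Python, without TensorFlow or tftokenizers: package the preprocessing and the model into one serializable object so the serving side accepts raw strings and needs no separate tokenization step. `TextBundle` and its trivial "model" are hypothetical stand-ins for illustration only.

```python
import pickle

class TextBundle:
    """Tokenizer + model packaged as one artifact, mirroring the idea of
    exporting both into a single SavedModel: callers send raw strings."""

    def __init__(self, vocab):
        self.vocab = vocab

    def tokenize(self, text):
        # Stand-in for a real sub-word tokenizer: whitespace split + ID lookup.
        return [self.vocab.get(t, 0) for t in text.lower().split()]

    def predict(self, text):
        # Stand-in "model": sum of token IDs; a real bundle would run a network.
        return sum(self.tokenize(text))

bundle = TextBundle({"hello": 1, "world": 2})
blob = pickle.dumps(bundle)      # "export" the whole pipeline as one artifact
restored = pickle.loads(blob)    # "load" it on the serving side
result = restored.predict("hello new world")
```

The design point is the single entry point: because tokenization travels with the model, the serving environment cannot drift out of sync with the preprocessing used at training time.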

Scores updated daily from GitHub, PyPI, and npm data.