tokenizers and tftokenizers
tokenizers is the core tokenization library; tftokenizers wraps Hugging Face models and tokenizers as TensorFlow SavedModels for serving. That makes the two complements rather than competitors.
About tokenizers
huggingface/tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
When working with large volumes of text for natural language processing, this library converts raw text into a format that machine learning models can understand. It takes raw text documents as input and produces tokens — numerical IDs for words or sub-word units — along with the vocabulary that maps each unit to its ID. This is essential for AI researchers and machine learning engineers building or fine-tuning language models.
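The core idea — a vocabulary mapping text units to integer IDs, and an encode step that turns raw text into those IDs — can be sketched in plain Python. This is a toy whitespace tokenizer, not the huggingface/tokenizers API; the real library trains sub-word models such as BPE or WordPiece:

```python
# Toy illustration of tokenization: build a vocabulary from a corpus,
# then encode raw text into numerical token IDs.
# Plain-Python sketch, NOT the huggingface/tokenizers API.

def build_vocab(corpus):
    """Assign a unique integer ID to every whitespace-separated unit."""
    vocab = {"[UNK]": 0}  # reserve ID 0 for out-of-vocabulary units
    for text in corpus:
        for unit in text.lower().split():
            vocab.setdefault(unit, len(vocab))
    return vocab

def encode(text, vocab):
    """Convert raw text into the list of token IDs a model consumes."""
    return [vocab.get(unit, vocab["[UNK]"]) for unit in text.lower().split()]

corpus = ["the cat sat", "the dog sat"]
vocab = build_vocab(corpus)
print(vocab)                          # {'[UNK]': 0, 'the': 1, 'cat': 2, 'sat': 3, 'dog': 4}
print(encode("the bird sat", vocab))  # [1, 0, 3] — 'bird' falls back to [UNK]
```

The real library replaces the whitespace split with a trained sub-word model, so rare words decompose into known pieces instead of collapsing to `[UNK]`.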
About tftokenizers
Hugging-Face-Supporter/tftokenizers
Use Huggingface Transformer and Tokenizers as Tensorflow Reusable SavedModels
This tool lets machine learning engineers package a Hugging Face tokenizer together with a TensorFlow model into a single, portable SavedModel. You provide a Hugging Face model and tokenizer, and it outputs a self-contained TensorFlow SavedModel that accepts raw strings, with no separate Python preprocessing step at serving time. This is used by developers deploying natural language processing models into production TensorFlow environments.
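The pattern tftokenizers enables — bundling tokenization with the model so the deployed artifact accepts raw text directly — can be illustrated with a small plain-Python analogy. All class names here are hypothetical stand-ins, not the tftokenizers or TensorFlow API:

```python
# Conceptual sketch of the "self-contained model" pattern: preprocessing
# (tokenization) travels with the model, so callers send raw strings.
# All names are hypothetical illustrations, not the tftokenizers API.

class ToyTokenizer:
    """Maps whitespace-separated units to integer IDs; 0 = unknown."""
    def __init__(self, vocab):
        self.vocab = vocab
    def __call__(self, text):
        return [self.vocab.get(u, 0) for u in text.lower().split()]

class ToyModel:
    """Stands in for a trained network: sums token IDs as a 'score'."""
    def __call__(self, ids):
        return sum(ids)

class BundledModel:
    """A single deployable unit: raw text in, prediction out.

    Without bundling, callers must run the tokenizer themselves and keep
    it in sync with the model — the gap tftokenizers closes for real
    TensorFlow SavedModels.
    """
    def __init__(self, tokenizer, model):
        self.tokenizer = tokenizer
        self.model = model
    def __call__(self, text):
        return self.model(self.tokenizer(text))

bundle = BundledModel(ToyTokenizer({"good": 1, "movie": 2}), ToyModel())
print(bundle("Good movie"))  # 3 — the caller never touches token IDs
```

In the real workflow, `BundledModel` corresponds to a SavedModel whose serving signature takes string tensors, so the same artifact works in any TensorFlow serving environment without shipping Python tokenization code alongside it.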