Kaleidophon/token2index
A lightweight but powerful library to build token indices for NLP tasks, compatible with major deep learning frameworks like PyTorch and TensorFlow.
This tool helps machine learning engineers and NLP researchers efficiently convert text into numerical representations, a crucial step for building natural language processing models. You provide a body of text or a vocabulary file, and it outputs a consistent mapping of words to numbers. It's designed for developers building deep learning models with frameworks like PyTorch or TensorFlow.
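The text-to-index mapping described above can be sketched in plain Python. This is an illustration of the general idea only, not token2index's actual API: the function names `build_index` and `index_sentence`, the `<unk>` placeholder token, and whitespace tokenization are all assumptions made for this example.

```python
# Hypothetical helpers illustrating a token index; NOT the token2index API.

def build_index(corpus, unk_token="<unk>"):
    """Assign each distinct whitespace-separated token an integer,
    in first-seen order. Index 0 is reserved for unknown tokens."""
    token_to_index = {unk_token: 0}
    for sentence in corpus:
        for token in sentence.split():
            if token not in token_to_index:
                token_to_index[token] = len(token_to_index)
    return token_to_index


def index_sentence(sentence, token_to_index, unk_token="<unk>"):
    """Convert a sentence to a list of indices, mapping unseen
    tokens to the index of the unknown-token placeholder."""
    unk_index = token_to_index[unk_token]
    return [token_to_index.get(token, unk_index) for token in sentence.split()]


corpus = ["colorless green ideas dream furiously", "green ideas sleep"]
vocab = build_index(corpus)

# "quietly" never appeared in the corpus, so it maps to the <unk> index.
indices = index_sentence("colorless ideas sleep quietly", vocab)
print(indices)
```

Because the same token always receives the same index, the mapping stays consistent between training and inference, which is the property a model's embedding layer typically relies on.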
No commits in the last 6 months. Available on PyPI.
Use this if you are a machine learning engineer or NLP researcher who needs a reliable and efficient way to prepare textual data for deep learning models by mapping words to unique numerical indices.
Not ideal if you are looking for a pre-trained NLP model or a high-level API for tasks like sentiment analysis or text summarization.
Stars: 51
Forks: 6
Language: Python
License: GPL-3.0
Category: (none listed)
Last pushed: Dec 06, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Kaleidophon/token2index"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
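For scripted access, the same request might look like this in Python using only the standard library. The URL pattern is taken from the curl command above; the helper names `build_url` and `fetch_quality` are hypothetical, and the assumption that the endpoint returns JSON is not confirmed by this page.

```python
import json
import urllib.request

# Base path copied from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category, owner, repo):
    """Assemble the endpoint URL from its three path segments."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the record for one repository.

    Assumption: the response body is JSON. Adjust the decoding
    if the service returns a different format.
    """
    url = build_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


url = build_url("transformers", "Kaleidophon", "token2index")
print(url)
# To perform the request (counts against the daily limit):
# data = fetch_quality("transformers", "Kaleidophon", "token2index")
```

The network call is left commented out so the sketch can be run without consuming the unauthenticated daily quota.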
Higher-rated alternatives
huggingface/tokenizers: 💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
megagonlabs/ginza-transformers: Use custom tokenizers in spacy-transformers
Hugging-Face-Supporter/tftokenizers: Use Huggingface Transformer and Tokenizers as Tensorflow Reusable SavedModels
NVIDIA/Cosmos-Tokenizer: A suite of image and video neural tokenizers
wangcongcong123/ttt: A package for fine-tuning Transformers with TPUs, written in Tensorflow2.0+