megagonlabs/ginza-transformers

Use custom tokenizers in spacy-transformers

Emerging

This project lets NLP developers plug specialized or custom tokenizers into spaCy v3 transformer pipelines. It supports tokenizers defined outside Hugging Face Transformers, so a model can process text with the exact word and subword segmentation that a particular language or domain requires. It is aimed at developers building custom NLP pipelines for text with unusual structure.
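The general idea of swapping in a custom tokenizer can be sketched in plain Python. This is only an illustration of the concept, not the ginza-transformers or spaCy API: the `Tokenizer` protocol, both tokenizer classes, and the `encode` function are hypothetical names made up for this example.

```python
from typing import List, Protocol


class Tokenizer(Protocol):
    """Hypothetical interface: anything with a tokenize() method can be plugged in."""

    def tokenize(self, text: str) -> List[str]: ...


class WhitespaceTokenizer:
    """Stand-in for a stock tokenizer: splits on whitespace only."""

    def tokenize(self, text: str) -> List[str]:
        return text.split()


class HyphenAwareTokenizer:
    """Stand-in for a custom, domain-specific tokenizer: also splits on hyphens."""

    def tokenize(self, text: str) -> List[str]:
        pieces: List[str] = []
        for word in text.split():
            # Drop empty strings produced by leading, trailing, or doubled hyphens.
            pieces.extend(part for part in word.split("-") if part)
        return pieces


def encode(text: str, tokenizer: Tokenizer) -> List[str]:
    """Toy pipeline step: it works with whichever tokenizer it is handed."""
    return tokenizer.tokenize(text)


stock = encode("state-of-the-art models", WhitespaceTokenizer())
custom = encode("state-of-the-art models", HyphenAwareTokenizer())
print(stock)
print(custom)
```

The same input yields different segmentations depending on the tokenizer passed in, which is the kind of control over word and subword boundaries that a custom tokenizer is meant to provide.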

No commits in the last 6 months. Available on PyPI.

Use this if your spaCy v3 transformer pipeline needs a custom tokenizer that is not available in Hugging Face Transformers itself.

Not ideal if your NLP workflow relies solely on standard tokenizers already supported by Hugging Face Transformers and spaCy.

Natural Language Processing, NLP Development, Text Analysis, Custom Tokenization, Transformer Models

Stale 6m

Maintenance 0 / 25

Adoption 6 / 25

Maturity 25 / 25

Community 15 / 25

Language: Python

License: MIT

Higher-rated alternatives

huggingface/tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Kaleidophon/token2index

A lightweight but powerful library to build token indices for NLP tasks, compatible with major...

Hugging-Face-Supporter/tftokenizers

Use Huggingface Transformer and Tokenizers as Tensorflow Reusable SavedModels

NVIDIA/Cosmos-Tokenizer

A suite of image and video neural tokenizers

wangcongcong123/ttt

A package for fine-tuning Transformers with TPUs, written in Tensorflow2.0+
