megagonlabs/ginza-transformers

Use custom tokenizers in spacy-transformers

Score: 46 / 100 (Emerging)

This project lets NLP developers plug specialized or custom tokenizers into spaCy v3 transformer pipelines. It supports tokenizers that are not provided directly by Hugging Face Transformers, so models process text with the exact word and subword boundaries a specific language or domain requires. It is aimed at developers building custom NLP pipelines for unusual text structures.

No commits in the last 6 months. Available on PyPI.

Use this if your spaCy v3 transformer pipeline needs a custom tokenizer that is not available directly through the Hugging Face Transformers library.

Not ideal if your NLP workflow relies solely on standard tokenizers already supported by Hugging Face Transformers and spaCy.

Natural Language Processing, NLP Development, Text Analysis, Custom Tokenization, Transformer Models
Stale 6m
Maintenance 0 / 25
Adoption 6 / 25
Maturity 25 / 25
Community 15 / 25

Stars: 16
Forks: 5
Language: Python
License: MIT
Last pushed: Aug 09, 2022
Commits (30d): 0
Dependencies: 1

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/megagonlabs/ginza-transformers"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
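
The same request can be made from Python. The following is a minimal sketch using only the standard library. It assumes the endpoint returns JSON, which this page does not state. The helper names `quality_url` and `fetch_quality` and the parameter names `category`, `owner`, and `repo` are hypothetical labels for the three path segments seen in the curl example, not documented API terms.

```python
import json
import urllib.request

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL using the path layout of the curl example.

    The segment names (category/owner/repo) are assumptions made for
    illustration; the page only shows one concrete URL.
    """
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and decode the body as JSON.

    Assumption: the response is JSON. Its schema is not described on this
    page, so the decoded object is returned unchanged.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("transformers", "megagonlabs", "ginza-transformers")
# print(json.dumps(data, indent=2))
```

Keeping URL construction separate from the network call makes it easy to check the request target without spending any of the daily request quota.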