explosion/spacy-lookups-data

📂 Additional lookup tables and data resources for spaCy

Score: 49 / 100 (Emerging)

This package supplies lookup tables and other linguistic data for spaCy, with a focus on languages that pre-trained pipelines cover poorly. spaCy uses these tables to map words to their base forms (lemmas) and to normalize text. The primary users are developers who build custom natural language processing models or applications, particularly for languages not extensively covered by pre-trained solutions.
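To make the idea of lookup-based lemmatization concrete, here is a minimal sketch in plain Python. It assumes the simplest possible scheme: a dictionary from inflected form to lemma, with the token returned unchanged when no entry exists. The table `TOY_LEMMA_TABLE` and the helpers `lookup_lemma` and `lemmatize` are hypothetical names invented for this illustration; they are not part of spaCy or of this package, and the entries are hand-written rather than taken from its data files.

```python
# Toy lookup table: inflected form -> lemma. These entries are hand-written
# for illustration and are NOT copied from the package's data files.
TOY_LEMMA_TABLE = {
    "kitaplar": "kitap",  # Turkish: "books" -> "book"
    "evler": "ev",        # Turkish: "houses" -> "house"
}


def lookup_lemma(token: str, table: dict[str, str]) -> str:
    """Return the table's lemma for `token`, or the token itself if absent."""
    return table.get(token, token)


def lemmatize(tokens: list[str], table: dict[str, str]) -> list[str]:
    """Apply the lookup to every token in a pre-tokenized sentence."""
    return [lookup_lemma(tok, table) for tok in tokens]


print(lemmatize(["kitaplar", "ve", "evler"], TOY_LEMMA_TABLE))
# prints ['kitap', 've', 'ev']
```

The fallback to the original token is a design choice made for this sketch: a word missing from the table passes through unchanged instead of raising an error, so coverage depends entirely on the size of the table.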

113 stars. No commits in the last 6 months.

Use this if you are a developer building a spaCy-based natural language processing application for languages like Serbian or Turkish, and need robust lemmatization or text normalization capabilities for a custom model.

Not ideal if you are an end-user without programming skills, or if you primarily work with widely supported languages like English for which pre-trained spaCy models already include comprehensive data.

natural-language-processing custom-model-training linguistic-data text-normalization lemmatization
Stale 6m, No Package, No Dependents
Maintenance 2 / 25
Adoption 9 / 25
Maturity 16 / 25
Community 22 / 25


Stars: 113
Forks: 54
Language: Python
License: MIT
Last pushed: Jun 04, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/explosion/spacy-lookups-data"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
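The same request as the curl command can be sketched in Python using only the standard library. Two things here are assumptions, not documented facts: that the URL path follows a category/owner/repo pattern (inferred from the single URL shown), and that the endpoint returns JSON. The function names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example; everything after it is assumed
# to follow a <category>/<owner>/<repo> pattern.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the request URL for one repository."""
    return f"{API_BASE}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record and decode it, assuming a JSON response body."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "explosion", "spacy-lookups-data")` would issue the same GET request as the curl command. Since the response fields are not described here, the sketch returns the decoded payload as-is without reading any particular key.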