explosion/spacy-lookups-data
📂 Additional lookup tables and data resources for spaCy
This package provides lookup tables and other linguistic data for spaCy, which is most useful for less common languages. spaCy uses these tables to map words to their roots (lemmas) and to normalize text. Its primary users are developers who build custom natural language processing models or applications, particularly for languages not extensively covered by pre-trained pipelines.
113 stars. No commits in the last 6 months.
Use this if you are a developer building a spaCy-based natural language processing application for languages like Serbian or Turkish, and need robust lemmatization or text normalization capabilities for a custom model.
Not ideal if you are an end-user without programming skills, or if you primarily work with widely supported languages like English for which pre-trained spaCy models already include comprehensive data.
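To make the idea of lookup-based lemmatization concrete, here is a minimal sketch in plain Python. It does not use spaCy or this package: the table `TOY_LEMMA_LOOKUP` and the helper `lookup_lemmas` are hypothetical names, and the three Turkish entries were written by hand for illustration rather than copied from the package's data. The fallback of returning the token unchanged when it is missing from the table is an assumption of this sketch.

```python
from typing import Dict, List

# Hand-written toy table (an assumption for illustration only,
# not taken from spacy-lookups-data): inflected form -> lemma.
TOY_LEMMA_LOOKUP: Dict[str, str] = {
    "kitaplar": "kitap",  # "books" -> "book"
    "evler": "ev",        # "houses" -> "house"
    "geldi": "gel",       # "came" -> "come"
}


def lookup_lemmas(tokens: List[str], table: Dict[str, str]) -> List[str]:
    """Map each token to its lemma via a plain dictionary lookup.

    Tokens that are missing from the table are returned unchanged.
    """
    return [table.get(token, token) for token in tokens]


if __name__ == "__main__":
    print(lookup_lemmas(["kitaplar", "evler", "bilinmeyen"], TOY_LEMMA_LOOKUP))
```

A table-driven approach like this needs no trained model, which is why it suits languages without pre-trained pipelines, but it cannot resolve forms that are ambiguous out of context.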
Stars: 113
Forks: 54
Language: Python
License: MIT
Category
Last pushed: Jun 04, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/explosion/spacy-lookups-data"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
dkpro/dkpro-cassis
UIMA CAS processing library written in Python
centre-for-humanities-computing/DaCy
DaCy: The State of the Art Danish NLP pipeline using SpaCy
explosion/spacy-loggers
📟 Logging utilities for spaCy
explosion/spacymoji
💙 Emoji handling and meta data for spaCy with custom extension attributes
JulesBelveze/concepcy
💫 SpaCy wrapper for ConceptNet 💫