explosion/spacy-lookups-data

📂 Additional lookup tables and data resources for spaCy

/ 100

Emerging

This package provides lookup tables and other linguistic data for spaCy, and is most useful for languages that pre-trained pipelines cover poorly. The tables let a spaCy pipeline map word forms to their roots (lemmas) and normalize text. Its primary users are developers building custom natural language processing models or applications for languages not extensively covered by pre-trained solutions.
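To make the idea of table-driven lemmatization and normalization concrete, here is a minimal sketch in plain Python. It does not use spaCy or this package's API: the tables `LEMMA_LOOKUP` and `NORM_LOOKUP` and the helper functions are hypothetical and invented for illustration, and the entries are English only to keep the example easy to check. The real tables are per-language and far larger.

```python
from typing import Dict, List

# Toy tables, invented for this example (assumption: real lookup data
# follows the same "surface form -> target form" idea, at a larger scale).
LEMMA_LOOKUP: Dict[str, str] = {"was": "be", "mice": "mouse", "running": "run"}
NORM_LOOKUP: Dict[str, str] = {"cos": "because", "gonna": "going to"}


def normalize(token: str) -> str:
    """Return the normalized form from the table, else the lowercased token."""
    lowered = token.lower()
    return NORM_LOOKUP.get(lowered, lowered)


def lemmatize(token: str) -> str:
    """Return the lemma from the table, else the lowercased token."""
    lowered = token.lower()
    return LEMMA_LOOKUP.get(lowered, lowered)


def lemmatize_text(text: str) -> List[str]:
    """Split on whitespace and lemmatize each token independently."""
    return [lemmatize(token) for token in text.split()]


if __name__ == "__main__":
    print(lemmatize_text("The mice was running"))
    print(normalize("Cos"))
```

A lookup of this kind ignores context, so a form that is missing from the table passes through unchanged apart from lowercasing. That is why table coverage matters most for the languages this package targets.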

113 stars. No commits in the last 6 months.

Use this if you are a developer building a spaCy-based natural language processing application for languages like Serbian or Turkish, and need robust lemmatization or text normalization capabilities for a custom model.

Not ideal if you are an end-user without programming skills, or if you primarily work with widely supported languages like English for which pre-trained spaCy models already include comprehensive data.

natural-language-processing custom-model-training linguistic-data text-normalization lemmatization

Stale 6m No Package No Dependents

Maintenance 2 / 25

Adoption 9 / 25

Maturity 16 / 25

Community 22 / 25

Stars

113

Forks

Language

Python

License

MIT

Higher-rated alternatives

dkpro/dkpro-cassis

UIMA CAS processing library written in Python

centre-for-humanities-computing/DaCy

DaCy: The State of the Art Danish NLP pipeline using SpaCy

explosion/spacy-loggers

📟 Logging utilities for spaCy

explosion/spacymoji

💙 Emoji handling and meta data for spaCy with custom extension attributes

JulesBelveze/concepcy

💫 SpaCy wrapper for ConceptNet 💫
