EticaAI/linguistic-datasets-portuguese

Linguistic Datasets for Portuguese: a list of flexibly licensed linguistic datasets for the Portuguese language, including databases, word lists, synonyms, antonyms, thematic dictionaries, thesauri, linked data, semantics, ontologies, and knowledge representation.

33 / 100 (Emerging)

This project collects linguistic datasets for Portuguese, covering varieties such as Brazilian, Angolan, Mozambican, and European Portuguese. It offers ready-to-use resources, including word lists, synonym and antonym dictionaries, and thematic thesauri, intended for tasks such as spell checking, grammar correction, and text analysis. It is useful to anyone who works with Portuguese text and needs reliable linguistic data, including linguists, content creators, educators, and researchers.

No commits in the last 6 months.

Use this if you need pre-compiled, openly licensed linguistic resources for Portuguese to enhance language-based applications, content, or research.

Not ideal if you are looking for gigabytes of raw, uncurated text data for deep learning models, as these datasets are typically smaller and highly specialized.

Portuguese-language linguistics text-analysis grammar-checking thesaurus
Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 16 / 25
Community 8 / 25
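The four sub-scores above appear to add up to the overall rating (0 + 9 + 16 + 8 = 33 out of 4 × 25 = 100). The page does not state its formula, so this minimal sketch simply assumes plain summation of the listed categories; the names `SUB_SCORES`, `MAX_PER_CATEGORY`, and `overall_score` are hypothetical and only illustrate the arithmetic.

```python
# Sub-scores as listed on this page; each category is out of 25.
SUB_SCORES = {
    "Maintenance": 0,
    "Adoption": 9,
    "Maturity": 16,
    "Community": 8,
}
MAX_PER_CATEGORY = 25


def overall_score(sub_scores: dict[str, int]) -> int:
    """Sum category sub-scores into a 0-100 total (assumed aggregation rule)."""
    for name, value in sub_scores.items():
        # Reject values outside the per-category range shown on the page.
        if not 0 <= value <= MAX_PER_CATEGORY:
            raise ValueError(f"{name} out of range: {value}")
    return sum(sub_scores.values())


max_total = MAX_PER_CATEGORY * len(SUB_SCORES)
print(f"{overall_score(SUB_SCORES)} / {max_total}")  # prints "33 / 100"
```

How the total maps to a label such as "Emerging" is not described here, so the sketch does not attempt it.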


Stars: 82
Forks: 5
Language: (not specified)
License: Unlicense
Last pushed: Nov 21, 2020
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/EticaAI/linguistic-datasets-portuguese"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
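The same request can be made from Python's standard library. This is a minimal sketch that assumes the endpoint returns a JSON body (its schema is not documented on this page) and that the path always follows the `/quality/<category>/<owner>/<repo>` shape of the curl example; `quality_url` and `fetch_quality` are hypothetical helper names. API-key handling is omitted because the page does not say how the key is sent.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, mirroring the path layout of the curl example."""
    # Percent-encode each path segment so unusual names cannot break the path.
    parts = (urllib.parse.quote(p, safe="") for p in (category, owner, repo))
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the quality record and decode it, assuming a JSON response body."""
    req = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Example (performs a network request):
# data = fetch_quality("nlp", "EticaAI", "linguistic-datasets-portuguese")
```

Keeping URL construction separate from the network call makes the path logic easy to check offline.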