EticaAI/linguistic-datasets-portuguese
Linguistic Datasets for Portuguese: a list of flexibly licensed linguistic datasets for the Portuguese language: databases, word lists, synonyms, antonyms, thematic dictionaries, thesauri, linked data, semantics, ontologies, and knowledge representation
This project provides a collection of linguistic datasets for the Portuguese language, covering varieties such as Brazilian, Angolan, Mozambican, and European Portuguese. It offers ready-to-use resources such as word lists, synonym/antonym dictionaries, and thematic thesauri, designed for tasks such as spell checking, grammar correction, and text analysis. Anyone who works with Portuguese text and needs reliable linguistic data, including linguists, content creators, educators, and researchers, will find this useful.
No commits in the last 6 months.
Use this if you need pre-compiled, openly licensed linguistic resources for Portuguese to enhance language-based applications, content, or research.
Not ideal if you are looking for gigabytes of raw, uncurated text data for deep learning models, as these datasets are typically smaller and highly specialized.
Stars: 82
Forks: 5
Language: —
License: Unlicense
Category:
Last pushed: Nov 21, 2020
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/EticaAI/linguistic-datasets-portuguese"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
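The same request can be made from Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the `category/owner/repo` path layout is inferred from the single curl URL above, and the response is assumed to be JSON, which this page does not confirm.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL.

    Assumption: the path is <category>/<owner>/<repo>, as the curl
    example suggests ("nlp/EticaAI/linguistic-datasets-portuguese").
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository without an API key.

    Assumption: the body is JSON; json.loads raises ValueError otherwise.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "EticaAI", "linguistic-datasets-portuguese")` issues the same GET request as the curl command. Because keyless access is limited to 100 requests/day, it is worth caching results rather than refetching them.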
Higher-rated alternatives
thalesbertaglia/enelvo
A flexible normalizer for user-generated content
meedan/alegre
A text and media analysis service for Meedan Check, a collaborative media annotation platform
alan-barzilay/NLPortugues
NLPortuguês - Learn NLP in Portuguese! This repository contains the materials and exercises of the...
ulysses-camara/ulysses-segmenter
Pretrained segmenter models for Portuguese legislative text.
elenderg/PAL-1000
Integrated Development Environment (IDE) featuring a file explorer, editor...