AndyTheFactory/romanian-nlp-datasets

A list of Romanian NLP Datasets

46 / 100 (Emerging)

This is a curated collection of diverse Romanian language datasets. It provides text ranging from legislative documents and news articles to social media posts and literary works. If you are a researcher, data scientist, or developer working on Romanian natural language processing, this resource helps you find the right data for tasks like sentiment analysis, named entity recognition, or summarization.

Use this if you need pre-compiled, open-source Romanian text data for training or evaluating AI models that understand and process the Romanian language.

Not ideal if you are looking for parallel corpora (text translated into multiple languages) or an API for real-time text processing.

Romanian language, text analysis, data collection, AI model training, linguistics
No Package, No Dependents
Maintenance 6 / 25
Adoption 8 / 25
Maturity 16 / 25
Community 16 / 25

Stars 56
Forks 10
Language (not listed)
License CC0-1.0
Last pushed Dec 02, 2025
Commits (30d) 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/AndyTheFactory/romanian-nlp-datasets"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
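
For scripted access, the curl request above can be reproduced with the Python standard library. This is only a minimal sketch: the helper names `quality_url` and `fetch_quality` are hypothetical, the category/owner/repository path layout is inferred from the single URL shown above, and the JSON response format is an assumption that this page does not confirm.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository.

    Assumption: the path is <category>/<owner>/<repo>, as in the one
    example URL on this page.
    """
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a keyless GET request and decode the body.

    Assumption: the body is JSON; adjust the decoding if it is not.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "AndyTheFactory", "romanian-nlp-datasets")` requests the same URL as the curl command. Keyless access is limited to 100 requests/day, so a script that polls many repositories would likely want to cache results.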