Renumics/awesome-open-data-centric-ai

Curated list of open source tooling for data-centric AI on unstructured data.

Score: 40 / 100 (Emerging)

This curated list helps AI practitioners find open-source tools for improving the unstructured datasets used to train machine learning models. It covers common data challenges, such as noise or bias in images, audio, video, time-series, or text, and helps you discover tools for systematically refining your training data, which can lead to better-performing AI systems. It is aimed at anyone building or improving AI models on real-world data.

734 stars. No commits in the last 6 months.

Use this if you are building an AI system and need open-source tools to systematically improve the quality of unstructured training data such as images, audio, or text.

Not ideal if you are working with tabular data, primarily seeking dedicated data labeling tools, or looking for MLOps infrastructure.

Topics: AI development, machine learning, data quality, unstructured data, dataset improvement
Status: Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 14 / 25

Stars: 734
Forks: 38
Language: (not listed)
License: CC-BY-4.0
Last pushed: Nov 15, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/Renumics/awesome-open-data-centric-ai"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
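The same request can be made from a script. The following is a minimal Python sketch using only the standard library. It assumes the endpoint returns JSON and that the URL path follows the pattern category/owner/repository, which is inferred only from the single curl example above. The helper names quality_url and fetch_quality are hypothetical and not part of any published client.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality endpoint URL for one repository.

    Assumption: the path layout is <category>/<owner>/<repo>, as seen
    in the single example URL; other layouts may exist.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return API_BASE + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and decode the body as JSON.

    Assumption: the response body is JSON; its fields are not
    documented here, so the decoded value is returned unchanged.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling fetch_quality("ml-frameworks", "Renumics", "awesome-open-data-centric-ai") issues the same request as the curl command. With the keyless limit of 100 requests/day, it is worth caching the result rather than refetching on every run.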