huggingface/data-is-better-together

Let's build better datasets, together!

Score: 33 / 100 (Emerging)

This initiative helps machine learning practitioners and researchers collaboratively build high-quality datasets for training and evaluating AI models. It takes raw text prompts or image generation results and, with community input, produces ranked prompts, translated prompts for different languages, or preference pairs for images. The end users are typically AI engineers, data scientists, or researchers who need diverse, high-quality data to improve their models.

271 stars. No commits in the last 6 months.

Use this if you need to create or access expertly curated and community-validated datasets for tasks like prompt engineering, multilingual LLM evaluation, or image generation model assessment.

Not ideal if you are looking for ready-to-use, off-the-shelf models or general-purpose data not related to AI model training and evaluation.

AI-training-data LLM-evaluation prompt-engineering multilingual-AI image-generation-evaluation
No license, stale (6 months), no package, no dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 8 / 25
Community 15 / 25


Stars: 271
Forks: 29
Language: Jupyter Notebook
License: none
Last pushed: Dec 20, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/huggingface/data-is-better-together"

Open to everyone: 100 requests/day with no key needed, or get a free key for 1,000 requests/day.
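If you want to reuse the API call in a script, the sketch below wraps the `curl` command in two small shell functions. It is a minimal sketch under assumptions: that the path pattern `/api/v1/quality/<category>/<owner>/<repo>` also works for other repositories, and that the response body is JSON. The helper names `quality_url` and `fetch_quality` are hypothetical. Authenticated use is not shown because the listing does not document how the free key is passed.

```shell
#!/usr/bin/env bash
# Minimal sketch, assuming the endpoint path pattern generalizes and returns JSON.

# Base URL taken from the curl example on this page.
PT_EDGE_BASE="https://pt-edge.onrender.com/api/v1/quality"

# Hypothetical helper: build the endpoint URL from a category and an owner/repo slug.
quality_url() {
  local category="$1" slug="$2"
  printf '%s/%s/%s\n' "$PT_EDGE_BASE" "$category" "$slug"
}

# Hypothetical helper: fetch the data and pretty-print it.
# -f fails on HTTP errors, -sS hides the progress bar but still shows errors.
# python3 -m json.tool assumes the response body is JSON.
fetch_quality() {
  curl -fsS "$(quality_url "$1" "$2")" | python3 -m json.tool
}
```

Usage: `fetch_quality ml-frameworks huggingface/data-is-better-together`. Each call counts against the unauthenticated limit of 100 requests/day, so cache the output if you poll it.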