huggingface/data-is-better-together
Let's build better datasets, together!
This initiative helps machine learning practitioners and researchers collaboratively build high-quality datasets for training and evaluating AI models. It takes raw text prompts or image generation results and, with community input, produces ranked prompts, translated prompts for different languages, or preference pairs for images. The end users are typically AI engineers, data scientists, or researchers who need diverse, high-quality data to improve their models.
271 stars. No commits in the last 6 months.
Use this if you need to create or access expertly curated and community-validated datasets for tasks like prompt engineering, multilingual LLM evaluation, or image generation model assessment.
Not ideal if you are looking for ready-to-use, off-the-shelf models or general-purpose data not related to AI model training and evaluation.
Stars: 271
Forks: 29
Language: Jupyter Notebook
License: not listed
Category: (not shown)
Last pushed: Dec 20, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/huggingface/data-is-better-together"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
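For scripted access, the same request can be made from Python. The following is a minimal sketch using only the standard library. The helper names `build_quality_url` and `fetch_quality` are hypothetical, the category/owner/repo path layout is inferred from the single curl example above, and the response format is not documented on this page, so the body is returned as raw text rather than parsed.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path copied from the curl example; treating the remaining path
# segments as category/owner/repo is an assumption drawn from that one URL.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    segments = (category, owner, repo)
    return BASE_URL + "/" + "/".join(quote(s, safe="") for s in segments)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> str:
    """Return the raw response body as text.

    The response schema is not described in the source, so nothing is parsed.
    UTF-8 decoding is an assumption; undecodable bytes are replaced.
    """
    url = build_quality_url(category, owner, repo)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Example call (performs a network request and counts against the daily limit):
# print(fetch_quality("ml-frameworks", "huggingface", "data-is-better-together"))
```

Keeping URL construction separate from the network call makes it possible to check the URL without spending any of the daily request quota.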
Higher-rated alternatives
GoogleCloudPlatform/data-science-on-gcp
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan,...
rjurney/Agile_Data_Code_2
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
linogaliana/python-datascientist
Repository for the course "Python pour data scientists" (Python for data scientists), ENSAE 2nd year
yogeshhk/TeachingDataScience
Course notes for Data Science related topics, prepared in LaTeX
PacktWorkshops/The-Data-Science-Workshop
A New, Interactive Approach to Learning Data Science