huggingface/data-is-better-together

Let's build better datasets, together!

33 / 100

Emerging

This initiative helps machine learning practitioners and researchers collaboratively build high-quality datasets for training and evaluating AI models. It takes raw text prompts or image generation results and, with community input, produces ranked prompts, translated prompts for different languages, or preference pairs for images. The end users are typically AI engineers, data scientists, or researchers who need diverse, high-quality data to improve their models.
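The listing does not say how community input becomes ranked prompts or image preference pairs. What follows is a minimal, hypothetical sketch of one way to do that aggregation. It is not the project's actual pipeline. Three things are assumptions: the function names (`rank_prompts`, `preference_pairs`), the input tuple layouts, and the majority-vote rule.

```python
from collections import defaultdict
from statistics import mean


def rank_prompts(ratings):
    """Hypothetical: average (prompt, annotator_id, score) ratings, best first."""
    by_prompt = defaultdict(list)
    for prompt, _annotator, score in ratings:
        by_prompt[prompt].append(score)
    ranked = [(p, mean(s), len(s)) for p, s in by_prompt.items()]
    # Highest mean score first; ties go to the prompt with more ratings.
    ranked.sort(key=lambda t: (-t[1], -t[2]))
    return ranked


def preference_pairs(votes):
    """Hypothetical: turn (prompt, img_a, img_b, winner) votes into chosen/rejected pairs."""
    tally = defaultdict(lambda: [0, 0])  # [votes for a, votes for b]
    for prompt, img_a, img_b, winner in votes:
        tally[(prompt, img_a, img_b)][0 if winner == "a" else 1] += 1
    pairs = []
    for (prompt, img_a, img_b), (a, b) in tally.items():
        if a == b:
            continue  # no majority, so no preference is recorded
        chosen, rejected = (img_a, img_b) if a > b else (img_b, img_a)
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs
```

Tied votes are dropped here rather than guessed. This keeps only pairs on which the community actually agreed.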

271 stars. No commits in the last 6 months.

Use this if you need to create or access expertly curated and community-validated datasets for tasks like prompt engineering, multilingual LLM evaluation, or image generation model assessment.

Not ideal if you are looking for ready-to-use, off-the-shelf models or general-purpose data not related to AI model training and evaluation.

AI-training-data LLM-evaluation prompt-engineering multilingual-AI image-generation-evaluation

No License · Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 8 / 25

Community 15 / 25


Stars

271

Forks

Language

Jupyter Notebook

License

—

Higher-rated alternatives

GoogleCloudPlatform/data-science-on-gcp

Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan,...

rjurney/Agile_Data_Code_2

Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition

linogaliana/python-datascientist

Repository for the course Python pour data scientists (Python for data scientists), ENSAE second year

yogeshhk/TeachingDataScience

Course notes for Data Science related topics, prepared in LaTeX

PacktWorkshops/The-Data-Science-Workshop

A New, Interactive Approach to Learning Data Science
