BhabhaAI/dataformer

Solving data for LLMs - Create quality synthetic datasets!

38 / 100 (Emerging)

This project helps AI engineers efficiently create large, high-quality synthetic datasets to train their AI models. It takes a small set of instructions or examples and generates diverse, production-ready data, helping to reduce compute costs and improve model performance. It is designed for AI developers and machine learning engineers who need to quickly generate data without relying on extensive real-world datasets.

151 stars. No commits in the last 6 months.

Use this if you are an AI engineer who needs to rapidly produce high-quality synthetic data to train and fine-tune your large language models.

Not ideal if you are looking for a tool to process and clean existing real-world datasets rather than generate new ones.

AI development, Machine learning engineering, LLM training, Data generation, Synthetic data
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 12 / 25


Stars: 151
Forks: 12
Language: Python
License: Apache-2.0
Last pushed: Jan 20, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/BhabhaAI/dataformer"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
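The same request can be made from Python. The sketch below is a minimal example using only the standard library. It assumes the URL path follows a category/owner/repo pattern (inferred from the single curl example above, not from any documented scheme) and that the endpoint returns a JSON body. The helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(owner, repo, category="transformers"):
    """Build the endpoint URL for one repository.

    Assumption: the path is <category>/<owner>/<repo>, generalized from
    the one URL shown on this page.
    """
    # quote() percent-encodes characters that are unsafe in a path segment.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(owner, repo, category="transformers", timeout=10.0):
    """Fetch the data for one repository and decode it.

    Assumption: the response body is JSON; its schema is not documented
    here, so the decoded object is returned unchanged.
    """
    url = build_quality_url(owner, repo, category)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("BhabhaAI", "dataformer")` issues the same GET request as the curl command. With no key, the 100 requests/day limit applies, so cache results when polling many repositories.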