Itachi-Uchiha581/Auto-Data
Auto Data is a library designed for quick and effortless creation of datasets tailored for fine-tuning Large Language Models (LLMs).
This tool helps developers and AI practitioners create custom datasets to train or fine-tune Large Language Models (LLMs). You input a topic, desired output format (JSON, Parquet), and an optional system prompt, and it generates realistic conversation data. This helps overcome the common challenge of scarce or imbalanced data when building specialized AI assistants or agents.
106 stars. No commits in the last 6 months.
Use this if you need to quickly generate high-quality, topic-specific conversation data for fine-tuning Large Language Models.
Not ideal if you are looking for a tool to process or analyze existing datasets, rather than generate new ones.
Stars
106
Forks
9
Language
Python
License
GPL-3.0
Category
Last pushed
Oct 31, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Itachi-Uchiha581/Auto-Data"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
InternScience/GraphGen
GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation
timothepearce/synda
A CLI for generating synthetic data
rasinmuhammed/misata
High-performance open-source synthetic data engine. Uses LLMs for schema design and vectorized...
ziegler-ingo/CRAFT
[TACL, EMNLP 2025 Oral] Code, datasets, and checkpoints for the paper "CRAFT Your Dataset:...
ZhuLinsen/FastDatasets
A powerful tool for creating high-quality training datasets for Large Language Models...