Alannikos/edg4llm

A unified tool to generate fine-tuning datasets for LLMs, including questions, answers, and dialogues. ✨🤖📚💬

/ 100

Emerging

This tool helps AI engineers and machine learning practitioners create high-quality training datasets for large language models (LLMs). You provide a prompt or existing data, and it generates additional questions, answers, or dialogue suitable for fine-tuning your LLM, improving its performance on specific tasks.

No commits in the last 6 months. Available on PyPI.

Use this if you need to efficiently generate diverse text-based datasets, such as questions, answers, or dialogues, to fine-tune your large language model for better performance.

Not ideal if you need to generate non-textual data, or if you are looking for a pre-trained model rather than a tool to create training data for your own model.

LLM fine-tuning AI data generation natural language processing machine learning engineering dialogue system training

Stale 6m

Maintenance 0 / 25

Adoption 8 / 25

Maturity 25 / 25

Community 3 / 25

How are scores calculated?

Stars

Forks

Language

Python

License

MIT

Higher-rated alternatives

InternScience/GraphGen

GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation

timothepearce/synda

A CLI for generating synthetic data

rasinmuhammed/misata

High-performance open-source synthetic data engine. Uses LLMs for schema design and vectorized...

ziegler-ingo/CRAFT

[TACL, EMNLP 2025 Oral] Code, datasets, and checkpoints for the paper "CRAFT Your Dataset:...

ZhuLinsen/FastDatasets

A powerful tool for creating high-quality training datasets for Large Language Models...

Explore LLM Tools

All categories Trending LLM Tool directory Insights