kevinscaria/TarGEN

Targeted Data Generation with Large Language Models

Emerging

This project helps AI/ML researchers and data scientists generate targeted synthetic text data for training and evaluating large language models. You supply a description of the desired data style and a language model, and the tool outputs a new dataset tailored to that description. It is designed for those who need custom, controlled text data beyond what is publicly available.

No commits in the last 6 months.

Use this if you need to create targeted synthetic datasets for specific natural language understanding tasks, especially when real-world data is scarce or challenging to obtain.

Not ideal if you're looking for a no-code solution or a tool for general-purpose text generation without specific data style requirements.

AI-research NLP-data-generation LLM-fine-tuning synthetic-data dataset-creation

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 6 / 25

Maturity 16 / 25

Community 12 / 25

Language

Jupyter Notebook

License

MIT

Higher-rated alternatives

InternScience/GraphGen

GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation

timothepearce/synda

A CLI for generating synthetic data

rasinmuhammed/misata

High-performance open-source synthetic data engine. Uses LLMs for schema design and vectorized...

ziegler-ingo/CRAFT

[TACL, EMNLP 2025 Oral] Code, datasets, and checkpoints for the paper "CRAFT Your Dataset:...

ZhuLinsen/FastDatasets

A powerful tool for creating high-quality training datasets for Large Language Models...
