kevinscaria/TarGEN
Targeted Data Generation with Large Language Models
This project helps AI/ML researchers and data scientists generate specific types of synthetic text data for training and evaluating large language models. You provide a description of the desired data style and a language model, and it outputs new, tailored datasets. It's designed for those who need custom, controlled text data beyond what's publicly available.
No commits in the last 6 months.
Use this if you need to create targeted synthetic datasets for specific natural language understanding tasks, especially when real-world data is scarce or challenging to obtain.
Not ideal if you're looking for a no-code solution or a tool for general-purpose text generation without specific data style requirements.
Stars: 19
Forks: 3
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Jun 25, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/kevinscaria/TarGEN"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
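The same request can be made from Python. The following is a minimal sketch using only the standard library. It assumes the endpoint returns a JSON body, which the listing does not state, and the helper names `build_quality_url` and `fetch_quality` are hypothetical, introduced here for illustration only.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository, percent-encoding each path part."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and parse the body as JSON.

    Assumption: the response is JSON. Its schema is not documented in this
    listing, so the parsed value is returned without further interpretation.
    """
    url = build_quality_url(owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality("kevinscaria", "TarGEN")
    # to perform the actual network request.
    print(build_quality_url("kevinscaria", "TarGEN"))
```

With no key, the stated limit is 100 requests per day, so a script that polls many repositories may want to cache responses locally.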
Higher-rated alternatives
InternScience/GraphGen
GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation
timothepearce/synda
A CLI for generating synthetic data
rasinmuhammed/misata
High-performance open-source synthetic data engine. Uses LLMs for schema design and vectorized...
ziegler-ingo/CRAFT
[TACL, EMNLP 2025 Oral] Code, datasets, and checkpoints for the paper "CRAFT Your Dataset:...
ZhuLinsen/FastDatasets
A powerful tool for creating high-quality training datasets for Large Language Models...