Synthetic Data Generation Transformer Models

This category tracks 13 synthetic data generation models. The highest-rated is VikParuchuri/textbook_quality, with a score of 43/100 and 509 stars.

Get all 13 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=synthetic-data-generation&limit=20"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
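For scripted use, the same request can be made from Python with only the standard library. This is a minimal sketch, not an official client: the helper names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not documented here, the parsed JSON is returned as-is with no assumptions about its fields.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="transformers",
              subcategory="synthetic-data-generation",
              limit=20):
    """Build the query URL (hypothetical helper; defaults mirror the curl call)."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Send a GET request and return the decoded JSON body unchanged.

    The response structure is an unknown here, so nothing is unpacked.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_projects(build_url())` issues one request, which counts against the 100 requests/day unauthenticated limit, so caching the result locally is sensible when iterating on a script.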

| # | Model | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | VikParuchuri/textbook_quality | Generate textbook-quality synthetic LLM pretraining data | 43 | Emerging |
| 2 | dmanuel64/codablellm | A framework for creating and curating high-quality code datasets tailored... | 42 | Emerging |
| 3 | BhabhaAI/dataformer | Solving data for LLMs - Create quality synthetic datasets! | 38 | Emerging |
| 4 | BothBosu/Synthetic-Data-for-Scam-Detection-Leveraging-LLMs-to-Train-Deep-Learning-Models | This repository contains the source code and synthetic datasets used in the... | 32 | Emerging |
| 5 | iiis-ai/TemplateMath | [ICLR 2025 DATA-FM] Training and Evaluating Language Models with... | 25 | Experimental |
| 6 | MichiganNLP/depression_synthetic_data | Can LMs generate useful synthetic data for the mental health domain? | 22 | Experimental |
| 7 | nphdang/Pred-LLM | Generating tabular data via Large Language Models (LLMs) | 22 | Experimental |
| 8 | ZEKE320/llm-dataset-generator | The LLM Dataset Generator is an open source tool for generating text data... | 20 | Experimental |
| 9 | daspartho/DistillClassifier | Easily generate synthetic data for classification tasks using LLMs | 18 | Experimental |
| 10 | AikyamLab/regtext | A framework to generate unlearnable text data | 18 | Experimental |
| 11 | benjaminr/gendantic | Generate synthetic data using Pydantic Models and LLMs | 14 | Experimental |
| 12 | windblow32/DATE | Exploring the Heterogeneity of Tabular Data: A Diversity-aware Data... | 12 | Experimental |
| 13 | dannylee1020/pyper | Synthetic data generation for LLM instruction tuning | 11 | Experimental |