Synthetic Data Generation Transformer Models
This list tracks 13 synthetic data generation models. The highest-rated is VikParuchuri/textbook_quality, which scores 43/100 and has 509 stars.
Get all 13 projects as JSON:
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=synthetic-data-generation&limit=20"
The API is open to everyone: 100 requests/day with no key needed, or 1,000 requests/day with a free key.
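The same request can be made from Python. The following is a minimal sketch using only the standard library; it assumes the endpoint returns a JSON body, as the curl example suggests. The response schema is not documented here, so the sketch returns the parsed payload as-is without assuming any field names. The helper names `build_url` and `fetch_projects` are illustrative and not part of the API.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="transformers",
              subcategory="synthetic-data-generation",
              limit=20):
    """Build the query URL with the same parameters as the curl example."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Fetch the URL and parse the body as JSON.

    Assumption: the response is JSON. Its structure is not specified
    in this page, so the parsed object is returned unchanged.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_projects(build_url())` issues one request, which counts against the daily limit. Inspect the returned object before relying on any particular keys.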
| # | Model | Description | Score | Tier |
|---|---|---|---|---|
| 1 | VikParuchuri/textbook_quality | Generate textbook-quality synthetic LLM pretraining data | 43 | Emerging |
| 2 | dmanuel64/codablellm | A framework for creating and curating high-quality code datasets tailored... | | Emerging |
| 3 | BhabhaAI/dataformer | Solving data for LLMs - Create quality synthetic datasets! | | Emerging |
| 4 | BothBosu/Synthetic-Data-for-Scam-Detection-Leveraging-LLMs-to-Train-Deep-Learning-Models | This repository contains the source code and synthetic datasets used in the... | | Emerging |
| 5 | iiis-ai/TemplateMath | [ICLR 2025 DATA-FM] Training and Evaluating Language Models with... | | Experimental |
| 6 | MichiganNLP/depression_synthetic_data | Can LMs generate useful synthetic data for the mental health domain? | | Experimental |
| 7 | nphdang/Pred-LLM | Generating tabular data via Large Language Models (LLMs) | | Experimental |
| 8 | ZEKE320/llm-dataset-generator | The LLM Dataset Generator is an open source tool for generating text data... | | Experimental |
| 9 | daspartho/DistillClassifier | Easily generate synthetic data for classification tasks using LLMs | | Experimental |
| 10 | AikyamLab/regtext | A framework to generate unlearnable text data | | Experimental |
| 11 | benjaminr/gendantic | Generate synthetic data using Pydantic Models and LLMs | | Experimental |
| 12 | windblow32/DATE | Exploring the Heterogeneity of Tabular Data: A Diversity-aware Data... | | Experimental |
| 13 | dannylee1020/pyper | Synthetic data generation for LLM instruction tuning | | Experimental |