Synthetic Data Generation ML Frameworks

Tools and frameworks for generating synthetic datasets across tabular, time-series, and domain-specific data modalities, including benchmarking and evaluation methods. Does NOT include real dataset collections, data augmentation techniques, or domain-specific applications that use synthetic data.

There are 45 synthetic data generation frameworks tracked. Seven score above 50 (Established tier). The highest-rated is Diyago/Tabular-data-generation at 63/100 with 564 stars, tied on score with meta-llama/synthetic-data-kit. Only 1 of the top 10 is actively maintained.

Get the projects as JSON (note that the request below sets limit=20, fewer than the 45 tracked):

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=synthetic-data-generation&limit=20"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
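The same request can be made from Python with only the standard library. The sketch below does three things:

- It rebuilds the URL from the query parameters in the curl command above.
- It shows how the JSON body could be fetched.
- It counts records scoring above 50.

The response schema is not documented on this page. The code assumes the body is a list of objects with a numeric `score` field, which is a hypothetical guess to check against a real response. The helper names `build_query_url`, `fetch_projects`, and `count_above` are illustrative. No API key is sent, so the anonymous daily limit applies.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_query_url(domain: str, subcategory: str, limit: int) -> str:
    """Rebuild the request URL from the query parameters used in the curl example."""
    params = {"domain": domain, "subcategory": subcategory, "limit": limit}
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(url: str, timeout: float = 10.0):
    """GET the URL and decode the JSON body (needs network access; no API key is sent)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def count_above(records, threshold: float = 50, score_key: str = "score") -> int:
    """Count records whose score is strictly above `threshold`.

    HYPOTHETICAL schema: assumes `records` is a list of dicts holding a numeric
    score under `score_key`; adjust after inspecting a real response.
    """
    return sum(1 for record in records if record.get(score_key, 0) > threshold)


if __name__ == "__main__":
    url = build_query_url("ml-frameworks", "synthetic-data-generation", 20)
    print(url)
    # Offline check: the top-10 scores from the table on this page,
    # wrapped in the hypothetical {"score": ...} shape.
    top10 = [63, 63, 59, 58, 57, 54, 51, 46, 46, 46]
    print(count_above([{"score": s} for s in top10]))  # 7
    # To query the live API instead (counts against the daily limit):
    # print(count_above(fetch_projects(url)))
```

Keeping the URL builder and the counting logic separate from the network call lets both be checked offline. The live fetch stays a single commented-out line.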

| # | Framework | Description | Score | Tier |
|---|---|---|---|---|
| 1 | Diyago/Tabular-data-generation | GANs are well known for their success in realistic image generation. However,... | 63 | Established |
| 2 | meta-llama/synthetic-data-kit | Tool for generating high-quality synthetic datasets | 63 | Established |
| 3 | Data-Centric-AI-Community/ydata-synthetic | Synthetic data generators for tabular and time-series data | 59 | Established |
| 4 | tdspora/syngen | Open-source version of the TDspora synthetic data generation algorithm. | 58 | Established |
| 5 | vanderschaarlab/synthcity | A library for generating and evaluating synthetic tabular data for privacy,... | 57 | Established |
| 6 | always-further/deepfabric | Generate high-quality synthetics, train, measure, and evaluate in a single pipeline | 54 | Established |
| 7 | wiseodd/generative-models | Collection of generative models, e.g. GAN and VAE, in PyTorch and TensorFlow. | 51 | Established |
| 8 | aliseyfi75/COSCI-GAN | Codebase for "Generating multivariate time series with COmmon Source... | 46 | Emerging |
| 9 | shayneobrien/generative-models | Annotated, understandable, and visually interpretable PyTorch... | 46 | Emerging |
| 10 | tirthajyoti/Synthetic-data-gen | Various methods for generating synthetic data for data science and ML | 46 | Emerging |
| 11 | martinjurkovic/syntherela | A package for benchmarking synthetic relational data generation methods | 46 | Emerging |
| 12 | alfurka/synloc | A Python package to create synthetic tabular data | 45 | Emerging |
| 13 | SAGDAfrica/sagda | Synthetic Agriculture Data for Africa | 45 | Emerging |
| 14 | Team-TUD/CTAB-GAN | Official repository for "CTAB-GAN: Effective Table Data Synthesizing" | 44 | Emerging |
| 15 | federicoarenasl/sdg-engine | A simple data generation engine for computer vision, compatible with 🤗 datasets. | 40 | Emerging |
| 16 | ELM-Research/ECG-Neural-Networks | Research-oriented pretraining and evaluation pipelines for ECG-specific... | 39 | Emerging |
| 17 | stefan-jansen/synthetic-data-for-finance | Material for a QuantUniversity talk on synthetic data generation for finance. | 39 | Emerging |
| 18 | gretelai/trainer | Simple interface to synthesize complex, high-dimensional datasets using... | 39 | Emerging |
| 19 | MRSAIL-Mini-Robotics-Software-AI-Lab/GANVAS-models | Generative Autoregressive, Normalized Flows, VAEs, Score-based models (GANVAS) | 35 | Emerging |
| 20 | AlejandroBeldaFernandez/Calm-Data-Generator | CALM-Data-Generator is a comprehensive Python library for synthetic data... | 34 | Emerging |
| 21 | bensonlee5/dagzoo | Synthetic tabular data generator for causal modeling | 34 | Emerging |
| 22 | antorguez95/synthetic_data_generation_framework | This repository contains the code of our published work in IEEE JBHI. Our... | 32 | Emerging |
| 23 | TrevorW-code/fraud | Synthetic data for ML | 31 | Emerging |
| 24 | Data-Centric-AI-Community/nist-crc-2023 | NIST Collaborative Research Cycle on Synthetic Data. Learn about Synthetic... | 30 | Emerging |
| 25 | DerwenAI/kleptosyn | Synthetic data generation for investigative graphs based on patterns of... | 30 | Emerging |
| 26 | CFA-Institute-RPC/Synthetic-Data-For-Finance | This repository contains accompanying code for the CFA Institute's Research... | 30 | Emerging |
| 27 | AmirhosseinHonardoust/Autocurator-Synthetic-Data-Benchmark | Autocurator is a comprehensive benchmarking toolkit for evaluating synthetic... | 30 | Emerging |
| 28 | GarouMonste/Teaching-Neural-Networks-to-Imagine-Tables | Develop a Variational Autoencoder to generate realistic tabular data,... | 27 | Experimental |
| 29 | EPFL-ENAC/TOPO-DataGen | [CVPR'22] TOPO-DataGen: an open and scalable aerial synthetic data... | 26 | Experimental |
| 30 | jaimeperezsanchez/GAN_Scenario_Forecasting | Data augmentation through multivariate scenario forecasting in Data Centers... | 26 | Experimental |
| 31 | ELM-Research/ecg_nn | Research-oriented pretraining and evaluation pipelines for ECG-specific... | 24 | Experimental |
| 32 | Rufina46/time-series-synthetic | Open-source synthetic time-series generator for ML testing | 23 | Experimental |
| 33 | oRyyu2703/Autocurator-Synthetic-Data-Benchmark | Evaluate synthetic data quality against real tabular datasets with... | 22 | Experimental |
| 34 | hipaasynth-svg/HipAAsynth | Deterministic synthetic clinical data engine. Zero dependencies. Fully reproducible. | 22 | Experimental |
| 35 | dataxid/dataxid-python | The Synthetic Data API. Generate privacy-safe synthetic data with 5 lines of code. | 22 | Experimental |
| 36 | aia39/Synthetic-Tabular-Data-Generation-using-CTGAN-and-classify-with-XGboost | This is the repository to generate synthetic tabular data when the tabular... | 21 | Experimental |
| 37 | abdulvahapmutlu/als-synthetic-data-augmentation-wgan | This project aims to address the lack of EEG signals for ALS (Amyotrophic... | 19 | Experimental |
| 38 | MongoExpUser/Synthetic-Drilling-Data-App-for-Sqlite-ML | Generate synthetic drilling data that can be used for testing machine... | 18 | Experimental |
| 39 | PARKCHEOLHEE-lab/papers-for-generative-design | A collection of papers, articles, and code for generative design | 14 | Experimental |
| 40 | EmrahFidan/MissingLink | Synthetic tabular data generation engine: CTGAN deep learning for CSV... | 14 | Experimental |
| 41 | julsngbatac/GANs-For-Synthetic-Data-Generation | Generate realistic synthetic data using GANs to boost AI model training... | 14 | Experimental |
| 42 | wildanjr19/generative-model | Learn and build generative models from scratch, mostly in PyTorch | 13 | Experimental |
| 43 | volodya7292/synthetic_data | Synthetic data generation system library. | 13 | Experimental |
| 44 | rishic3/GenerateEEG | Two conditional GAN frameworks to perform synthetic EEG generation for... | 12 | Experimental |
| 45 | xValentim/GenerativeAlchemy | This repo will be very helpful if you like generative modeling and you want... | 11 | Experimental |