Synthetic Data Generation Generative AI Tools

Tools for generating synthetic tabular, time-series, and structured data, with a focus on fidelity, privacy, and utility evaluation. Includes SDV frameworks, GANs, diffusion models, and benchmarking suites. Does NOT include general data augmentation for NLP/NER tasks or domain-specific synthetic generation (clinical data, images, audio).

There are 90 synthetic data generation tools tracked. One scores above 70 (Verified tier). The highest-rated is sdv-dev/SDV at 81/100 with 3,439 stars. Two of the top 10 are actively maintained.

Get all 90 projects as JSON (note that the example request below sets limit=20):

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=generative-ai&subcategory=synthetic-data-generation&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
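The same request can be made from a script. The following is a minimal Python sketch, assuming only the three query parameters visible in the curl example (`domain`, `subcategory`, `limit`); the helper names `build_url` and `fetch_json` are hypothetical, and the shape of the JSON response is not documented here, so the code makes no assumption about it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example; everything else is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain, subcategory, limit):
    """Build the query URL from the three parameters seen in the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_json(url, timeout=30):
    """Download the URL and parse the body as JSON (no response schema assumed)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


url = build_url("generative-ai", "synthetic-data-generation", 20)
# data = fetch_json(url)  # uncomment to send a real request (counts toward the daily quota)
```

Keeping URL construction separate from the network call makes it easy to change `limit` or the subcategory without touching the download logic.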

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | sdv-dev/SDV | Synthetic data generation for tabular data | 81 | Verified |
| 2 | sdv-dev/SDGym | Benchmarking synthetic data generation methods. | 69 | Established |
| 3 | NVIDIA-NeMo/DataDesigner | 🎨 NeMo Data Designer: A general library for generating high-quality... | 62 | Established |
| 4 | AlexanderVNikitin/tsgm | Generation and evaluation of synthetic time series datasets (also,... | 61 | Established |
| 5 | mostly-ai/mostlyai | Synthetic Data SDK ✨ | 61 | Established |
| 6 | hitsz-ids/synthetic-data-generator | SDG is a specialized framework designed to generate high-quality structured... | 59 | Established |
| 7 | wwhenxuan/S2Generator | A series-symbol (S2) dual-modality data generation mechanism, enabling the... | 54 | Established |
| 8 | microsoft/TimeCraft | Official code for TimeCraft: A Time Series Generation Framework for... | 52 | Established |
| 9 | microsoft/genalog | Genalog is an open source, cross-platform python package allowing generation... | 51 | Established |
| 10 | aiim-research/GRETEL | GRETEL is a framework for the development and evaluation of Counterfactual... | 47 | Emerging |
| 11 | gretelai/gretel-synthetics | Synthetic data generators for structured and unstructured text, featuring... | 47 | Emerging |
| 12 | sebhaan/TabPFGen | TabPFGen: Synthetic Tabular Data Generation with TabPFN | 46 | Emerging |
| 13 | nhatkhangcs/synthetic_generator | Synthetic Data Generator for Machine Learning Pipelines | 45 | Emerging |
| 14 | kayua/MalDataGen | MalDataGen is an advanced Python framework for generating and evaluating... | 43 | Emerging |
| 15 | highfem/tqdne | Generative modeling of seismic waveforms | 43 | Emerging |
| 16 | telmomenezes/synthetic | Symbolic Generators for Complex Networks | 41 | Emerging |
| 17 | KodCode-AI/kodcode | ✨ A synthetic dataset generation framework that produces diverse coding... | 40 | Emerging |
| 18 | SilenceX12138/TabEval | 📐 A comprehensive Python framework for evaluating tabular data. | 40 | Emerging |
| 19 | Gurobi/gurobi-ai-modeling | Generative AI for Mathematical Modeling | 39 | Emerging |
| 20 | shadowboxingskills/ppchain | Your Probabilistic Modeling Copilot | 39 | Emerging |
| 21 | Shekswess/synthgenai | SynthGenAI - Package for Generating Synthetic Datasets using LLMs. | 39 | Emerging |
| 22 | Clearbox-AI/clearbox-synthetic-kit | Clearbox AI's all-in-one solution for generation and evaluation of synthetic... | 38 | Emerging |
| 23 | mims-harvard/CLEF | Controllable Sequence Editing for Counterfactual Generation | 38 | Emerging |
| 24 | iperov/SSHG | Simple Synthetic Head Generator | 37 | Emerging |
| 25 | SilenceX12138/TabStruct | 🗼 [ICLR 2026 Oral] Official implementation of “TabStruct: Measuring... | 37 | Emerging |
| 26 | ComplexData-MILA/AIF-Gen | Generating Synthetic Lifelong RL Data for LLMs at Scale | 37 | Emerging |
| 27 | pedrodevog/SynthECG | The first systematic evaluation framework for synthetic 10-second 12-lead... | 37 | Emerging |
| 28 | jameszhou-gl/HiSGT | Code for ECAI'25-Generating Clinically Realistic EHR Data via a Hierarchy-... | 36 | Emerging |
| 29 | Lysarthas/Time-Transformer | [SDM24] Official code for "Time-Transformer" | 36 | Emerging |
| 30 | caetas/GenerativeZoo | Model Zoo for Generative Models. | 36 | Emerging |
| 31 | zjowowen/FuncGenFoil | Airfoil Generation and Editing Model in Function Space | 35 | Emerging |
| 32 | zealscott/SynMeter | A principled library for tuning, training and evaluating tabular data... | 35 | Emerging |
| 33 | grantzyr/MM-Health-Dataset | [EMNLP 2025 Findings] Official repo for paper: From Generation to Detection:... | 34 | Emerging |
| 34 | ViacheslavDanilov/generative_design | This repository is dedicated to the development of an approach based on... | 34 | Emerging |
| 35 | filipaldi/ai-font-generation-projects | AI Font Generation Benchmarks. Comparative analysis of AI font generation... | 34 | Emerging |
| 36 | Sreyan88/DALE | Code for EMNLP 2023 paper: DALE: Generative Data Augmentation for... | 34 | Emerging |
| 37 | Trustworthy-ML-Lab/posthoc-generative-cbm | [CVPR 2025] Concept Bottleneck Autoencoder (CB-AE) -- efficiently transform... | 33 | Emerging |
| 38 | markweberdev/maskbit | Implementation of the paper "MaskBit: Embedding-free Image Generation from... | 32 | Emerging |
| 39 | OpenProteinAI/openprotein-python | Simple python interface for the OpenProtein.AI REST API. | 32 | Emerging |
| 40 | DorinDaniil/Garage | Cutting-edge Python library designed for generative image augmentation! | 32 | Emerging |
| 41 | abideenml/AutoSynth | Automatically create synthetic data using SOTA techniques (Self Instruct,... | 32 | Emerging |
| 42 | michelecafagna26/vl-shap | [Frontiers in AI Journal] Implementation of the paper "Interpreting Vision... | 30 | Emerging |
| 43 | KonstantinosBarmpas/NeuroRVQ | NeuroRVQ: Multi-Scale EEG Tokenization for Generative Large Brainwave Models | 30 | Emerging |
| 44 | Diegomangasco/GenSUMO | Generative AI to create synthetic SUMO scenarios | 29 | Experimental |
| 45 | ML4ITS/synthetic-data | Generate synthetic time-series using generative adversarial networks.... | 29 | Experimental |
| 46 | kayua/SyntheticOceanAI | SyntheticOcean: Open-Source Library for Generating Synthetic Tabular Data +... | 26 | Experimental |
| 47 | Mycheaux/DB-conv | Self-supervised generative AI enables conversion of two non-overlapping... | 25 | Experimental |
| 48 | MorningStarTM/Synthetic-Data-Generator | This Project for Creating unified tool to generate synthetic data (text and... | 25 | Experimental |
| 49 | Sreyan88/CoDa | Code for NAACL 2024 (Findings) Paper: CoDa: Constrained Generation based... | 25 | Experimental |
| 50 | Lee-CBG/TCRGen | Self-Contemplating In-Context Learning Enhances T Cell Receptor Generation... | 25 | Experimental |
| 51 | AmirhosseinHonardoust/Synthetic-Data-Artist | A professional, research-grade comparison of Gaussian Copula and Variational... | 25 | Experimental |
| 52 | rodrigobnogueira/faker-ai-provider | 🤖 Faker provider for generating AI/ML fake data - models, companies,... | 24 | Experimental |
| 53 | Melckykaisha/synthetic-data-generation-demo | Interactive demonstration of synthetic data generation using GANs and VAEs... | 24 | Experimental |
| 54 | HowieHwong/DataGen | [ICLR'25] DataGen: Unified Synthetic Dataset Generation via Large Language Models | 24 | Experimental |
| 55 | FishAres/RNP6 | Code for Recursive Neural Programs: A differentiable framework for learning... | 23 | Experimental |
| 56 | vertaix/Alternators | This repository contains the implementation of Alternators, a novel... | 23 | Experimental |
| 57 | Sreyan88/ABEX | Code for ACL 2024 paper -- ABEX: Data Augmentation for Low-Resource NLU via... | 22 | Experimental |
| 58 | KonstantinosBarmpas/LaBraM-plus-plus | [NeurIPS 2025] Neural Information Processing Systems(2025) - Foundation... | 22 | Experimental |
| 59 | Sreyan88/ACLM | Code for ACL 2023 Paper: ACLM: A Selective-Denoising based Generative Data... | 22 | Experimental |
| 60 | Sreyan88/Synthio | Code for ICLR 2025 Paper: Synthio: Augmenting Small-Scale Audio... | 21 | Experimental |
| 61 | alexkoulakos/explain-then-predict | Source code for the BlackBoxNLP 2024 @ EMNLP paper "Enhancing adversarial... | 21 | Experimental |
| 62 | dario-coscia/barnn | BARNN: A Bayesian Autoregressive and Recurrent Neural Network - Official Repository | 21 | Experimental |
| 63 | NITHISHM2410/spatial-temporal-transformer | Spatial Temporal Transformer to capture Spatial and Temporal dynamics. | 19 | Experimental |
| 64 | yrodriguezmd/Synthetic_Medical_Tabular_Data | Generate synthetic medical data from a patient population dataset. | 19 | Experimental |
| 65 | cMancio00/ebm-molecules | This is my thesis for Computer Science master degree at University of Florence | 19 | Experimental |
| 66 | ImJaeSung/Synthesizers | Implementations of various synthesizers with pytorch. | 19 | Experimental |
| 67 | jameszhou-gl/Coogee | Coogee: An integrated pipeline for generating and auditing clinically... | 19 | Experimental |
| 68 | j9smith/generative-modelling | Notebook series exploring the theory and implementation of various generative models. | 18 | Experimental |
| 69 | DanteTrb/fall-risk-predictor | A fullstack AI-powered web application to assess fall risk in patients with... | 18 | Experimental |
| 70 | wilhelmagren/syndgen | SYNthetic Data GENeration made easy for everyone, free and open-sourced. | 18 | Experimental |
| 71 | shadowboxingskills/ppchainR | Your Probabilistic Modeling Copilot | 17 | Experimental |
| 72 | tacclab/bio_dataset_manager | This tool facilitates the encoding of these sequences into tensors, which... | 17 | Experimental |
| 73 | thalesbertaglia/instasynth | Synthetic Instagram Post Generation for Social Media Research | 17 | Experimental |
| 74 | kj14173/neuro-sequential-generative-core | A research-oriented implementation of sequential generative models for... | 14 | Experimental |
| 75 | marquito3012/TFM | Generative AI framework for creating synthetic tabular data in... | 14 | Experimental |
| 76 | Hunny-Mane/Polygen | PolyGen is a technical demonstration of high-concurrency data visualization... | 14 | Experimental |
| 77 | Fixer1983/synthetic-data-gen | Scalable synthetic data generation for training robust ML models. | 14 | Experimental |
| 78 | rizac/gmgt | Ground Motion Ground Truth is a collection of datasets of ground motion time... | 14 | Experimental |
| 79 | Chun-Bae/eeg-emotion-gen-compare | Comparing generative models for EEG emotion classification. | 13 | Experimental |
| 80 | silvano315/Gen-AI-for-Data-Augmentation | This is the ninth project of AI Engineering Master. It aims to use... | 13 | Experimental |
| 81 | rubsj/ai-synthetic-data-generator | Synthetic dataset generation pipeline with Pydantic validation and... | 13 | Experimental |
| 82 | Okja88/Visual-GenAI-Applications | A comprehensive portfolio of Visual Generative AI projects featuring... | 13 | Experimental |
| 83 | Sreyan88/BioAug | Code for SIGIR 2023 paper: BioAug: Conditional Generation based Data... | 12 | Experimental |
| 84 | francescotss/MLOpsDeepfakeDetection | Official repository for the paper "Continuous Fake Media Detection: Adapting... | 11 | Experimental |
| 85 | nxank4/an-augment | A Python library for advanced and novel data augmentation, combining... | 11 | Experimental |
| 86 | muxamilian/Robo99 | Readability-optimized Font using Machine Learning | 11 | Experimental |
| 87 | DanteTrb/SpastiGait-xAI | Next-generation explainable AI integrating EMG and kinematics to identify... | 11 | Experimental |
| 88 | WilliamJlvt/llm_synthetic_data_generator | An API for generating synthetic datasets using a Large Language Model (LLM). | 11 | Experimental |
| 89 | DanteTrb/Prodromal_Parkinson | SynthAI-PD: Explainable Gait-Based Model for Prodromal Parkinsonism... | 10 | Experimental |
| 90 | eth-library/data-archive-ml-synthesizer | A modular machine learning pipeline that generates realistic synthetic METS... | 10 | Experimental |
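
The tier labels in the listing above appear to follow score bands: the only Verified entry scores 81, Established entries range from 51 to 69, Emerging from 30 to 47, and Experimental entries score 29 or lower. The sketch below encodes that pattern in Python. The exact cutoffs (70, 50, 30) and whether they are inclusive are assumptions inferred from the observed scores, not a documented rule, and `tier_for` is a hypothetical helper name.

```python
from collections import Counter

# ASSUMED cutoffs, inferred from the listing: no entry scores exactly 70 or 50,
# so the true boundaries (and their inclusivity) cannot be confirmed from the data.
TIER_CUTOFFS = [(70, "Verified"), (50, "Established"), (30, "Emerging")]


def tier_for(score):
    """Return the tier whose assumed cutoff the score meets, else 'Experimental'."""
    for cutoff, tier in TIER_CUTOFFS:
        if score >= cutoff:
            return tier
    return "Experimental"


# A few scores taken from the listing, including the observed band edges.
sample_scores = [81, 69, 51, 47, 30, 29, 10]
tiers = [tier_for(s) for s in sample_scores]
counts = Counter(tiers)  # tally of how many sample scores fall in each tier
```

Applied to the sample scores, the mapping reproduces the tier shown next to each of those scores in the listing.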