The Synthetic Data Directory

Quality-scored directory of 13 synthetic data tools, updated daily. Every tool is scored on maintenance, adoption, maturity, and community signals.

Covers synthetic data generation, augmentation, and simulation tools for creating training data when real data is scarce, private, or expensive to label.

Verified (score 70–100): 1 tool
Established (score 50–69): 4 tools
Emerging (score 30–49): 3 tools
Experimental (score 10–29): 5 tools
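
The tier bands above amount to a simple lookup from score to tier. The following is a minimal sketch of that mapping, not the directory's own implementation: the names `TIER_BANDS` and `tier_for_score` are hypothetical, the bounds are assumed to be inclusive, and the page does not say how scores below 10 are treated, so the sketch returns `None` for them.

```python
from collections import Counter
from typing import Optional

# Score bands as listed in the tier summary (assumed inclusive on both ends).
TIER_BANDS = [
    ("Verified", 70, 100),
    ("Established", 50, 69),
    ("Emerging", 30, 49),
    ("Experimental", 10, 29),
]


def tier_for_score(score: int) -> Optional[str]:
    """Return the tier name whose band contains the score, or None."""
    for name, low, high in TIER_BANDS:
        if low <= score <= high:
            return name
    # The directory does not state a tier for scores outside 10-100.
    return None


# Scores of the 13 tools listed in the ranking on this page.
listed_scores = [71, 67, 63, 63, 50, 44, 43, 31, 29, 27, 27, 25, 24]
tier_counts = Counter(tier_for_score(s) for s in listed_scores)
print(dict(tier_counts))
```

Applied to the 13 listed scores, this reproduces the per-tier counts shown in the summary (1, 4, 3, and 5).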

Top tools by quality score

# | Tool | Description | Score
1 | benkeen/generatedata | A powerful, feature-rich, random test data generator. | 71
2 | sdv-dev/CTGAN | Conditional GAN for generating synthetic tabular data. | 67
3 | databrickslabs/dbldatagen | Generate relevant synthetic data quickly for your projects. The Databricks... | 63
4 | DexForce/EmbodiChain | An end-to-end, GPU-accelerated, and modular platform for building... | 63
5 | synthesized-io/tdk-demo | This is a collection of TDK demo projects that use different databases and options | 50
6 | Stranger6667/hypothesis-graphql | Generate arbitrary queries matching your GraphQL schema, and use them to... | 44
7 | Buddhi19/SyntheticGen | [IGARSS 2026] Generates synthetic image–mask pairs by denoising joint... | 43
8 | Mukhopadhyay/pyfake | A Flexible and Extensible fake data generator based on Pydantic models. | 31
9 | lasgroup/ActiveUltraFeedback | Code for the paper "ActiveUltraFeedback: Efficient Preference Data... | 29
10 | jaehyeon-kim/dynamic-des | Real-time SimPy control plane to dynamically update parameters and stream... | 27
11 | aborruso/fauxdata | CLI to generate and validate realistic fake datasets from YAML schemas —... | 27
12 | PDBeurope/mmcif-gen | This application is designed to create mmcif files from facilities data. | 25
13 | doachyz/IIoT-simulator | An advanced Industrial IoT (IIoT) simulator for Smart Factory 4.0... | 24

Browse by category