vanderschaarlab/synthcity

A library for generating and evaluating synthetic tabular data for privacy, fairness and data augmentation.

Score: 57 / 100 (Established)

This tool helps data professionals create artificial datasets that look and behave like real-world data without reproducing the sensitive records they are based on. You provide your original tabular data, and it produces a new synthetic dataset that can be shared or used for development with far lower privacy risk. It suits data scientists, analysts, and researchers working with confidential information.

Use this if you need to generate high-quality synthetic versions of sensitive tabular, time-series, survival analysis, or image data for privacy-preserving analysis, sharing, or model development.

Not ideal if your original data contains missing values, as this tool requires data to be fully imputed beforehand.
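The imputation requirement above can be met with a small pre-processing pass before the data reaches the generator. The sketch below is illustrative only and uses the Python standard library. The helper names (`is_missing`, `impute_rows`) and the choice of median for numeric columns and most frequent value for everything else are assumptions made for this example, not something the library prescribes.

```python
import math
from collections import Counter
from statistics import median


def is_missing(value):
    """Treat None, empty strings, and float NaN as missing (an assumption for this sketch)."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def impute_rows(rows):
    """Return a copy of rows with gaps filled: median for numeric columns, mode otherwise."""
    columns = list(rows[0].keys())
    fills = {}
    for col in columns:
        observed = [r.get(col) for r in rows if not is_missing(r.get(col))]
        if not observed:
            raise ValueError(f"column {col!r} has no observed values to impute from")
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in observed
        )
        # Median is robust to outliers; the mode is the simplest choice for categories.
        fills[col] = median(observed) if numeric else Counter(observed).most_common(1)[0][0]
    return [
        {col: fills[col] if is_missing(r.get(col)) else r.get(col) for col in columns}
        for r in rows
    ]


# Hypothetical toy data with one missing age and one missing city.
rows = [
    {"age": 34, "city": "Leeds"},
    {"age": None, "city": "Leeds"},
    {"age": 52, "city": ""},
    {"age": 41, "city": "York"},
]
clean = impute_rows(rows)
```

Simple fills like these distort distributions when many values are missing, so a model-based imputer may be a better fit for real datasets.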

data-privacy data-sharing machine-learning-development dataset-augmentation statistical-modeling
No Package No Dependents
Maintenance 10 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 21 / 25

Stars: 643
Forks: 90
Language: Python
License: Apache-2.0
Last pushed: Feb 11, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/vanderschaarlab/synthcity"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
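The same request can be made from Python. This is a minimal sketch using only the standard library. It assumes the response body is JSON and that the three trailing path segments are a category, a repository owner, and a repository name. Neither is stated on this page, and the helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path copied from the curl example; everything after it is treated
# as category/owner/repo, which is an assumption about the URL scheme.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL for one repository."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10.0):
    """GET the endpoint and decode the body, assuming it is JSON."""
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request and counts against the daily limit):
# data = fetch_quality("ml-frameworks", "vanderschaarlab", "synthcity")
```

With the keyless limit of 100 requests/day, caching the result locally is advisable for anything that polls repeatedly.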