sdv-dev/SDGym

Benchmarking synthetic data generation methods.

69 / 100 (Established)

This tool helps data practitioners evaluate and compare methods for generating synthetic datasets. You supply the synthetic data generation models and your original datasets, and it produces detailed reports on performance, memory usage, and the quality and privacy of the generated data. It is aimed at data scientists and machine learning engineers who work with sensitive or limited real-world data.

301 stars. Used by 1 other package. Available on PyPI.

Use this if you need to reliably choose the best synthetic data generation technique for your specific data and use case by objectively benchmarking different models.

Not ideal if you are looking for a simple 'one-click' solution to generate synthetic data without needing to compare or customize underlying models.

data-science machine-learning-engineering data-privacy data-anonymization synthetic-data-generation
Maintenance 10 / 25
Adoption 11 / 25
Maturity 25 / 25
Community 23 / 25
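
The four category scores listed here appear to add up to the overall score. The short check below assumes the overall figure is a plain sum of the categories, which the listing does not state outright; the dictionary name is illustrative only.

```python
# Category scores as listed on this page, each out of 25.
# Assumption: the overall score is the plain sum of the four categories.
category_scores = {
    "Maintenance": 10,
    "Adoption": 11,
    "Maturity": 25,
    "Community": 23,
}

overall = sum(category_scores.values())
maximum = 25 * len(category_scores)

print(f"{overall} / {maximum}")  # prints "69 / 100"
```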


Stars: 301
Forks: 67
Language: Python
License:
Last pushed: Mar 13, 2026
Commits (30d): 0
Dependencies: 21
Reverse dependents: 1

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/generative-ai/sdv-dev/SDGym"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
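
The same request can be made from Python with only the standard library. This is a minimal sketch: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the path layout (category, owner, repository) is inferred from the single URL shown above, and the response is assumed to be JSON, which the listing does not confirm.

```python
import json
import urllib.request
from typing import Any

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (hypothetical helper).

    Assumption: the path is <category>/<owner>/<repo>, inferred from the
    one example URL on this page.
    """
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(url: str, timeout: float = 10.0) -> Any:
    """Fetch and decode the quality data (hypothetical helper).

    Assumption: the endpoint returns a JSON body. No API key is sent, so
    the unauthenticated limit of 100 requests/day applies.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


url = build_quality_url("generative-ai", "sdv-dev", "SDGym")
```

Calling `fetch_quality(url)` would issue the same GET request as the curl command. The network call is kept out of module scope so that importing the snippet does not consume any of the daily request quota.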