AmirhosseinHonardoust/Autocurator-Synthetic-Data-Benchmark

Autocurator is a benchmarking toolkit for evaluating synthetic tabular data. It measures fidelity, coverage, privacy, and utility through quantitative metrics, visual reports, and PCA/correlation diagnostics, making it well suited to validating datasets generated by VAE, GAN, copula, or diffusion models.
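As a minimal sketch of the kind of fidelity check such a benchmark runs (this is not Autocurator's actual API; the function name, columns, and data below are hypothetical), the correlation_fidelity helper compares the Pearson correlation matrices of a real and a synthetic dataset, where 0.0 means the pairwise structure is reproduced exactly:

import numpy as np
import pandas as pd

def correlation_fidelity(real: pd.DataFrame, synthetic: pd.DataFrame) -> float:
    # Mean absolute gap between the two Pearson correlation matrices;
    # 0.0 means the synthetic data reproduces the pairwise structure exactly.
    real_corr = real.corr(numeric_only=True)
    synth_corr = synthetic.corr(numeric_only=True)
    return float((real_corr - synth_corr).abs().to_numpy().mean())

# Hypothetical data: 500 rows of two numeric columns, with the "synthetic"
# set simulated by adding small noise to the real one.
rng = np.random.default_rng(0)
real = pd.DataFrame({"age": rng.normal(40, 10, 500),
                     "income": rng.normal(60_000, 15_000, 500)})
synthetic = real + rng.normal(0, 1, real.shape)
print(f"correlation gap: {correlation_fidelity(real, synthetic):.4f}")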

Quality score: 30 / 100 (Emerging)

This tool helps data professionals assess the quality and safety of synthetic tabular data. You provide the original dataset and its synthetic counterpart, and it produces a detailed report with metrics and visualizations. It is aimed at data scientists, machine learning engineers, and data privacy officers who need to validate synthetic datasets before using or sharing them.

Use this if you need to objectively measure how closely synthetic data matches real data and to verify that it preserves statistical patterns, maintains privacy, and still supports predictive models.

Not ideal if you are looking for a tool to generate synthetic data itself, as this project focuses solely on evaluating existing synthetic datasets.

Tags: synthetic-data-validation, data-privacy-compliance, machine-learning-auditing, tabular-data-analysis, dataset-quality-assurance
No package · No dependents
Maintenance: 6 / 25
Adoption: 6 / 25
Maturity: 13 / 25
Community: 5 / 25

The four components sum to the overall score of 30 / 100.

Stars: 20
Forks: 1
Language: Python
License: MIT
Last pushed: Nov 11, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/AmirhosseinHonardoust/Autocurator-Synthetic-Data-Benchmark"

Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
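The same endpoint can be called from any HTTP client. Below is a minimal Python sketch using only the standard library, assuming the endpoint returns a JSON body (the response schema is not documented here, so the script simply prints whatever comes back):

import json
import urllib.request

url = ("https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/"
       "AmirhosseinHonardoust/Autocurator-Synthetic-Data-Benchmark")
with urllib.request.urlopen(url, timeout=30) as resp:
    data = json.load(resp)  # assumes a JSON response, as the API path suggests
print(json.dumps(data, indent=2))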