AmirhosseinHonardoust/Autocurator-Synthetic-Data-Benchmark
Autocurator is a comprehensive benchmarking toolkit for evaluating synthetic tabular data. It measures fidelity, coverage, privacy, and utility through quantitative metrics, visual reports, and PCA/correlation diagnostics. Ideal for validating VAE-, GAN-, Copula-, or Diffusion-generated datasets.
This tool helps data professionals confidently assess the quality and safety of synthetic (computer-generated) tabular data. You provide your original dataset and its synthetic counterpart, and it produces a detailed report with metrics and visualizations. It is aimed at data scientists, machine learning engineers, and data privacy officers who need to validate synthetic datasets before use or sharing.
Use this if you need to objectively measure how closely synthetic data matches real data, ensuring it preserves patterns, maintains privacy, and supports predictive models.
Not ideal if you are looking for a tool to generate synthetic data itself, as this project focuses solely on evaluating existing synthetic datasets.
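To make the fidelity idea concrete, here is a minimal sketch of the kind of checks such a benchmark reports: a per-column two-sample Kolmogorov-Smirnov statistic and a worst-case gap between the real and synthetic correlation matrices. The function and column names are illustrative assumptions, not Autocurator's actual API.

```python
# Hedged sketch: fidelity metrics of the kind a synthetic-data benchmark
# reports. Column names and data are toy examples, not the tool's API.
import numpy as np
import pandas as pd

def ks_statistic(a: np.ndarray, b: np.ndarray) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between ECDFs."""
    grid = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return float(np.abs(cdf_a - cdf_b).max())

def fidelity_report(real: pd.DataFrame, synth: pd.DataFrame) -> dict:
    """Per-column KS statistic plus worst-case correlation-matrix gap."""
    ks = {c: ks_statistic(real[c].to_numpy(), synth[c].to_numpy())
          for c in real.columns}
    corr_gap = float(np.abs(real.corr().to_numpy()
                            - synth.corr().to_numpy()).max())
    return {"ks_per_column": ks, "max_corr_gap": corr_gap}

rng = np.random.default_rng(0)
real = pd.DataFrame({"age": rng.normal(40, 10, 1000),
                     "income": rng.normal(50_000, 8_000, 1000)})
synth = pd.DataFrame({"age": rng.normal(41, 11, 1000),
                      "income": rng.normal(50_500, 9_000, 1000)})
report = fidelity_report(real, synth)
```

A KS statistic near 0 means a column's marginal distribution is well preserved; a small correlation gap means pairwise relationships survived generation. Lower is better on both.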
Stars
20
Forks
1
Language
Python
License
MIT
Category
Last pushed
Nov 11, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/AmirhosseinHonardoust/Autocurator-Synthetic-Data-Benchmark"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
Diyago/Tabular-data-generation
GANs are well known for success in realistic image generation. However, they can be applied in...
meta-llama/synthetic-data-kit
Tool for generating high quality Synthetic datasets
Data-Centric-AI-Community/ydata-synthetic
Synthetic data generators for tabular and time-series data
tdspora/syngen
Open-source version of the TDspora synthetic data generation algorithm.
vanderschaarlab/synthcity
A library for generating and evaluating synthetic tabular data for privacy, fairness and data...