AmirhosseinHonardoust/Autocurator-Synthetic-Data-Benchmark
Autocurator is a comprehensive benchmarking toolkit for evaluating synthetic tabular data. It measures fidelity, coverage, privacy, and utility through quantitative metrics, visual reports, and PCA/correlation diagnostics. Ideal for validating VAE-, GAN-, Copula-, or Diffusion-generated datasets.
This tool helps data professionals confidently assess the quality and safety of synthetic (computer-generated) tabular data. You provide your original dataset and its synthetic counterpart, and it produces a detailed report with metrics and visualizations. It is aimed at data scientists, machine learning engineers, and data privacy officers who need to validate synthetic datasets before use or sharing.
Use this if you need to objectively measure how closely synthetic data matches real data, ensuring it preserves patterns, maintains privacy, and supports predictive models.
Not ideal if you are looking for a tool to generate synthetic data itself, as this project focuses solely on evaluating existing synthetic datasets.
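To make the fidelity idea concrete, here is a minimal sketch of the kind of checks such a benchmark reports: a per-column two-sample Kolmogorov-Smirnov statistic and a worst-case gap between the real and synthetic correlation matrices. The function and column names are illustrative assumptions, not Autocurator's actual API.

```python
# Hedged sketch: fidelity metrics of the kind a synthetic-data benchmark
# reports. Column names and data are toy examples, not the tool's API.
import numpy as np
import pandas as pd

def ks_statistic(a: np.ndarray, b: np.ndarray) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between ECDFs."""
    grid = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return float(np.abs(cdf_a - cdf_b).max())

def fidelity_report(real: pd.DataFrame, synth: pd.DataFrame) -> dict:
    """Per-column KS statistic plus worst-case correlation-matrix gap."""
    ks = {c: ks_statistic(real[c].to_numpy(), synth[c].to_numpy())
          for c in real.columns}
    corr_gap = float(np.abs(real.corr().to_numpy()
                            - synth.corr().to_numpy()).max())
    return {"ks_per_column": ks, "max_corr_gap": corr_gap}

rng = np.random.default_rng(0)
real = pd.DataFrame({"age": rng.normal(40, 10, 1000),
                     "income": rng.normal(50_000, 8_000, 1000)})
synth = pd.DataFrame({"age": rng.normal(41, 11, 1000),
                      "income": rng.normal(50_500, 9_000, 1000)})
report = fidelity_report(real, synth)
```

A KS statistic near 0 means a column's marginal distribution is well preserved; a small correlation gap means pairwise relationships survived generation. Lower is better on both.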
Stars
20
Forks
1
Language
Python
License
MIT
Category
Last pushed
Nov 11, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/AmirhosseinHonardoust/Autocurator-Synthetic-Data-Benchmark"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
Diyago/Tabular-data-generation
GANs are well known for success in realistic image generation. However, they can be applied in...
meta-llama/synthetic-data-kit
Tool for generating high quality Synthetic datasets
Data-Centric-AI-Community/ydata-synthetic
Synthetic data generators for tabular and time-series data
tdspora/syngen
Open-source version of the TDspora synthetic data generation algorithm.
vanderschaarlab/synthcity
A library for generating and evaluating synthetic tabular data for privacy, fairness and data...