AmirhosseinHonardoust/Synthetic-Data-Artist

A professional, research-grade comparison of Gaussian Copula and Variational Autoencoder (VAE) methods for synthetic tabular data generation. Includes a full evaluation pipeline with distribution overlap, correlation analysis, PCA projections, pairplots, metrics, and automated visual reports.
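To make the statistical side of the comparison concrete, here is a minimal sketch of the general Gaussian copula idea using only the Python standard library. It is not the repository's implementation: the function names (`normal_scores`, `correlation_matrix`, `cholesky`, `sample_gaussian_copula`) are hypothetical, and the sketch assumes numeric columns with no perfectly dependent pairs. It also reuses observed values for the marginals, which a privacy-focused tool would likely avoid.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for cdf / inverse cdf


def normal_scores(column):
    """Map a column to standard-normal scores through its ranks."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n keeps the probability strictly inside (0, 1).
        scores[i] = _STD_NORMAL.inv_cdf((rank + 0.5) / n)
    return scores


def correlation_matrix(columns):
    """Pearson correlation between every pair of equal-length columns."""
    def corr(x, y):
        mx, my = sum(x) / len(x), sum(y) / len(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        vx = sum((a - mx) ** 2 for a in x)
        vy = sum((b - my) ** 2 for b in y)
        return cov / math.sqrt(vx * vy)
    return [[corr(a, b) for b in columns] for a in columns]


def cholesky(matrix):
    """Lower-triangular L with L * L^T == matrix (assumes positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def sample_gaussian_copula(columns, n_samples, seed=0):
    """Draw synthetic columns that mimic the rank correlations of `columns`."""
    rng = random.Random(seed)
    # Step 1: estimate the dependence structure in normal-score space.
    lower = cholesky(correlation_matrix([normal_scores(c) for c in columns]))
    sorted_cols = [sorted(c) for c in columns]
    synthetic = [[] for _ in columns]
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in columns]
        for j, col in enumerate(sorted_cols):
            # Step 2: correlate the independent normals through L.
            correlated = sum(lower[j][k] * z[k] for k in range(j + 1))
            # Step 3: map back to the data scale via the empirical quantiles.
            u = _STD_NORMAL.cdf(correlated)
            index = min(int(u * len(col)), len(col) - 1)
            synthetic[j].append(col[index])
    return synthetic
```

Calling `sample_gaussian_copula([ages, incomes], 1000)` on two hypothetical numeric columns would return two synthetic columns of length 1000. A VAE replaces the fixed Gaussian dependence model with a learned latent representation, which is the trade-off the project sets out to compare.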

Quality score: 25 / 100 (Experimental)

This project helps data professionals understand the trade-offs between different methods for creating artificial (synthetic) tabular data from existing sensitive datasets. You input your original dataset, and it outputs two new synthetic datasets along with a comprehensive report comparing how closely each synthetic dataset mirrors your original data's distributions and correlations. Data scientists, privacy officers, and machine learning engineers can use this to choose the best synthetic data approach for their needs.
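As a rough illustration of what "mirroring distributions and correlations" can mean numerically, the sketch below computes two simple fidelity measures: the shared area of two normalised histograms, and the largest difference between pairwise Pearson correlations. These are generic stand-ins chosen for illustration, and the names `histogram_overlap`, `pearson`, and `max_correlation_gap` are hypothetical; the repository's own metrics may be defined differently.

```python
import math


def histogram_overlap(real, synthetic, bins=20):
    """Shared area of two normalised histograms: 1.0 identical, 0.0 disjoint."""
    low = min(min(real), min(synthetic))
    high = max(max(real), max(synthetic))
    # Fall back to a width of 1.0 when every value is the same.
    width = (high - low) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp so the maximum value lands in the last bin.
            counts[min(int((v - low) / width), bins - 1)] += 1
        return [c / len(values) for c in counts]

    real_p, synth_p = proportions(real), proportions(synthetic)
    return sum(min(p, q) for p, q in zip(real_p, synth_p))


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def max_correlation_gap(real_columns, synthetic_columns):
    """Largest absolute difference between matching pairwise correlations."""
    gap = 0.0
    n = len(real_columns)
    for i in range(n):
        for j in range(i + 1, n):
            real_r = pearson(real_columns[i], real_columns[j])
            synth_r = pearson(synthetic_columns[i], synthetic_columns[j])
            gap = max(gap, abs(real_r - synth_r))
    return gap
```

A higher overlap and a smaller correlation gap both suggest closer fidelity to the original data, though neither says anything about privacy on its own.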

Use this if you need to generate synthetic tabular data but are unsure whether a statistical (Copula) or a deep learning (VAE) approach is better suited for your specific privacy, fidelity, or diversity requirements.

Not ideal if your dataset has highly complex, nonlinear dependencies or if you require formal privacy guarantees such as differential privacy out of the box.

data-privacy synthetic-data-generation data-sharing data-anonymization machine-learning-data
No Package, No Dependents
Maintenance 6 / 25
Adoption 6 / 25
Maturity 13 / 25
Community 0 / 25


Stars 23
Forks
Language Python
License MIT
Last pushed Nov 11, 2025
Commits (30d) 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/generative-ai/AmirhosseinHonardoust/Synthetic-Data-Artist"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
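The same request can be made from Python with the standard library. This is a sketch under two assumptions that the page does not confirm: that the URL path follows a category/owner/repository pattern (inferred from the single example above), and that the response body is JSON. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example; the segment layout is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the quality endpoint URL for one repository."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Request the quality data and decode it, assuming a JSON response."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("generative-ai", "AmirhosseinHonardoust", "Synthetic-Data-Artist")` would request the same URL as the curl command; the keyless limit of 100 requests/day still applies.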