AmirhosseinHonardoust/Synthetic-Data-Artist

A professional, research-grade comparison of Gaussian Copula and Variational Autoencoder (VAE) methods for synthetic tabular data generation. Includes a full evaluation pipeline covering distribution overlap, correlation analysis, PCA projections, pairplots, metrics, and automated visual reports.

Experimental

This project helps data professionals understand the trade-offs between two methods for creating artificial (synthetic) tabular data from existing sensitive datasets. You supply your original dataset, and it produces two synthetic datasets along with a report comparing how closely each one mirrors the original data's distributions and correlations. Data scientists, privacy officers, and machine learning engineers can use this to choose the synthetic data approach that best fits their needs.

Use this if you need to generate synthetic tabular data but are unsure whether a statistical (Copula) or a deep learning (VAE) approach is better suited for your specific privacy, fidelity, or diversity requirements.

Not ideal if your dataset has highly complex, nonlinear dependencies or if you require advanced privacy guarantees such as differential privacy out of the box.
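To make the statistical side of the comparison concrete, here is a minimal two-column Gaussian copula sketch using only the Python standard library. It is not taken from the repository: the function names (`pearson`, `to_normal_scores`, `sample_gaussian_copula`) and the toy data are assumptions chosen for illustration, and it assumes two numeric columns. The idea is to map each column to normal scores by rank, estimate their correlation, draw correlated normal pairs, and map them back through each column's empirical quantiles.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for cdf / inv_cdf

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def to_normal_scores(values):
    """Replace each value by the normal quantile of its empirical rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n stays strictly inside (0, 1), as inv_cdf requires
        scores[i] = STD_NORMAL.inv_cdf((rank + 0.5) / n)
    return scores

def empirical_quantile(sorted_values, u):
    """Return the observed value at quantile level u in [0, 1]."""
    idx = min(int(u * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]

def sample_gaussian_copula(xs, ys, n_samples, seed=0):
    """Hypothetical helper: sample synthetic (x, y) pairs from a fitted copula."""
    rng = random.Random(seed)
    rho = pearson(to_normal_scores(xs), to_normal_scores(ys))
    residual = max(0.0, 1.0 - rho ** 2) ** 0.5  # guard against rounding
    sx, sy = sorted(xs), sorted(ys)
    samples = []
    for _ in range(n_samples):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + residual * rng.gauss(0.0, 1.0)  # correlated with z1
        samples.append((
            empirical_quantile(sx, STD_NORMAL.cdf(z1)),
            empirical_quantile(sy, STD_NORMAL.cdf(z2)),
        ))
    return samples

# Toy "real" data (an assumption for the demo): y depends linearly on x plus noise.
data_rng = random.Random(42)
real_x = [data_rng.gauss(0.0, 1.0) for _ in range(500)]
real_y = [2.0 * x + data_rng.gauss(0.0, 1.0) for x in real_x]

synthetic = sample_gaussian_copula(real_x, real_y, n_samples=500, seed=1)
real_corr = pearson(real_x, real_y)
synth_corr = pearson([x for x, _ in synthetic], [y for _, y in synthetic])
print(f"real correlation: {real_corr:.3f}, synthetic correlation: {synth_corr:.3f}")
```

Comparing `real_corr` with `synth_corr` is a small-scale version of the correlation analysis described above. Because this sketch maps back through empirical quantiles, every synthetic value is a value observed in the original column, which preserves marginals well but is a reminder that such a generator offers no formal privacy guarantee on its own.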

data-privacy synthetic-data-generation data-sharing data-anonymization machine-learning-data

No Package No Dependents

Maintenance 6 / 25

Adoption 6 / 25

Maturity 13 / 25

Community 0 / 25

Language

Python

License

MIT

Higher-rated alternatives

sdv-dev/SDV

Synthetic data generation for tabular data

sdv-dev/SDGym

Benchmarking synthetic data generation methods.

NVIDIA-NeMo/DataDesigner

🎨 NeMo Data Designer: A general library for generating high-quality synthetic data from scratch...

AlexanderVNikitin/tsgm

Generation and evaluation of synthetic time series datasets (also, augmentations,...

mostly-ai/mostlyai

Synthetic Data SDK ✨
