sdv-dev/SDV

Synthetic data generation for tabular data

/ 100

Verified

This project helps data professionals create artificial datasets that statistically resemble their real-world tabular data, such as customer records or transaction logs. You provide your original (often sensitive) data; SDV learns its patterns and generates a new, synthetic dataset that preserves the key statistical properties and relationships between columns without copying the real records. This makes it useful for data scientists, analysts, and researchers who need to share data or build with it while complying with privacy regulations.

3,439 stars. Used by 5 other packages. Actively maintained with 38 commits in the last 30 days. Available on PyPI.

Use this if you need to generate realistic, anonymized datasets from your existing sensitive tabular data for development, testing, or sharing without compromising privacy.

Not ideal if you need purely random data with no statistical relationship to a real dataset, or if your data is not in a structured, tabular format.

data-anonymization privacy-compliance data-generation data-analysis machine-learning-development

Maintenance 20 / 25

Adoption 15 / 25

Maturity 25 / 25

Community 21 / 25


Stars

3,439

Forks

417

Language

Python

License

—

Compare

SDV and SDGym

SDV and synthetic-data-generator

Related tools

sdv-dev/SDGym

Benchmarking synthetic data generation methods.

NVIDIA-NeMo/DataDesigner

🎨 NeMo Data Designer: A general library for generating high-quality synthetic data from scratch...

AlexanderVNikitin/tsgm

Generation and evaluation of synthetic time series datasets (also, augmentations,...

mostly-ai/mostlyai

Synthetic Data SDK ✨

hitsz-ids/synthetic-data-generator

SDG is a specialized framework designed to generate high-quality structured tabular data.
