sdv-dev/SDV
Synthetic data generation for tabular data
This project helps data professionals create artificial datasets that statistically resemble their real-world tabular data, such as customer records or transaction logs. You supply the original sensitive data, and it outputs a new synthetic dataset that maintains the essential patterns and relationships without exposing private information. It suits data scientists, analysts, and researchers who need to share or develop with data while adhering to privacy regulations.
3,439 stars. Used by 5 other packages. Actively maintained with 38 commits in the last 30 days. Available on PyPI.
Use this if you need to generate realistic, anonymized datasets from your existing sensitive tabular data for development, testing, or sharing without compromising privacy.
Not ideal if you require absolutely random, statistically unrelated data, or if your data is not in a structured tabular format.
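As a rough illustration of what "statistically resembles" means in the description above, the sketch below fits a simple two-column Gaussian model to a handful of rows and samples new rows from it. This is a toy stand-in written with the Python standard library only. It is not SDV's API or algorithm, and the column meanings (age, monthly spend), the example rows, and the function names `fit_gaussian_pair` and `sample_gaussian_pair` are all hypothetical.

```python
import math
import random

def fit_gaussian_pair(rows):
    """Estimate means, standard deviations, and correlation of two numeric columns."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    # Sample variances and covariance (n - 1 in the denominator).
    var_x = sum((x - mean_x) ** 2 for x, _ in rows) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for _, y in rows) / (n - 1)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in rows) / (n - 1)
    std_x, std_y = math.sqrt(var_x), math.sqrt(var_y)
    return {
        "mean_x": mean_x, "mean_y": mean_y,
        "std_x": std_x, "std_y": std_y,
        "rho": cov / (std_x * std_y),
    }

def sample_gaussian_pair(model, num_rows, seed=None):
    """Draw new (x, y) rows from a bivariate normal with the fitted parameters."""
    rng = random.Random(seed)
    rho = model["rho"]
    # Guard against tiny negative values caused by floating-point rounding.
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))
    rows = []
    for _ in range(num_rows):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = model["mean_x"] + model["std_x"] * z1
        # Mixing z1 into y reproduces the fitted correlation between columns.
        y = model["mean_y"] + model["std_y"] * (rho * z1 + residual * z2)
        rows.append((x, y))
    return rows

# Made-up "real" rows for illustration only: (age, monthly_spend).
real_rows = [
    (23, 120.0), (35, 210.0), (41, 260.0), (29, 150.0),
    (52, 340.0), (47, 300.0), (31, 190.0), (38, 230.0),
]

model = fit_gaussian_pair(real_rows)
synthetic_rows = sample_gaussian_pair(model, num_rows=5000, seed=0)
```

The sampled rows are new values, but their means, spreads, and the age-to-spend correlation track the fitted ones. Real synthesizers handle categorical columns, non-Gaussian distributions, and multi-table relationships, none of which this sketch attempts.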
Stars: 3,439
Forks: 417
Language: Python
License: not listed
Category: not listed
Last pushed: Mar 12, 2026
Commits (30d): 38
Dependencies: 14
Reverse dependents: 5
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/generative-ai/sdv-dev/SDV"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
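The same request can be sketched in Python with the standard library. The path layout (a domain segment such as `generative-ai`, then owner and repository) is inferred from the single URL shown above, and the response is assumed to be JSON. Neither is confirmed by this page, and the helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Base path copied from the curl example; everything after it is an inferred layout.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(domain, owner, repo):
    """Build the endpoint URL, percent-encoding each path segment."""
    segments = [urllib.parse.quote(part, safe="") for part in (domain, owner, repo)]
    return BASE_URL + "/" + "/".join(segments)

def fetch_quality(domain, owner, repo, timeout=10.0):
    """GET the endpoint and parse the body, assuming the API returns JSON."""
    url = quality_url(domain, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("generative-ai", "sdv-dev", "SDV")
```

No API key is passed here, which matches the keyless tier described above. How a key would be supplied is not stated on this page, so the sketch leaves it out.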
Related tools
sdv-dev/SDGym
Benchmarking synthetic data generation methods.
NVIDIA-NeMo/DataDesigner
🎨 NeMo Data Designer: A general library for generating high-quality synthetic data from scratch...
AlexanderVNikitin/tsgm
Generation and evaluation of synthetic time series datasets (also, augmentations,...
mostly-ai/mostlyai
Synthetic Data SDK ✨
hitsz-ids/synthetic-data-generator
SDG is a specialized framework designed to generate high-quality structured tabular data.