SDV and SDGym

SDGym is a benchmarking framework for evaluating and comparing synthetic data generation methods. It complements SDV by letting practitioners measure SDV's synthesizers against alternative approaches.

SDV: score 81 (Verified)
Maintenance 20/25, Adoption 15/25, Maturity 25/25, Community 21/25
Stars: 3,439 | Forks: 417 | Commits (30d): 38
Language: Python | Downloads: not listed | License: not listed
No risk flags

SDGym: score 69 (Established)
Maintenance 10/25, Adoption 11/25, Maturity 25/25, Community 23/25
Stars: 301 | Forks: 67 | Commits (30d): 0
Language: Python | Downloads: not listed | License: not listed
No risk flags

About SDV

sdv-dev/SDV

Synthetic data generation for tabular data

This project helps data professionals create artificial datasets that statistically resemble their real-world tabular data, such as customer records or transaction logs. You provide your original, possibly sensitive, data, and it outputs a new, fully synthetic dataset designed to preserve the essential patterns and relationships while limiting exposure of private information. This is useful for data scientists, analysts, and researchers who need to share data or build with it while adhering to privacy regulations.

data-anonymization privacy-compliance data-generation data-analysis machine-learning-development

About SDGym

sdv-dev/SDGym

Benchmarking synthetic data generation methods.

This tool helps data practitioners evaluate and compare different methods for creating synthetic datasets. You input various synthetic data generation models and your original datasets, and it outputs detailed reports on performance, memory usage, and the quality and privacy of the generated synthetic data. Data scientists and machine learning engineers who work with sensitive or limited real-world data would find this useful.

data-science machine-learning-engineering data-privacy data-anonymization synthetic-data-generation

Scores are updated daily from GitHub, PyPI, and npm data.