ydata-synthetic and Synthetic-data-gen

These are competitors with overlapping functionality: both generate synthetic tabular data. ydata-synthetic has the broader scope, with generators for time-series data as well, and far more community adoption (1,614 stars versus 83).

ydata-synthetic
Score: 59 (Established)
Maintenance: 10/25
Adoption: 10/25
Maturity: 16/25
Community: 23/25
Stars: 1,614
Forks: 257
Downloads:
Commits (30d): 0
Language: Jupyter Notebook
License: MIT
Flags: No Package, No Dependents

Synthetic-data-gen
Score: 46 (Emerging)
Maintenance: 0/25
Adoption: 9/25
Maturity: 16/25
Community: 21/25
Stars: 83
Forks: 42
Downloads:
Commits (30d): 0
Language: Jupyter Notebook
License: MIT
Flags: Stale 6m, No Package, No Dependents

About ydata-synthetic

Data-Centric-AI-Community/ydata-synthetic

Synthetic data generators for tabular and time-series data

This project helps data professionals, researchers, and analysts create artificial datasets that statistically mimic real-world tabular or time-series information. You provide an existing dataset (like customer demographics or stock prices), and it generates a new, synthetic dataset of the same type and structure. This is ideal for anyone who needs to work with data that has privacy concerns or is too small to be effective.

data-privacy dataset-augmentation data-sharing machine-learning-development data-balancing
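
To make the "fit on real data, then sample" idea concrete, here is a minimal sketch using only the Python standard library. It is not ydata-synthetic's API and is far simpler than what a dedicated generator would do: it assumes every column is numeric and models each one as an independent Gaussian, so correlations between columns are lost. The function names `fit_columns` and `sample_rows` and the example rows are hypothetical, chosen for illustration only.

```python
import random
import statistics


def fit_columns(rows):
    """Estimate (mean, standard deviation) for each numeric column."""
    columns = list(zip(*rows))  # transpose rows into columns
    return [(statistics.mean(c), statistics.stdev(c)) for c in columns]


def sample_rows(params, n, seed=0):
    """Draw n synthetic rows, sampling each column independently."""
    rng = random.Random(seed)  # fixed seed keeps the output reproducible
    return [[rng.gauss(mu, sigma) for mu, sigma in params] for _ in range(n)]


# Hypothetical "real" dataset: (age, annual income) for five customers.
real = [
    [23, 31000.0],
    [35, 52000.0],
    [41, 61000.0],
    [29, 40000.0],
    [52, 75000.0],
]

params = fit_columns(real)
synthetic = sample_rows(params, n=1000)
```

The synthetic rows have the same structure as the input and roughly the same per-column mean and spread, but none of them is a copy of a real record. Capturing relationships between columns, or temporal structure in a time series, would need a richer model than this one.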

About Synthetic-data-gen

tirthajyoti/Synthetic-data-gen

Various methods for generating synthetic data for data science and ML

This project helps data scientists and machine learning practitioners generate diverse datasets for training and testing algorithms. It takes your specifications for data characteristics, such as the number of samples, features, statistical distributions, and desired complexity, and outputs synthetic datasets tailored for classification, regression, clustering, or time series problems. This is ideal for those learning new algorithms or needing to explore algorithm behavior under specific, controlled data conditions.

machine-learning-education algorithm-testing data-simulation model-training statistical-modeling
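
Specification-driven generation of this kind can be sketched in a few lines of standard-library Python. The example below is an illustration, not code from the Synthetic-data-gen repository: the function `make_classification_data` and its parameters (`class_sep`, `noise`) are hypothetical names, and the data model (two Gaussian clusters with balanced labels) is an assumption picked for simplicity.

```python
import random


def make_classification_data(n_samples, n_features, class_sep=2.0,
                             noise=1.0, seed=0):
    """Build a two-class dataset from a specification.

    Class 0 is centered at -class_sep/2 and class 1 at +class_sep/2
    on every feature; `noise` is the standard deviation of each cluster.
    """
    rng = random.Random(seed)  # fixed seed keeps the output reproducible
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternate labels so the classes stay balanced
        center = class_sep / 2 if label == 1 else -class_sep / 2
        X.append([rng.gauss(center, noise) for _ in range(n_features)])
        y.append(label)
    return X, y


# Specification: 200 samples, 3 features, well-separated classes.
X, y = make_classification_data(n_samples=200, n_features=3,
                                class_sep=4.0, noise=1.0)
```

Lowering `class_sep` or raising `noise` makes the classes overlap more, which is one way to study how a classifier behaves as the problem gets harder. For real work, scikit-learn's `sklearn.datasets.make_classification` provides a more complete generator along the same lines.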

Scores updated daily from GitHub, PyPI, and npm data.