mostly-ai/mostlyai
Synthetic Data SDK ✨
This tool helps data professionals create realistic, artificial datasets that look and behave like real data but don't contain any sensitive personal information. You feed in your original tabular or text data, and it produces a new dataset that preserves the statistical properties and patterns of the original. Data scientists, machine learning engineers, and data analysts can use this to develop and test models without privacy concerns.
750 stars. Available on PyPI.
Use this if you need to share, develop, or test data-driven applications and models using realistic data, but your original data contains sensitive information that cannot be exposed.
Not ideal if your primary need is to simply anonymize or mask existing sensitive data rather than generate entirely new synthetic data.
Stars: 750
Forks: 63
Language: Python
License: Apache-2.0
Category:
Last pushed: Jan 13, 2026
Commits (30d): 0
Dependencies: 20
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/generative-ai/mostly-ai/mostlyai"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
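The curl request above can also be issued from Python using only the standard library. The sketch below is a minimal illustration, not official client code: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the category/owner/repo path layout is inferred from the single example URL, and the assumption that the endpoint returns JSON is not confirmed by this page.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example; everything after it is assumed
# to follow the pattern <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the request URL (hypothetical helper, layout inferred from the example)."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumes the response body is JSON; adjust the parsing if the
    endpoint returns something else.
    """
    url = build_quality_url(category, owner, repo)
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("generative-ai", "mostly-ai", "mostlyai")` would request the same URL as the curl command. With no key, keep usage under the 100 requests/day limit.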
Related tools
sdv-dev/SDV
Synthetic data generation for tabular data
sdv-dev/SDGym
Benchmarking synthetic data generation methods.
NVIDIA-NeMo/DataDesigner
🎨 NeMo Data Designer: A general library for generating high-quality synthetic data from scratch...
AlexanderVNikitin/tsgm
Generation and evaluation of synthetic time series datasets (also, augmentations,...
hitsz-ids/synthetic-data-generator
SDG is a specialized framework designed to generate high-quality structured tabular data.