mostly-ai/mostlyai

Synthetic Data SDK ✨

Score: 61 / 100 (Established)

This tool helps data professionals create realistic, artificial datasets that look and behave like real data but don't contain any sensitive personal information. You feed in your original tabular or text data, and it produces a new dataset that preserves the statistical properties and patterns of the original. Data scientists, machine learning engineers, and data analysts can use the synthetic data to develop and test models without exposing the original records.

750 stars. Available on PyPI.

Use this if you need to share, develop, or test data-driven applications and models using realistic data, but your original data contains sensitive information that cannot be exposed.

Not ideal if your primary need is to simply anonymize or mask existing sensitive data rather than generate entirely new synthetic data.

data-privacy machine-learning-development data-sharing analytics-testing regulatory-compliance
Maintenance 10 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 16 / 25


Stars: 750
Forks: 63
Language: Python
License: Apache-2.0
Last pushed: Jan 13, 2026
Commits (30d): 0
Dependencies: 20

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/generative-ai/mostly-ai/mostlyai"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
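
The same request can be made from Python. The following is a minimal sketch using only the standard library, not an official client. It assumes the endpoint returns JSON (the response format is not documented here) and that the path follows a category/owner/repo pattern, which is inferred from the single URL above. The names `quality_url` and `fetch_quality` are hypothetical helpers introduced for illustration.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the request URL. The category/owner/repo layout is an assumption."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the listing data without an API key.

    Assumes the response body is JSON; json.loads raises ValueError otherwise.
    """
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("generative-ai", "mostly-ai", "mostlyai")` issues the same request as the curl command. Each call counts against the daily request limit, so caching the result locally is worth considering.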