NVIDIA-NeMo/DataDesigner

🎨 NeMo Data Designer: A general library for generating high-quality synthetic data from scratch or based on seed data.

62 / 100 (Established)

DataDesigner helps data scientists, researchers, and AI developers create synthetic datasets for training models or testing applications. You provide a blueprint or existing seed data, and the library generates diverse, statistically sound data with controlled relationships between fields. The output is a dataset intended to mimic real-world data while avoiding the privacy concerns and access limitations of real records.

795 stars. Actively maintained with 72 commits in the last 30 days.

Use this if you need to generate diverse, high-quality synthetic data for machine learning, data analysis, or testing, and require control over statistical distributions, field relationships, and data validation.

Not ideal if you only need very simple, unstructured data generation or if your primary need is basic data masking for privacy rather than comprehensive synthetic data creation.

data-generation machine-learning-engineering dataset-creation AI-training data-testing
No package, no dependents
Maintenance 22 / 25
Adoption 10 / 25
Maturity 13 / 25
Community 17 / 25
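
The four sub-scores listed above add up to the overall 62 / 100. The page does not state how scores are calculated, so treating the total as a plain sum of four categories capped at 25 each is an observation from these numbers, not a documented formula. A minimal sketch of that arithmetic:

```python
# Sub-scores copied from the listing; each category is out of 25.
sub_scores = {
    "Maintenance": 22,
    "Adoption": 10,
    "Maturity": 13,
    "Community": 17,
}
MAX_PER_CATEGORY = 25  # assumption: every category shares the same cap

# Assumption: the overall score is the unweighted sum of the sub-scores.
total = sum(sub_scores.values())
maximum = MAX_PER_CATEGORY * len(sub_scores)

print(f"{total} / {maximum}")  # prints "62 / 100"
```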

Stars: 795
Forks: 66
Language: Python
License: Apache-2.0
Last pushed: Mar 13, 2026
Commits (30d): 72

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/generative-ai/NVIDIA-NeMo/DataDesigner"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
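
The same request can be made from Python with only the standard library. This is a sketch under assumptions: the URL layout (category, owner, repository) is inferred from the single curl example above, the helper names `quality_url` and `fetch_quality` are hypothetical, and the response is assumed to be JSON because the page does not document its schema.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example in the listing.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL; the path layout is inferred from the curl example."""
    parts = (category, owner, repo)
    return BASE_URL + "/" + "/".join(quote(part, safe="") for part in parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record, assuming the body is JSON (schema undocumented here)."""
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_quality("generative-ai", "NVIDIA-NeMo", "DataDesigner")` issues the same GET request as the curl command. Unauthenticated calls count against the 100 requests/day limit, so caching the result locally is a reasonable choice when polling many repositories.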