NVIDIA-NeMo/DataDesigner
🎨 NeMo Data Designer: A general library for generating high-quality synthetic data from scratch or based on seed data.
NeMo Data Designer helps data scientists, researchers, and AI developers create high-quality synthetic datasets for training models or testing applications. You provide a blueprint or existing data, and it generates diverse, statistically sound data with controlled relationships between fields. The output is a dataset that mimics real-world data without the privacy concerns or limitations that come with real data.
795 stars. Actively maintained with 72 commits in the last 30 days.
Use this if you need to generate diverse, high-quality synthetic data for machine learning, data analysis, or testing, and require control over statistical distributions, field relationships, and data validation.
Not ideal if you only need very simple, unstructured data generation or if your primary need is basic data masking for privacy rather than comprehensive synthetic data creation.
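To make "control over statistical distributions, field relationships, and data validation" concrete, here is a minimal standard-library sketch of the general idea. It does not use Data Designer's API; the field names, weights, salary ranges, and function names are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical blueprint (NOT Data Designer's API).
# Distribution control: departments are sampled with these weights.
DEPARTMENT_WEIGHTS = {"engineering": 0.5, "support": 0.3, "sales": 0.2}

# Field relationship: the allowed salary range depends on the department.
SALARY_RANGES = {
    "engineering": (90_000, 180_000),
    "support": (40_000, 80_000),
    "sales": (50_000, 120_000),
}


def generate_record(rng):
    """Sample one record whose salary is conditioned on its department."""
    departments = list(DEPARTMENT_WEIGHTS)
    weights = list(DEPARTMENT_WEIGHTS.values())
    department = rng.choices(departments, weights=weights, k=1)[0]
    low, high = SALARY_RANGES[department]
    return {"department": department, "salary": rng.randint(low, high)}


def is_valid(record):
    """Validation: the salary must fall inside its department's range."""
    low, high = SALARY_RANGES[record["department"]]
    return low <= record["salary"] <= high


def generate_dataset(n, seed=0):
    """Generate n records with a seeded generator and keep the valid ones."""
    rng = random.Random(seed)
    records = [generate_record(rng) for _ in range(n)]
    return [record for record in records if is_valid(record)]
```

A real synthetic data library layers far more on top of this (richer samplers, model-generated text fields, quality checks), but the blueprint, dependent sampling, and validation steps follow the same shape.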
Stars: 795
Forks: 66
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Commits (30d): 72
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/generative-ai/NVIDIA-NeMo/DataDesigner"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
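The same request can be made from Python using only the standard library. This is a sketch under two assumptions that the listing does not confirm: that the path segments after `/quality/` are category, owner, and repository (inferred from the single URL above), and that the endpoint returns JSON. The helper names `build_url` and `fetch_quality` are illustrative, and keyed access is not shown because the key mechanism is not documented here.

```python
import json
import urllib.request

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category, owner, repo):
    """Build the endpoint URL; the category/owner/repo layout is an assumption."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10.0):
    """Send an unauthenticated GET and parse the body, assuming it is JSON."""
    url = build_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("generative-ai", "NVIDIA-NeMo", "DataDesigner")` requests the same URL as the curl command. With the keyless limit of 100 requests/day, it is worth caching the result rather than calling the endpoint in a loop.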
Related tools
sdv-dev/SDV
Synthetic data generation for tabular data
sdv-dev/SDGym
Benchmarking synthetic data generation methods.
AlexanderVNikitin/tsgm
Generation and evaluation of synthetic time series datasets (also, augmentations,...
mostly-ai/mostlyai
Synthetic Data SDK ✨
hitsz-ids/synthetic-data-generator
SDG is a specialized framework designed to generate high-quality structured tabular data.