yandex-research/tab-ddpm

[ICML 2023] The official implementation of the paper "TabDDPM: Modelling Tabular Data with Diffusion Models"

Score: 51 / 100 (Established)

This project helps data scientists and machine learning engineers create realistic synthetic datasets from existing tabular data. You input your original structured data (like a spreadsheet or database table), and it outputs a new, artificially generated dataset that mirrors the statistical properties of your original data. This is useful for tasks like sharing data while protecting privacy, augmenting small datasets, or testing models without using sensitive real-world information.

533 stars. No commits in the last 6 months.

Use this if you need to generate high-quality synthetic versions of tabular datasets with numerical and categorical columns, preserving their statistical characteristics and helping protect privacy.
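The claim that the output "mirrors the statistical properties" of the original can be spot-checked column by column. The sketch below is not part of tab-ddpm. It assumes both tables are plain dicts that map column names to lists. For numerical columns it compares the mean and standard deviation. For categorical columns it computes the total variation distance between category frequencies. All function names, column names and toy values are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev


def numeric_summary(values):
    """Mean and sample standard deviation of a numerical column (needs >= 2 values)."""
    return mean(values), stdev(values)


def category_frequencies(values):
    """Relative frequency of each category in a categorical column."""
    counts = Counter(values)
    total = len(values)
    return {cat: n / total for cat, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two categorical distributions (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def compare_columns(real, synthetic, numerical, categorical):
    """Per-column gaps between a real and a synthetic table (hypothetical helper)."""
    report = {}
    for col in numerical:
        r_mean, r_std = numeric_summary(real[col])
        s_mean, s_std = numeric_summary(synthetic[col])
        report[col] = {"mean_diff": abs(r_mean - s_mean), "std_diff": abs(r_std - s_std)}
    for col in categorical:
        p = category_frequencies(real[col])
        q = category_frequencies(synthetic[col])
        report[col] = {"tvd": total_variation(p, q)}
    return report


# Hypothetical toy tables: one numerical column ("age") and one categorical column ("city").
real = {"age": [23, 35, 47, 52], "city": ["A", "B", "A", "C"]}
synthetic = {"age": [25, 33, 49, 50], "city": ["A", "B", "B", "C"]}
report = compare_columns(real, synthetic, numerical=["age"], categorical=["city"])
```

In this toy example both "age" columns have the same mean (39.25) but slightly different spreads. Category "A" appears half as often in the synthetic "city" column, which gives a total variation distance of 0.25. Marginal checks like these say nothing about correlations between columns, so they are only a first sanity check and not a full evaluation.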

Not ideal if you are working with unstructured data like images, text, or audio, or if you need to generate entirely new data that doesn't resemble an existing dataset.

data-privacy synthetic-data-generation data-augmentation machine-learning-engineering tabular-data
Stale (6m), No package, No dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 25 / 25
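The four sub-scores listed above add up exactly to the headline score of 51. The page does not say this explicitly, but the numbers are consistent with the overall score being a plain sum of four categories, each capped at 25:

```latex
\text{score} = \underbrace{0}_{\text{Maintenance}} + \underbrace{10}_{\text{Adoption}} + \underbrace{16}_{\text{Maturity}} + \underbrace{25}_{\text{Community}} = 51, \qquad \text{max} = 4 \times 25 = 100
```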


Stars: 533
Forks: 132
Language: Python
License: MIT
Last pushed: Jul 13, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/yandex-research/tab-ddpm"

Open to everyone: 100 requests/day with no key needed, or get a free key for 1,000 requests/day.
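The curl call above can also be made from Python using only the standard library. This is a minimal sketch with two assumptions. First, it assumes the endpoint returns JSON. Second, it assumes other repositories follow the same owner/repo path pattern as the URL shown. The helper names are hypothetical, and the sketch does not send an API key, because the page does not say how a key is passed.

```python
import json
import urllib.request

# Base path copied from the documented curl URL; the owner/repo suffix pattern is an assumption.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/diffusion"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality-score URL for a repository (hypothetical helper)."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.load(resp)


# Example (requires network access; counts toward the 100 requests/day limit):
# data = fetch_quality("yandex-research", "tab-ddpm")
```

The free tier allows 100 requests per day, so it is worth caching responses rather than re-fetching them on every run.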