sebhaan/TabPFGen
TabPFGen: Synthetic Tabular Data Generation with TabPFN
This tool helps data professionals create realistic fake datasets when real data is scarce, sensitive, or imbalanced. You provide your original tabular data, and it generates new synthetic data that mirrors the statistical patterns of your input. Data scientists, analysts, and researchers can use this to expand their datasets, protect privacy, or balance class distributions for machine learning.
No commits in the last 6 months. Available on PyPI.
Use this if you need additional tabular data for classification or regression tasks that closely mimics your original dataset, whether to address data scarcity or to avoid sharing sensitive records.
Not ideal if you need exact class counts when balancing, or if your datasets are large enough that memory usage and computation time become significant constraints.
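Where exact per-class counts do matter, one generic workaround is to produce more synthetic rows than required and subsample afterwards. The sketch below is a post-processing step written for illustration only. It is not part of TabPFGen, and the function name `trim_to_exact_counts` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def trim_to_exact_counts(rows, labels, target_counts, seed=0):
    """Randomly keep exactly target_counts[c] rows for each class c.

    Hypothetical helper: `rows` and `labels` are parallel sequences,
    `target_counts` maps each class label to the number of rows to keep.
    """
    # Group row indices by class label.
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    rng = random.Random(seed)
    keep = []
    for cls, n in target_counts.items():
        available = by_class.get(cls, [])
        if len(available) < n:
            raise ValueError(f"class {cls!r}: need {n}, have {len(available)}")
        # Sample without replacement so no row is duplicated.
        keep.extend(rng.sample(available, n))

    keep.sort()  # preserve the original row order
    return [rows[i] for i in keep], [labels[i] for i in keep]
```

Subsampling can only shrink a class, so the helper raises an error rather than silently returning fewer rows when a class is under-represented.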
Stars: 35
Forks: 5
Language: Python
License: Apache-2.0
Category:
Last pushed: Jul 15, 2025
Commits (30d): 0
Dependencies: 8
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/generative-ai/sebhaan/TabPFGen"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
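The same request can be made from Python. The sketch below uses only the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical, and the category/owner/repository path layout is inferred from the single curl example above. Because the response format is not described here, the body is returned as raw text rather than parsed.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base endpoint copied from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper).

    Assumes the path layout <category>/<owner>/<repo> seen in the curl example.
    """
    parts = (quote(p, safe="") for p in (category, owner, repo))
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> str:
    """Return the raw response body as text; the schema is not documented here."""
    with urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example call (performs a network request, so it is left commented out):
# print(fetch_quality("generative-ai", "sebhaan", "TabPFGen"))
```

With the keyless limit of 100 requests/day, it may be worth caching responses locally rather than re-fetching on every run.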
Higher-rated alternatives
sdv-dev/SDV
Synthetic data generation for tabular data
sdv-dev/SDGym
Benchmarking synthetic data generation methods.
NVIDIA-NeMo/DataDesigner
🎨 NeMo Data Designer: A general library for generating high-quality synthetic data from scratch...
AlexanderVNikitin/tsgm
Generation and evaluation of synthetic time series datasets (also, augmentations,...
mostly-ai/mostlyai
Synthetic Data SDK ✨