rasinmuhammed/misata
High-performance open-source synthetic data engine. Uses LLMs for schema design and vectorized NumPy for deterministic, scalable generation.
Need to create realistic, interconnected datasets for testing, demos, or simulations? Misata helps you generate complex, multi-table synthetic data by simply describing your business scenario in plain English. It takes your "story" – like "An ecommerce company with seasonal demand" – and outputs structured data tables with consistent relationships, accurate aggregations, and real-world logic. This is ideal for data engineers, QA testers, data scientists, and business analysts who need quality data without using sensitive production information.
Used by 1 other package. Available on PyPI.
Use this if you need to quickly generate realistic, relational test data, demo datasets for dashboards, or simulation scenarios for various business operations, ensuring data integrity across multiple tables.
Not ideal if you're looking for a simple tool to generate disconnected rows of random fake data without any logical relationships or aggregate targets.
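The properties described above (deterministic, vectorized generation with consistent relationships and aggregate targets) can be illustrated with plain NumPy. This is a minimal sketch, not Misata's API: the function `generate_tables`, its parameters, and the column names are hypothetical and chosen only for illustration.

```python
import numpy as np


def generate_tables(n_customers=100, n_orders=1000, revenue_target=50_000.0, seed=42):
    """Hypothetical helper (not part of Misata): build two related tables."""
    # A seeded generator makes every run reproduce the same data.
    rng = np.random.default_rng(seed)

    # Parent table: one row per customer, each with a unique primary key.
    customer_ids = np.arange(1, n_customers + 1)

    # Child table: every order draws its foreign key from the existing
    # customer ids, so referential integrity holds by construction.
    order_customer_ids = rng.choice(customer_ids, size=n_orders)

    # Draw skewed positive amounts, then rescale them in one vectorized
    # step so that they sum to the requested aggregate target.
    raw = rng.lognormal(mean=3.0, sigma=0.5, size=n_orders)
    amounts = raw * (revenue_target / raw.sum())

    customers = {"customer_id": customer_ids}
    orders = {
        "order_id": np.arange(1, n_orders + 1),
        "customer_id": order_customer_ids,
        "amount": amounts,
    }
    return customers, orders


customers, orders = generate_tables()
```

Drawing foreign keys from the parent table's keys and rescaling to a target total are two simple ways to get related rows and exact aggregates without per-row loops. A real engine would layer schema inference and richer distributions on top.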
Stars: 52
Forks: 3
Language: Python
License: MIT
Category:
Last pushed: Mar 08, 2026
Commits (30d): 0
Dependencies: 13
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/rasinmuhammed/misata"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
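The same request can be made from Python with only the standard library. This is a sketch under stated assumptions: the helper names `build_url` and `fetch_quality` are hypothetical, and the code assumes the endpoint returns a JSON body, which the listing does not confirm.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_url(repo: str) -> str:
    """Hypothetical helper: build the endpoint URL for an "owner/name" repo."""
    # Keep the slash between owner and name; escape any other unsafe characters.
    return f"{BASE_URL}/{quote(repo, safe='/')}"


def fetch_quality(repo: str, timeout: float = 10.0):
    """Hypothetical helper: fetch the record for one repository.

    Assumption: the response body is JSON. If it is not, json.loads raises
    an error and the raw bytes should be inspected instead.
    """
    with urlopen(build_url(repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (performs a network request and counts against the daily limit):
# data = fetch_quality("rasinmuhammed/misata")
```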
Higher-rated alternatives
InternScience/GraphGen
GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation
timothepearce/synda
A CLI for generating synthetic data
ziegler-ingo/CRAFT
[TACL, EMNLP 2025 Oral] Code, datasets, and checkpoints for the paper "CRAFT Your Dataset:...
ZhuLinsen/FastDatasets
A powerful tool for creating high-quality training datasets for Large Language Models...
BatsResearch/bonito
A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.