bespokelabsai/curator

Synthetic data curation for post-training and structured data extraction

Quality score: 62 / 100 (Established)

This tool helps machine learning engineers and data scientists efficiently create high-quality synthetic datasets. It takes raw, unstructured information and generates structured data suitable for training or fine-tuning AI models. You can also use it to extract specific structured information from large volumes of text.

1,643 stars. Actively maintained with 9 commits in the last 30 days.

Use this if you need to quickly generate diverse, structured synthetic data to train or enhance your large language models, or if you need to extract specific details from unstructured text at scale.

Not ideal if you're looking for a simple data labeling solution or don't work with AI models requiring synthetic data for training and evaluation.

AI model training, data science, structured data extraction, machine learning, large language models
No Package, No Dependents
Maintenance 17 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 19 / 25
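
The four category scores listed here add up to the overall score shown on this page. The sketch below only checks that arithmetic; treating the overall score as a plain sum of the categories is an observation from these figures, not a documented formula, and the variable names are illustrative.

```python
# Category scores copied from this page; each is out of 25.
category_scores = {
    "Maintenance": 17,
    "Adoption": 10,
    "Maturity": 16,
    "Community": 19,
}
MAX_PER_CATEGORY = 25

# Assumption: the overall score is the unweighted sum of the categories.
total = sum(category_scores.values())
maximum = MAX_PER_CATEGORY * len(category_scores)

print(f"{total} / {maximum}")  # prints "62 / 100"
```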


Stars: 1,643
Forks: 136
Language: Python
License: Apache-2.0
Last pushed: Jan 24, 2026
Commits (30d): 9

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/prompt-engineering/bespokelabsai/curator"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
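
For use from a script, the same request as the curl command above can be made with Python's standard library. This is a minimal sketch under two assumptions that the page does not confirm: that the URL path follows a category/owner/repo pattern (inferred from the single example URL), and that the response body is JSON. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build an endpoint URL in the shape of the curl example.

    Assumption: the path is <category>/<owner>/<repo>.
    """
    parts = (category, owner, repo)
    return BASE_URL + "/" + "/".join(quote(part, safe="") for part in parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and decode the body.

    Assumption: the endpoint returns JSON; the schema is not documented here.
    """
    with urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.load(resp)


# Example call (performs a network request and counts against the daily limit):
# data = fetch_quality("prompt-engineering", "bespokelabsai", "curator")
```

Because the keyless tier allows 100 requests per day, caching the decoded result locally is a sensible default when polling more than a handful of repositories.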