Oqura-ai/deepresearch-datagen-cli
Uses a deep research workflow to generate datasets for fine-tuning LLMs.
This tool helps researchers and analysts quickly generate structured datasets for various applications. You provide a description of the dataset you need, and it searches the web, builds context, suggests a data structure, and outputs clean, usable data. It's designed for anyone needing to create custom datasets without the manual work of gathering and formatting information.
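The four stages described above (search the web, build context, suggest a data structure, output data) can be pictured as a simple chain of functions. The sketch below is purely illustrative: every name in it (`DatasetJob`, `search_web`, `build_context`, `suggest_schema`, `generate_rows`, `run_pipeline`) is hypothetical, the stages are stubs that make no network or LLM calls, and none of it is taken from the tool's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetJob:
    """Hypothetical container for state passed between pipeline stages."""
    description: str
    sources: list[str] = field(default_factory=list)
    context: str = ""
    schema: list[str] = field(default_factory=list)
    rows: list[dict[str, str]] = field(default_factory=list)


def search_web(job: DatasetJob) -> DatasetJob:
    # Stub: a real implementation would query a search API here.
    job.sources = [f"stub result for: {job.description}"]
    return job


def build_context(job: DatasetJob) -> DatasetJob:
    # Merge the gathered sources into one context string.
    job.context = "\n".join(job.sources)
    return job


def suggest_schema(job: DatasetJob) -> DatasetJob:
    # Stub: a real implementation might ask an LLM to propose columns.
    job.schema = ["topic", "source_text"]
    return job


def generate_rows(job: DatasetJob) -> DatasetJob:
    # Emit one row per source, keyed by the suggested schema.
    job.rows = [
        dict(zip(job.schema, (job.description, text))) for text in job.sources
    ]
    return job


def run_pipeline(description: str) -> DatasetJob:
    """Run the four stages in the order the description lists them."""
    job = DatasetJob(description=description)
    for stage in (search_web, build_context, suggest_schema, generate_rows):
        job = stage(job)
    return job
```

Keeping each stage as a function that takes and returns the same job object makes it easy to swap a stub for a real search or model call without touching the others.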
No commits in the last 6 months.
Use this if you need to quickly create a structured dataset on a specific topic by leveraging web research, without manual data collection.
Not ideal if you require datasets from internal or proprietary databases that are not accessible via public web searches.
Stars: 39
Forks: 6
Language: Python
License: MIT
Category: (not listed)
Last pushed: Oct 09, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Oqura-ai/deepresearch-datagen-cli"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
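For scripted use, the same request can be made from Python with the standard library. This is a minimal sketch under two assumptions that the page does not confirm: that the endpoint returns JSON, and that the `llm-tools` path segment is a category that can be parameterised. The helper names `build_quality_url` and `fetch_quality` and the `category` parameter are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is per-repository.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(owner: str, repo: str, category: str = "llm-tools") -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = (quote(part, safe="") for part in (category, owner, repo))
    return f"{BASE_URL}/" + "/".join(parts)


def fetch_quality(owner: str, repo: str, category: str = "llm-tools",
                  timeout: float = 10.0):
    """Fetch and decode the response (assumed, not confirmed, to be JSON)."""
    url = build_quality_url(owner, repo, category)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_quality("Oqura-ai", "deepresearch-datagen-cli")` performs the network request and counts against the daily limit, so cache the result if the script runs repeatedly.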
Higher-rated alternatives
InternScience/GraphGen
GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation
timothepearce/synda
A CLI for generating synthetic data
rasinmuhammed/misata
High-performance open-source synthetic data engine. Uses LLMs for schema design and vectorized...
ziegler-ingo/CRAFT
[TACL, EMNLP 2025 Oral] Code, datasets, and checkpoints for the paper "CRAFT Your Dataset:...
ZhuLinsen/FastDatasets
A powerful tool for creating high-quality training datasets for Large Language Models...