radi-cho/datasetGPT

A command-line interface to generate textual and conversational datasets with LLMs.

Score: 39 / 100 (Emerging)

This tool helps researchers, content creators, and AI trainers quickly generate large textual datasets using various Large Language Models. You provide a prompt or define conversational agents, along with desired parameters like length and temperature. The output is a structured JSON file or directory containing diverse texts or simulated conversations, ready for tasks like model training or analysis.
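
Since the output is described as a JSON file or a directory, a downstream script might start by loading it into memory. The sketch below is a minimal illustration, not part of datasetGPT: the helper name `load_generated` is hypothetical, and it assumes that a directory output consists of `.json` files and that each file holds either one JSON object or a list of them. The record schema itself is not specified here, so the code leaves each record untouched.

```python
import json
from pathlib import Path


def load_generated(path):
    """Load generated records from one JSON file or a directory of JSON files.

    Assumption (not confirmed by the tool's documentation): directory outputs
    contain *.json files, each holding a JSON object or a list of objects.
    """
    p = Path(path)
    # A directory yields all of its .json files in sorted order; a file yields itself.
    files = sorted(p.glob("*.json")) if p.is_dir() else [p]
    records = []
    for f in files:
        with f.open(encoding="utf-8") as fh:
            data = json.load(fh)
        # Flatten lists so the caller always receives one flat list of records.
        if isinstance(data, list):
            records.extend(data)
        else:
            records.append(data)
    return records
```

Calling `load_generated` with the path the tool wrote to returns a flat list, which can then be inspected or handed to a training or analysis step.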

299 stars. No commits in the last 6 months. Available on PyPI.

Use this if you need to rapidly create diverse textual data or simulated conversations at scale for research, fine-tuning smaller AI models, or automating content generation tasks.

Not ideal if you need to transform or process existing text datasets, as this tool focuses solely on generating new content.

Tags: AI-training, content-generation, research-data-collection, conversation-simulation, LLM-fine-tuning
Status: No License, Stale 6m
Maintenance 0 / 25
Adoption 10 / 25
Maturity 17 / 25
Community 12 / 25

Stars: 299
Forks: 19
Language: Python
License: None
Last pushed: Aug 25, 2023
Commits (30d): 0
Dependencies: 2

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/radi-cho/datasetGPT"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
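
The same request can be made from Python using only the standard library. This is a sketch under assumptions: the function names `quality_url` and `fetch_quality` are hypothetical, the URL pattern is inferred from the single curl example above, and the response is assumed to be JSON, which the listing does not state.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; the owner/repo pattern is inferred from it.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner, repo):
    """Build the endpoint URL for a repository, percent-encoding each path part."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner, repo, timeout=10.0):
    """Fetch the data for one repository.

    Assumption: the endpoint returns a JSON body. The keyless tier allows
    100 requests/day, so callers may want to cache the result.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("radi-cho", "datasetGPT")` requests the same URL as the curl command.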