quentinlintz/synthetic-data-generator
🦄 Use GPT to generate and label data
This tool helps data scientists and NLP practitioners quickly create labeled text datasets for training language models. You provide a prompt describing the kind of text you need, and it generates synthetic comments along with a binary label (e.g., 'suggestion' or 'not a suggestion'). The output is a CSV file ready for use in machine learning workflows.
No commits in the last 6 months.
Use this if you need to rapidly generate custom, labeled text data for a specific NLP task without having to collect or annotate real-world examples.
Not ideal if your task requires highly nuanced or domain-specific labeling that a general-purpose language model might struggle to accurately infer.
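The workflow described above (a prompt goes in, labeled comments come out, and the result is saved as CSV) can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: `fake_generate`, `write_labeled_csv`, the `text`/`label` column names, and the 0/1 label encoding are all assumptions made for the example, and the model call is replaced by a fixed stand-in so the snippet runs without an API key.

```python
import csv
import io
from typing import Iterable, List, TextIO, Tuple

# Assumed encoding for the binary label (hypothetical, not taken from the repo).
LABELS = {0: "not a suggestion", 1: "suggestion"}


def fake_generate(prompt: str, n: int) -> List[Tuple[str, int]]:
    """Hypothetical stand-in for the GPT call: returns n (text, label) pairs."""
    samples = [
        ("It would be great if the app had a dark mode.", 1),
        ("I opened the app this morning.", 0),
    ]
    # Cycle through the canned samples; the prompt is ignored in this stub.
    return [samples[i % len(samples)] for i in range(n)]


def write_labeled_csv(rows: Iterable[Tuple[str, int]], out: TextIO) -> int:
    """Write (text, label) pairs as CSV with a header row; return the row count."""
    writer = csv.writer(out)
    writer.writerow(["text", "label"])
    count = 0
    for text, label in rows:
        if label not in LABELS:
            raise ValueError(f"label must be 0 or 1, got {label!r}")
        writer.writerow([text.strip(), label])
        count += 1
    return count


if __name__ == "__main__":
    buffer = io.StringIO()
    written = write_labeled_csv(fake_generate("user feedback comments", 4), buffer)
    print(f"wrote {written} rows")
    print(buffer.getvalue())
```

Swapping `fake_generate` for a real model call would leave the CSV-writing step unchanged, which is the main reason the two are kept separate here.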
Stars: 25
Forks: 3
Language: Python
License: MIT
Category:
Last pushed: Apr 30, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/quentinlintz/synthetic-data-generator"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
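The same request can be made from Python using only the standard library. This is a sketch under stated assumptions: the URL pattern `quality/<category>/<owner>/<repo>` is inferred from the single curl example above, the helper names `build_quality_url` and `fetch_quality` are hypothetical, and the response is returned as raw text because the response format is not documented here.

```python
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; the segment layout after it is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the endpoint and return the response body as text (format not assumed)."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Performs a real network request, which counts against the daily limit.
    print(fetch_quality("nlp", "quentinlintz", "synthetic-data-generator"))
```

Keeping URL construction separate from the network call makes it easy to check the request target without spending any of the daily quota.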
Higher-rated alternatives
graykode/gpt-2-Pytorch
Simple Text-Generator with OpenAI gpt-2 Pytorch Implementation
imcaspar/gpt2-ml
GPT2 for multiple languages, including pretrained models. Multilingual GPT2 support, with a 1.5-billion-parameter Chinese pretrained model.
Morizeyao/GPT2-Chinese
Chinese version of GPT2 training code, using BERT tokenizer.
gyunggyung/KoGPT2-FineTuning
🔥 Korean GPT-2, KoGPT2 FineTuning cased. Trained on Korean lyrics data 🔥
liucongg/GPT2-NewsTitle
Chinese news title generation project using GPT2, with very detailed code comments.