StefanHeng/ProgGen
Code for paper "ProgGen: Generating Named Entity Recognition Datasets Step-by-step with Self-Reflexive Large Language Models"
This project helps create high-quality, diverse datasets for training AI models to identify specific entities in text, like names, places, or product types. It takes instructions and example data, uses large language models to generate new text with these entities, and outputs comprehensive datasets ready for model training. This is for AI practitioners, machine learning engineers, or researchers who need specialized annotated text data but lack sufficient real-world examples.
No commits in the last 6 months.
Use this if you need to generate synthetic, diverse, and high-quality named entity recognition datasets to train your AI models, especially when real-world annotated data is scarce or expensive to acquire.
Not ideal if you already have ample, high-quality, real-world labeled data for your specific named entity recognition task.
Stars: 17
Forks: 4
Language: Python
License: MIT
Last pushed: Mar 29, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/StefanHeng/ProgGen"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
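The same request can be made from Python. The following is a minimal sketch using only the standard library. It assumes that the path segments after `quality/` are a category (`nlp` here), a repository owner, and a repository name, and that the endpoint returns JSON; neither is confirmed by this page. The helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the request URL.

    Assumption: the path is <category>/<owner>/<repo>, inferred from the
    single curl example on this page.
    """
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository without an API key.

    Assumption: the response body is JSON. The schema is not documented
    here, so the parsed value is returned unchanged.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "StefanHeng", "ProgGen")` issues the same GET request as the curl command. Keyless access is limited to 100 requests/day, so cache results when looping over many repositories.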
Higher-rated alternatives
williamliujl/CMExam
A Chinese National Medical Licensing Examination dataset and large language model benchmarks
zjunlp/IEPile
[ACL 2024] IEPile: A Large-Scale Information Extraction Corpus
Yinghao-Li/GnO-IE
Code for "A Simple but Effective Approach to Improve Structured Language Model Output for...
MaheshJakkala/naamapadam-multilingual-ner
Benchmarking NER on Naamapadam across 7 Indic languages. EDA + model training for...
yaoyiran/BLI-Reading-List
A 2024 Reading List for Bilingual Lexicon Induction (BLI) / Word Translation. Frequently Updated.