StefanHeng/ProgGen

Code for paper "ProgGen: Generating Named Entity Recognition Datasets Step-by-step with Self-Reflexive Large Language Models"

37 / 100 (Emerging)

This project helps create high-quality, diverse datasets for training AI models to identify specific entities in text, like names, places, or product types. It takes instructions and example data, uses large language models to generate new text with these entities, and outputs comprehensive datasets ready for model training. This is for AI practitioners, machine learning engineers, or researchers who need specialized annotated text data but lack sufficient real-world examples.

No commits in the last 6 months.

Use this if you need to generate synthetic, diverse, and high-quality named entity recognition datasets to train your AI models, especially when real-world annotated data is scarce or expensive to acquire.

Not ideal if you already have ample, high-quality, real-world labeled data for your specific named entity recognition task.

named-entity-recognition dataset-generation NLP-data-synthesis AI-model-training-data text-annotation
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 6 / 25
Maturity 16 / 25
Community 15 / 25
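The four category scores listed above add up to the overall figure shown for this project. The short check below illustrates that arithmetic. Treating the overall score as a plain sum of the categories is an assumption drawn only from these numbers, not a documented scoring rule.

```python
# Category scores copied from the listing above; each is out of 25.
subscores = {
    "Maintenance": 0,
    "Adoption": 6,
    "Maturity": 16,
    "Community": 15,
}

# Assumption: the overall score is the unweighted sum of the categories.
total = sum(subscores.values())
maximum = 25 * len(subscores)

print(f"{total} / {maximum}")
```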


Stars 17
Forks 4
Language Python
License MIT
Last pushed Mar 29, 2024
Commits (30d) 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/StefanHeng/ProgGen"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
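The same request can be made from Python with the standard library alone. This is a minimal sketch rather than an official client. The helper names `quality_url` and `fetch_quality` are hypothetical, the URL layout (category, owner, repository) is inferred from the single curl example above, and the response is assumed to be JSON.

```python
import json
import urllib.request

# Base URL taken from the curl example; the path layout is an assumption.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    return f"{API_BASE}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record. Assumes the response body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# record = fetch_quality("nlp", "StefanHeng", "ProgGen")
```

Each call counts against the daily request limit, so it may be worth caching the result locally if the data is read often.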