StefanHeng/ProgGen

Code for paper "ProgGen: Generating Named Entity Recognition Datasets Step-by-step with Self-Reflexive Large Language Models"

Emerging

This project helps create high-quality, diverse datasets for training AI models to identify specific entities in text, like names, places, or product types. It takes instructions and example data, uses large language models to generate new text with these entities, and outputs comprehensive datasets ready for model training. This is for AI practitioners, machine learning engineers, or researchers who need specialized annotated text data but lack sufficient real-world examples.
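As a rough illustration of the last step described above (turning generated text and its entities into training-ready data), here is a minimal sketch that converts a sentence and a list of entity annotations into BIO tags, a common NER training format. The function name `to_bio`, the whitespace tokenization, and the sample sentence are assumptions made for illustration. They are not taken from the ProgGen codebase and may differ from its actual output format.

```python
from typing import List, Tuple


def to_bio(tokens: List[str], entities: List[Tuple[str, str]]) -> List[str]:
    """Assign BIO labels to tokens given (entity text, entity type) pairs.

    Hypothetical helper: matches each entity by exact token sequence and
    tags only its first occurrence that is not already labeled.
    """
    labels = ["O"] * len(tokens)
    for text, etype in entities:
        span = text.split()
        n = len(span)
        if n == 0:
            continue  # skip empty entity strings
        for i in range(len(tokens) - n + 1):
            window_free = all(label == "O" for label in labels[i:i + n])
            if tokens[i:i + n] == span and window_free:
                labels[i] = f"B-{etype}"
                for j in range(i + 1, i + n):
                    labels[j] = f"I-{etype}"
                break
    return labels


# Illustrative sample standing in for one LLM-generated record (made up).
sample = {
    "sentence": "Ada Lovelace visited London in 1842 .",
    "entities": [("Ada Lovelace", "person"), ("London", "location")],
}

tokens = sample["sentence"].split()
labels = to_bio(tokens, sample["entities"])
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

Exact token matching keeps the sketch short. An entity whose text does not appear verbatim in the sentence is simply left untagged, which is one place where a real pipeline would likely need extra validation.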

No commits in the last 6 months.

Use this if you need to generate synthetic, diverse, and high-quality named entity recognition datasets to train your AI models, especially when real-world annotated data is scarce or expensive to acquire.

Not ideal if you already have ample, high-quality, real-world labeled data for your specific named entity recognition task.

named-entity-recognition dataset-generation NLP-data-synthesis AI-model-training-data text-annotation

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 6 / 25

Maturity 16 / 25

Community 15 / 25

Language: Python

License: MIT

Higher-rated alternatives

williamliujl/CMExam

A Chinese National Medical Licensing Examination dataset and large language model benchmarks

zjunlp/IEPile

[ACL 2024] IEPile: A Large-Scale Information Extraction Corpus

Yinghao-Li/GnO-IE

Code for "A Simple but Effective Approach to Improve Structured Language Model Output for...

MaheshJakkala/naamapadam-multilingual-ner

Benchmarking NER on Naamapadam across 7 Indic languages. EDA + model training for...

yaoyiran/BLI-Reading-List

A 2024 Reading List for Bilingual Lexicon Induction (BLI) / Word Translation. Frequently Updated.
