Sreyan88/ACLM
Code for ACL 2023 Paper: ACLM: A Selective-Denoising based Generative Data Augmentation Approach for Low-Resource Complex NER
This tool helps language model developers augment datasets for complex named entity recognition (NER), especially in languages or domains where training data is scarce. It takes an existing small dataset of sentences containing complex entities and generates diverse, contextually relevant new training examples. It is intended for natural language processing (NLP) researchers and engineers working on NER in low-resource settings.
No commits in the last 6 months.
Use this if you are building an NER system and struggle to identify complex entities accurately in specialized domains or less common languages because training data is scarce.
Not ideal if your NER task involves only simple, well-defined entities with abundant training data available.
Stars: 22
Forks: 2
Language: Python
License: —
Category:
Last pushed: Jul 19, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/generative-ai/Sreyan88/ACLM"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
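The same request can be sketched in Python using only the standard library. This is a minimal illustration, not official client code: the helper names `quality_url` and `fetch_quality` are hypothetical, the assumption that other repositories follow the same `category/owner/repo` path pattern is inferred from the single URL above, and the assumption that the endpoint returns JSON is not confirmed by this page.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a repository such as 'Sreyan88/ACLM'.

    Assumption: other repositories use the same category/owner/repo pattern.
    """
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository without an API key.

    Assumption: the response body is JSON. If it is not, json.loads
    raises a ValueError and the caller should inspect the raw body instead.
    """
    with urllib.request.urlopen(quality_url(category, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("generative-ai", "Sreyan88/ACLM")` would issue the same GET request as the curl command. Keyless access is limited to 100 requests per day, so it is worth caching the result rather than refetching it.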
Higher-rated alternatives
sdv-dev/SDV
Synthetic data generation for tabular data
sdv-dev/SDGym
Benchmarking synthetic data generation methods.
NVIDIA-NeMo/DataDesigner
🎨 NeMo Data Designer: A general library for generating high-quality synthetic data from scratch...
AlexanderVNikitin/tsgm
Generation and evaluation of synthetic time series datasets (also, augmentations,...
mostly-ai/mostlyai
Synthetic Data SDK ✨