pietrolesci/anchoral
This is the official PyTorch implementation for our NAACL 2024 paper: "AnchorAL: Computationally Efficient Active Learning for Large and Imbalanced Datasets".
AnchorAL helps machine learning practitioners train classification models efficiently on very large datasets in which some categories are rare. Starting from raw, unlabeled text, it selects the most informative examples to label. The result is a better-performing model, especially on hard-to-find minority classes, with significantly less time and cost spent on manual data labeling.
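To make "selecting the most informative examples" concrete, here is a minimal, hypothetical sketch of anchor-based active learning in plain Python. It assumes the idea suggested by the name: labeled "anchor" examples of each class are used to retrieve a small subpool of the most similar unlabeled examples, and a standard uncertainty strategy (here, entropy sampling) is then run only on that subpool. The names `build_subpool`, `select_batch`, `k_per_class`, and `budget`, along with the choice of cosine similarity and entropy, are illustrative assumptions. This is not code from the repository.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_subpool(
    anchors_by_class: Dict[str, List[Vector]],
    pool: List[Vector],
    k_per_class: int,
) -> List[int]:
    """Hypothetical step 1: for each class, keep the k unlabeled points most
    similar to any of that class's anchors, and return the union of indices."""
    selected = set()
    for class_anchors in anchors_by_class.values():
        if not class_anchors:  # a class with no labeled anchors contributes nothing
            continue
        scored = []
        for i, x in enumerate(pool):
            sim = max(cosine(a, x) for a in class_anchors)
            scored.append((-sim, i))  # sort by descending similarity, then index
        scored.sort()
        selected.update(i for _, i in scored[:k_per_class])
    return sorted(selected)


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_batch(
    subpool: List[int], probs: List[Sequence[float]], budget: int
) -> List[int]:
    """Hypothetical step 2: entropy-based uncertainty sampling restricted to
    the subpool; returns the `budget` most uncertain indices to send for labeling."""
    ranked = sorted(subpool, key=lambda i: (-entropy(probs[i]), i))
    return ranked[:budget]


if __name__ == "__main__":
    anchors = {"rare": [[1.0, 0.0]], "common": [[0.0, 1.0]]}
    pool = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5], [-1.0, 0.0]]
    probs = [[0.5, 0.5], [0.9, 0.1], [0.6, 0.4], [0.99, 0.01]]
    subpool = build_subpool(anchors, pool, k_per_class=2)
    print(subpool, select_batch(subpool, probs, budget=2))
```

The point of the restriction, under this assumption, is cost: the model only scores the retrieved subpool rather than the full unlabeled pool, and retrieving neighbors per class keeps rare-class candidates in view.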
No commits in the last 6 months.
Use this if you are building text classification models and struggle with large, imbalanced datasets, where manually labeling enough data to achieve good performance on rare categories is a major bottleneck.
Not ideal if your datasets are small or perfectly balanced, or if you are working on tasks other than classification, such as regression or generative AI.
Stars: 22
Forks: 1
Language: Python
License: Apache-2.0
Category:
Last pushed: Apr 15, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/pietrolesci/anchoral"
Open to everyone: 100 requests/day with no key needed, or get a free key for 1,000 requests/day.
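For readers who prefer Python to curl, the request above can be sketched with the standard library alone. This is a hypothetical helper, not an official client: the names `quality_url`, `fetch_quality`, and the `topic` parameter (our reading of the `nlp` path segment) are assumptions, and it assumes the endpoint returns JSON. How an API key is passed is not described here, so the sketch makes only keyless requests.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(owner: str, repo: str, topic: str = "nlp") -> str:
    """Build the request URL; `topic` is a hypothetical name for the `nlp` segment."""
    parts = [urllib.parse.quote(p, safe="") for p in (topic, owner, repo)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(owner: str, repo: str, topic: str = "nlp", timeout: float = 10.0):
    """Keyless GET request; assumes (unverified) that the response body is JSON."""
    with urllib.request.urlopen(quality_url(owner, repo, topic), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(quality_url("pietrolesci", "anchoral"))
```

Calling `fetch_quality("pietrolesci", "anchoral")` performs a live network request and counts toward the daily limit.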
Higher-rated alternatives
ymcui/cmrc2018
A Span-Extraction Dataset for Chinese Machine Reading Comprehension (CMRC 2018)
princeton-nlp/DensePhrases
[ACL 2021] Learning Dense Representations of Phrases at Scale; EMNLP'2021: Phrase Retrieval...
thunlp/MultiRD
Code and data of the AAAI-20 paper "Multi-channel Reverse Dictionary Model"
IndexFziQ/KMRC-Papers
A list of recent papers regarding knowledge-based machine reading comprehension.
danqi/rc-cnn-dailymail
CNN/Daily Mail Reading Comprehension Task