pietrolesci/anchoral

This is the official PyTorch implementation for our NAACL 2024 paper: "AnchorAL: Computationally Efficient Active Learning for Large and Imbalanced Datasets".

Experimental

AnchorAL helps machine learning practitioners train classification models efficiently on very large datasets in which some categories are rare. Starting from raw, unlabeled text, it selects the most informative examples to label. The result is a better-performing model, especially on hard-to-find minority classes, with less time and money spent on manual labeling.
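To make "selecting the most informative examples" concrete, here is a minimal sketch of generic pool-based uncertainty sampling. It is not the AnchorAL algorithm and not this repository's API: the function names `entropy` and `select_for_labeling`, and the probabilities in `pool_probs`, are hypothetical and exist only for illustration. The idea shown is that examples on which the current model is least certain are sent to annotators first.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(pool_probs, budget):
    """Return the indices of the `budget` most uncertain pool examples.

    `pool_probs` holds one predicted class distribution per unlabeled example.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:budget]


# Hypothetical model outputs for four unlabeled texts and two classes.
pool_probs = [
    [0.99, 0.01],  # confident prediction
    [0.55, 0.45],  # uncertain
    [0.80, 0.20],  # fairly confident
    [0.50, 0.50],  # maximally uncertain
]

chosen = select_for_labeling(pool_probs, budget=2)
print(chosen)  # [3, 1]
```

In a real active learning loop, the chosen examples would be labeled, added to the training set, and the model retrained before the next round of selection. On a very large pool, scoring every example in each round is the expensive step.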

No commits in the last 6 months.

Use this if you are building text classification models and struggle with large, imbalanced datasets, where manually labeling enough data to achieve good performance on rare categories is a major bottleneck.

Not ideal if your datasets are small, perfectly balanced, or if you are working on tasks other than classification, such as regression or generative AI.

text-classification data-labeling imbalanced-datasets machine-learning-operations natural-language-processing

Stale (6 months), no package, no dependents

Maintenance 2 / 25

Adoption 6 / 25

Maturity 16 / 25

Community 4 / 25

Language: Python

License: Apache-2.0

Higher-rated alternatives

ymcui/cmrc2018

A Span-Extraction Dataset for Chinese Machine Reading Comprehension (CMRC 2018)

princeton-nlp/DensePhrases

[ACL 2021] Learning Dense Representations of Phrases at Scale; EMNLP'2021: Phrase Retrieval...

thunlp/MultiRD

Code and data of the AAAI-20 paper "Multi-channel Reverse Dictionary Model"

IndexFziQ/KMRC-Papers

A list of recent papers regarding knowledge-based machine reading comprehension.

danqi/rc-cnn-dailymail

CNN/Daily Mail Reading Comprehension Task
