pietrolesci/anchoral

This is the official PyTorch implementation for our NAACL 2024 paper: "AnchorAL: Computationally Efficient Active Learning for Large and Imbalanced Datasets".

Score: 28 / 100 (Experimental)

AnchorAL helps machine learning practitioners train classification models efficiently on very large datasets in which some categories are rare. It takes raw, unlabeled text data and helps you select the most informative examples to label. The result is a better-performing model, especially on hard-to-find minority classes, with significantly less time and cost spent on manual labeling.
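To make "selecting the most informative examples" concrete, here is a minimal sketch of generic pool-based uncertainty sampling using predictive entropy. It does not reproduce AnchorAL's own selection procedure or API. The function names (`entropy`, `select_most_uncertain`) and the probability values are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(pool_probs: List[Sequence[float]], budget: int) -> List[int]:
    """Return indices of the `budget` pool items with the highest entropy.

    Generic uncertainty sampling, assumed here for illustration;
    this is not AnchorAL's method.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


# Made-up model predictions for four unlabeled texts (binary task).
pool_probs = [
    [0.98, 0.02],  # confident prediction, low entropy
    [0.55, 0.45],  # uncertain
    [0.80, 0.20],  # moderately confident
    [0.50, 0.50],  # maximally uncertain
]

to_label = select_most_uncertain(pool_probs, budget=2)
print(to_label)  # [3, 1]
```

The two items the model is least sure about are sent for labeling first. On a very large pool, scoring every item like this in each round is expensive, which is the cost this project aims to reduce.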

No commits in the last 6 months.

Use this if you are building text classification models and struggle with large, imbalanced datasets, where manually labeling enough data to achieve good performance on rare categories is a major bottleneck.

Not ideal if your datasets are small, perfectly balanced, or if you are working on tasks other than classification, such as regression or generative AI.

text-classification data-labeling imbalanced-datasets machine-learning-operations natural-language-processing
Stale (6m), No Package, No Dependents
Maintenance 2 / 25
Adoption 6 / 25
Maturity 16 / 25
Community 4 / 25


Stars: 22
Forks: 1
Language: Python
License: Apache-2.0
Last pushed: Apr 15, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/pietrolesci/anchoral"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
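The same request can be made from Python using only the standard library. This is a sketch under two assumptions: that the endpoint returns JSON, which the listing does not state, and that the path follows a category/owner/repo pattern, which is inferred from the single URL above. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (path pattern inferred from one example)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality data for one repository.

    Assumes the response body is JSON; adjust the parsing if it is not.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "pietrolesci", "anchoral")` requests the same URL as the curl command. At 100 keyless requests per day, it is worth caching the result rather than refetching on every run.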