yueyu1030/Patron

[ACL 2023] The code for our ACL'23 paper Cold-Start Data Selection for Few-shot Language Model Fine-tuning: A Prompt-Based Uncertainty Propagation Approach

Score: 21 / 100 (Experimental)

This project helps machine learning engineers and data scientists fine-tune language models for text classification when labeled data is scarce. It takes unlabeled text data and a small set of labeled examples, then selects the data points that are most informative to label. The output is a small, curated set of examples that, once labeled, yields better model performance than a randomly selected set of the same size on tasks like sentiment analysis or topic classification.
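To make the idea of selecting informative examples concrete, here is a minimal sketch of uncertainty-based selection. It is only an illustration: it ranks examples by plain predictive entropy, which is a generic baseline and not the prompt-based uncertainty propagation method from the paper. The function names and the input format (one list of class probabilities per unlabeled example) are assumptions made for this example, not part of the repository.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(class_probs, k):
    """Return the indices of the k examples with the highest entropy.

    class_probs: hypothetical input, one probability list per unlabeled
    example (for instance, from a prompted language model).
    """
    ranked = sorted(
        range(len(class_probs)),
        key=lambda i: entropy(class_probs[i]),
        reverse=True,
    )
    return ranked[:k]


# Three unlabeled examples with predicted probabilities for two classes.
predictions = [[0.9, 0.1], [0.5, 0.5], [0.6, 0.4]]
chosen = select_most_uncertain(predictions, k=2)
```

In this sketch the two least confident predictions are chosen for labeling, while the confidently classified first example is skipped.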

No commits in the last 6 months.

Use this if you need to fine-tune a language model for text classification but have very limited labeled data, and want to improve model accuracy by strategically selecting additional examples for labeling.

Not ideal if you already have a large, high-quality labeled dataset, or if your primary goal is not text classification.

Tags: text-classification, natural-language-processing, machine-learning-engineering, data-labeling, sentiment-analysis
Status: No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 6 / 25
Maturity 8 / 25
Community 7 / 25

Stars: 24
Forks: 2
Language: Python
License: None
Last pushed: Jun 01, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/prompt-engineering/yueyu1030/Patron"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
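The same request can be made from Python using only the standard library. This is a sketch under stated assumptions: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the URL layout is inferred from the single curl example above, and the response body is returned as raw text because its format is not documented here.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example; other routes are not assumed.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, repo):
    """Build the endpoint URL for a category and an 'owner/name' repo."""
    # quote() leaves '/' unescaped by default, so 'owner/name' stays intact.
    return f"{BASE_URL}/{quote(category)}/{quote(repo)}"


def fetch_quality(category, repo, timeout=10):
    """Fetch the endpoint and return the response body as text."""
    with urlopen(build_quality_url(category, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


url = build_quality_url("prompt-engineering", "yueyu1030/Patron")
# Performs a network request when uncommented:
# print(fetch_quality("prompt-engineering", "yueyu1030/Patron"))
```

Each call to `fetch_quality` counts against the daily request limit, so cache the result if you poll the same repository repeatedly.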