yueyu1030/ReGen
[ACL'23 Findings] Code repository for the ACL'23 Findings paper "ReGen: Zero-Shot Text Classification via Training Data Generation with Progressive Dense Retrieval".
This tool helps categorize text documents like news articles, product reviews, or Wikipedia entries, even for categories you haven't explicitly trained on. You provide a collection of unlabeled text and a set of predefined categories, and it outputs classified documents. It's ideal for data analysts, content managers, or researchers who need to sort large volumes of text without extensive manual labeling.
No commits in the last 6 months.
Use this if you need to classify large amounts of text into categories but lack enough pre-labeled examples to train a traditional classifier from scratch.
Not ideal if you require a very high degree of precision for highly nuanced or safety-critical text classification, as zero-shot methods can sometimes introduce errors.
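To make the idea of building training data from unlabeled text more concrete, here is a minimal sketch of retrieval-based pseudo-labeling. It is not the ReGen implementation: a bag-of-words cosine similarity stands in for a dense retriever, only a single retrieval round is shown, and the names `bow`, `cosine`, `pseudo_label`, and the example corpus and queries are all hypothetical.

```python
import math
import re
from collections import Counter


def bow(text):
    """Lowercased bag-of-words vector (a toy stand-in for a dense embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pseudo_label(corpus, class_queries, top_k=1):
    """For each class, retrieve the top_k most similar unlabeled documents."""
    doc_vecs = [bow(doc) for doc in corpus]
    labeled = {}
    for label, query in class_queries.items():
        q = bow(query)
        # Sort by descending similarity; ties are broken by document order.
        scored = sorted(
            ((cosine(q, vec), i) for i, vec in enumerate(doc_vecs)),
            key=lambda pair: (-pair[0], pair[1]),
        )
        # Keep only documents that share at least one term with the query.
        labeled[label] = [corpus[i] for score, i in scored[:top_k] if score > 0]
    return labeled


# Hypothetical unlabeled corpus and class descriptions.
corpus = [
    "The team won the football match in extra time",
    "The central bank raised interest rates again",
    "New smartphone chip doubles battery life",
]
class_queries = {
    "sports": "football match team",
    "business": "bank interest rates market",
}

result = pseudo_label(corpus, class_queries)
for label, docs in result.items():
    print(label, "->", docs)
```

The retrieved documents could then serve as pseudo-labeled training examples for an ordinary classifier. Because the labels come from similarity rather than annotation, some of them will be wrong, which is one reason zero-shot pipelines are a poor fit for safety-critical tasks.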
Stars: 24
Forks: 3
Language: Python
License: MIT
Category:
Last pushed: Sep 08, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/yueyu1030/ReGen"
The API is open to everyone at 100 requests/day with no key. A free key raises the limit to 1,000 requests/day.
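The same request can be made from Python using only the standard library. This is a sketch under stated assumptions: the URL pattern is inferred from the single curl example above, the response is assumed to be UTF-8 text, and its schema is not documented here, so the body is returned unparsed. The helper names `build_url` and `fetch_quality` are hypothetical.

```python
import urllib.request

# Base path taken from the curl example; the owner/repo suffix pattern is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/nlp"


def build_url(owner, repo):
    """Build the endpoint URL for a given repository."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner, repo, timeout=10):
    """Fetch the raw response body as text (assumed UTF-8, schema unknown)."""
    with urllib.request.urlopen(build_url(owner, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_quality("yueyu1030", "ReGen")` issues the same unauthenticated GET request as the curl command, so it counts against the keyless daily limit.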
Higher-rated alternatives
n-waves/multifit
The code to reproduce results from paper "MultiFiT: Efficient Multi-lingual Language Model...
princeton-nlp/SimCSE
[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
yxuansu/SimCTG
[NeurIPS'22 Spotlight] A Contrastive Framework for Neural Text Generation
alibaba-edu/simple-effective-text-matching
Source code of the ACL2019 paper "Simple and Effective Text Matching with Richer Alignment Features".
Shark-NLP/OpenICL
OpenICL is an open-source framework to facilitate research, development, and prototyping of...