roeeaharoni/unsupervised-domain-clusters
Code and data accompanying our ACL 2020 paper, "Unsupervised Domain Clusters in Pretrained Language Models".
This project provides tools for natural language processing researchers to examine how different subject areas are represented inside pretrained language models. Given parallel text from various domains, such as medical or legal documents, it reveals the underlying domain 'clusters'. Researchers working with multilingual text or pretrained language models can use it to analyze relationships between domains.
No commits in the last 6 months.
Use this if you are a researcher studying how language models handle information from diverse real-world topics and want to identify inherent groupings of these topics.
Not ideal if you are looking for a ready-to-use translation tool or a way to train new language models from scratch.
Stars
58
Forks
4
Language
Jupyter Notebook
License
—
Category
Last pushed
Aug 22, 2020
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/roeeaharoni/unsupervised-domain-clusters"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
airaria/TextBrewer
A PyTorch-based knowledge distillation toolkit for natural language processing
sunyilgdx/NSP-BERT
The code for our paper "NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original...
princeton-nlp/CoFiPruning
[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
kssteven418/LTP
[KDD'22] Learned Token Pruning for Transformers
georgian-io/Transformers-Domain-Adaptation
:no_entry: [DEPRECATED] Adapt Transformer-based language models to new text domains