roeeaharoni/unsupervised-domain-clusters

Code and data accompanying our ACL 2020 paper, "Unsupervised Domain Clusters in Pretrained Language Models".

Quality score: 24 / 100 (Experimental)

This project provides tools for natural language processing researchers to study how different subject areas are represented inside large pretrained language models. It embeds parallel text from multiple domains (e.g., medical or legal documents) and reveals emergent groups, or "clusters", of those domains in the model's representation space. Researchers working with multilingual text or pretrained language models can use it to analyze relationships between domains.
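The core idea can be sketched in a few lines: obtain sentence embeddings from a pretrained model, then fit a Gaussian Mixture Model and read off the cluster assignments as domains. The snippet below is a minimal illustration, not the repository's actual code; it uses random vectors as stand-ins for real LM embeddings, and the domain count and dimensions are arbitrary assumptions.

```python
# Hypothetical sketch: cluster sentence embeddings with a GMM to
# recover domain clusters. Random vectors stand in for embeddings
# that would normally come from a pretrained language model.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Fake "embeddings" for sentences from 3 synthetic domains, each
# domain centered at a different point in a 16-dim embedding space.
centers = rng.normal(size=(3, 16)) * 5
embeddings = np.vstack([c + rng.normal(size=(100, 16)) for c in centers])

# Fit a GMM with as many components as the presumed number of domains,
# then assign each sentence to its most likely cluster.
gmm = GaussianMixture(n_components=3, random_state=0).fit(embeddings)
labels = gmm.predict(embeddings)
print(labels.shape)  # one cluster label per sentence
```

With real data, the labels can then be compared against known domain annotations to see how well the model's representation space separates domains.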

No commits in the last 6 months.

Use this if you are a researcher studying how language models handle information from diverse real-world topics and want to identify inherent groupings of these topics.

Not ideal if you are looking for a ready-to-use translation tool or a way to train new language models from scratch.

Topics: natural-language-processing, computational-linguistics, multilingual-data, domain-adaptation, text-analysis
Flags: No License · Stale (6 months) · No Package · No Dependents
Maintenance 0 / 25
Adoption 8 / 25
Maturity 8 / 25
Community 8 / 25


Stars: 58
Forks: 4
Language: Jupyter Notebook
License: none
Last pushed: Aug 22, 2020
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/roeeaharoni/unsupervised-domain-clusters"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.