cambridgeltl/mirror-bert
[EMNLP'21] Mirror-BERT: Converting Pretrained Language Models to universal text encoders without labels.
This project lets machine learning engineers and NLP researchers convert an existing language model into a text encoder for a specific domain or task. Given a pre-trained language model and a file of raw text, it quickly produces an embedding model that maps words, phrases, or sentences to dense vector representations. These embeddings are useful for tasks such as semantic search, text classification, and recommendation systems.
No commits in the last 6 months.
Use this if you need accurate text embeddings for your own dataset without extensive labeled training data, especially when you have plenty of raw text in your target domain.
Not ideal if you want a simple, off-the-shelf solution for general English sentence similarity without any domain adaptation.
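The label-free training the card describes works by contrasting two views of the same string (e.g., the same sentence encoded under different dropout masks) against other strings in the batch with an InfoNCE-style loss. A minimal NumPy sketch of that loss, with random toy vectors standing in for encoder outputs (the function name, temperature value, and data are illustrative assumptions, not the repo's code):

```python
import numpy as np

def info_nce(a, b, temperature=0.05):
    """InfoNCE over two views: a[i] and b[i] embed the same string
    under different dropout/masking; other rows in the batch act
    as in-batch negatives. (Hypothetical helper, not from the repo.)"""
    # L2-normalise so dot products are cosine similarities
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature               # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # the positive for row i sits on the diagonal
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
base = rng.normal(size=(8, 16))                     # toy "encoder outputs"
view_a = base + 0.01 * rng.normal(size=base.shape)  # two near-identical views
view_b = base + 0.01 * rng.normal(size=base.shape)
loss = info_nce(view_a, view_b)
```

Because the two views of each string are nearly identical while other batch members are random, the loss is close to zero here; during real training, minimizing it pulls same-string views together and pushes different strings apart, which is what turns a vanilla language model into a usable embedding model.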
Stars: 77
Forks: 8
Language: Python
License: MIT
Category:
Last pushed: Aug 14, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/cambridgeltl/mirror-bert"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
airaria/TextBrewer
A PyTorch-based knowledge distillation toolkit for natural language processing
sunyilgdx/NSP-BERT
The code for our paper "NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original...
princeton-nlp/CoFiPruning
[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
kssteven418/LTP
[KDD'22] Learned Token Pruning for Transformers
georgian-io/Transformers-Domain-Adaptation
:no_entry: [DEPRECATED] Adapt Transformer-based language models to new text domains