cambridgeltl/mirror-bert
[EMNLP'21] Mirror-BERT: Converting Pretrained Language Models to universal text encoders without labels.
This project lets machine learning engineers and NLP researchers convert an existing language model into a text encoder for a specific domain or task. Given a pre-trained language model and a file of raw text, it quickly produces an embedding model that maps words, phrases, or sentences to dense vector representations. These embeddings are useful for tasks such as semantic search, text classification, and recommendation systems.
No commits in the last 6 months.
Use this if you need accurate text embeddings for your own dataset without extensive labeled training data, especially when you have plenty of raw text in your target domain.
Not ideal if you want a simple, off-the-shelf solution for general English sentence similarity without any domain adaptation.
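The label-free training the card describes works by contrasting two views of the same string (e.g., the same sentence encoded under different dropout masks) against other strings in the batch with an InfoNCE-style loss. A minimal NumPy sketch of that loss, with random toy vectors standing in for encoder outputs (the function name, temperature value, and data are illustrative assumptions, not the repo's code):

```python
import numpy as np

def info_nce(a, b, temperature=0.05):
    """InfoNCE over two views: a[i] and b[i] embed the same string
    under different dropout/masking; other rows in the batch act
    as in-batch negatives. (Hypothetical helper, not from the repo.)"""
    # L2-normalise so dot products are cosine similarities
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature               # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # the positive for row i sits on the diagonal
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
base = rng.normal(size=(8, 16))                     # toy "encoder outputs"
view_a = base + 0.01 * rng.normal(size=base.shape)  # two near-identical views
view_b = base + 0.01 * rng.normal(size=base.shape)
loss = info_nce(view_a, view_b)
```

Because the two views of each string are nearly identical while other batch members are random, the loss is close to zero here; during real training, minimizing it pulls same-string views together and pushes different strings apart, which is what turns a vanilla language model into a usable embedding model.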
Stars: 77
Forks: 8
Language: Python
License: MIT
Category:
Last pushed: Aug 14, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/cambridgeltl/mirror-bert"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
airaria/TextBrewer
A PyTorch-based knowledge distillation toolkit for natural language processing
sunyilgdx/NSP-BERT
The code for our paper "NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original...
princeton-nlp/CoFiPruning
[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
kssteven418/LTP
[KDD'22] Learned Token Pruning for Transformers
georgian-io/Transformers-Domain-Adaptation
:no_entry: [DEPRECATED] Adapt Transformer-based language models to new text domains