cambridgeltl/mirror-bert

[EMNLP'21] Mirror-BERT: converting pretrained language models into universal text encoders without labels.

Score: 37 / 100 (Emerging)

This project helps machine learning engineers and NLP researchers transform existing language models into powerful text encoders for specific domains or tasks. You provide a pre-trained language model and a file of raw text; after a short self-supervised fine-tuning run, it outputs a text embedding model capable of generating high-quality numerical representations of words, phrases, or sentences. These embeddings are crucial for tasks like semantic search, text classification, and recommendation systems.
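The embeddings such a model produces are typically obtained by mean-pooling the encoder's final hidden states over non-padding tokens, then comparing vectors by cosine similarity. A minimal NumPy sketch of that pooling step (illustrative only, not this repo's actual code; the toy hidden states stand in for a real model's output):

```python
import numpy as np

def mean_pool(token_states, attention_mask):
    """Average hidden states over non-padding tokens.

    token_states: (batch, seq_len, hidden)
    attention_mask: (batch, seq_len) with 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(token_states.dtype)  # (batch, seq_len, 1)
    summed = (token_states * mask).sum(axis=1)                   # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)               # avoid divide-by-zero
    return summed / counts

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors (or batches)."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return (a * b).sum(axis=-1)

# Toy hidden states: batch of 2 sentences, 3 tokens each, hidden size 4.
states = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
mask = np.array([[1, 1, 0],   # second sentence has one padding token
                 [1, 1, 1]])
emb = mean_pool(states, mask)
print(emb.shape)  # (2, 4)
```

In practice the hidden states would come from the fine-tuned encoder (e.g. via the Hugging Face `transformers` forward pass), and the pooled vectors feed directly into a nearest-neighbour index for semantic search.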

No commits in the last 6 months.

Use this if you need highly accurate text embeddings for a unique dataset without extensive labeled training data, especially when you have a large amount of raw text in your target domain.

Not ideal if you're looking for a simple, off-the-shelf solution for general English sentence similarity tasks without any custom domain adaptation.

natural-language-processing text-embeddings information-retrieval semantic-search custom-language-models
Stale (6 months) · No Package · No Dependents
Maintenance: 0 / 25
Adoption: 9 / 25
Maturity: 16 / 25
Community: 12 / 25


Stars: 77
Forks: 8
Language: Python
License: MIT
Last pushed: Aug 14, 2022
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/cambridgeltl/mirror-bert"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
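The same request can be made from Python with only the standard library. A minimal sketch, assuming the endpoint returns JSON and that a key, if used, is sent as a bearer token (the `Authorization` header name is an assumption, not documented above):

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    # Build the endpoint path shown in the curl example above.
    return f"{API_BASE}/{category}/{owner}/{repo}"

def fetch_quality(category, owner, repo, api_key=None):
    # Anonymous access allows 100 requests/day; a free key raises this to 1,000/day.
    req = urllib.request.Request(quality_url(category, owner, repo))
    if api_key:
        # Header scheme is an assumption; check the API docs for the real one.
        req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

print(quality_url("nlp", "cambridgeltl", "mirror-bert"))
```

Calling `fetch_quality("nlp", "cambridgeltl", "mirror-bert")` would perform the live request; the exact shape of the returned JSON is not specified here.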