SAP-samples/acl2023-micse

Source code for ACL 2023 paper "miCSE: Mutual Information Contrastive Learning for Low-shot Sentence Embeddings".

Score: 34 / 100 (Emerging)

This project helps machine learning engineers and researchers create high-quality numerical representations (embeddings) of sentences, even when they have very little training data. It takes a collection of English sentences and outputs a fine-tuned language model capable of generating these robust sentence embeddings. This is particularly useful for those developing natural language processing applications with limited data resources.
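Downstream, sentence embeddings like these are most often compared with cosine similarity for sentence-similarity tasks. A minimal sketch of that comparison step (the vectors below are toy placeholders standing in for real model output, and `cosine_similarity` is an illustrative helper, not part of this repository):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy placeholder vectors standing in for real sentence embeddings.
v1 = np.array([0.2, 0.7, 0.1])
v2 = np.array([0.2, 0.7, 0.1])
v3 = np.array([0.9, -0.3, 0.4])

print(cosine_similarity(v1, v2))  # identical vectors -> 1.0
print(cosine_similarity(v1, v3))  # dissimilar vectors -> well below 1.0
```

In practice, the two inputs would be embeddings produced by the fine-tuned model for a pair of sentences, and a higher score indicates greater semantic similarity.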

No commits in the last 6 months.

Use this if you need to train a sentence embedding model efficiently with only a small amount of labeled data, aiming for state-of-the-art performance in low-shot learning scenarios.

Not ideal if you already have a large, diverse dataset for training your sentence embedding model, as its primary advantage is in data-scarce situations.

Tags: natural-language-processing, low-resource-nlp, sentence-similarity, machine-learning-engineering, model-training

Status: Stale (6 months) · No Package · No Dependents

Score breakdown:
Maintenance: 0 / 25
Adoption: 5 / 25
Maturity: 16 / 25
Community: 13 / 25


Stars: 9
Forks: 2
Language: Python
License: Apache-2.0
Last pushed: Mar 07, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/SAP-samples/acl2023-micse"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.