SAP-samples/acl2023-micse

Source code for ACL 2023 paper "miCSE: Mutual Information Contrastive Learning for Low-shot Sentence Embeddings".

Score: 34 / 100 (Emerging)

This project helps machine learning engineers and researchers create high-quality numerical representations (embeddings) of sentences, even when they have very little training data. It takes a collection of English sentences and outputs a fine-tuned language model capable of generating these robust sentence embeddings. This is particularly useful for those developing natural language processing applications with limited data resources.
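Downstream, sentence embeddings like these are most often compared with cosine similarity for sentence-similarity tasks. A minimal sketch of that comparison step (the vectors below are toy placeholders standing in for real model output, and `cosine_similarity` is an illustrative helper, not part of this repository):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy placeholder vectors standing in for real sentence embeddings.
v1 = np.array([0.2, 0.7, 0.1])
v2 = np.array([0.2, 0.7, 0.1])
v3 = np.array([0.9, -0.3, 0.4])

print(cosine_similarity(v1, v2))  # identical vectors -> 1.0
print(cosine_similarity(v1, v3))  # dissimilar vectors -> well below 1.0
```

In practice, the two inputs would be embeddings produced by the fine-tuned model for a pair of sentences, and a higher score indicates greater semantic similarity.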

No commits in the last 6 months.

Use this if you need to train a sentence embedding model efficiently with only a small amount of labeled data, aiming for state-of-the-art performance in low-shot learning scenarios.

Not ideal if you already have a large, diverse dataset for training your sentence embedding model, as its primary advantage is in data-scarce situations.

Tags: natural-language-processing, low-resource-nlp, sentence-similarity, machine-learning-engineering, model-training

Status: Stale (6 months) · No Package · No Dependents

Score breakdown:
Maintenance: 0 / 25
Adoption: 5 / 25
Maturity: 16 / 25
Community: 13 / 25


Stars: 9
Forks: 2
Language: Python
License: Apache-2.0
Last pushed: Mar 07, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/SAP-samples/acl2023-micse"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.