JohnGiorgi/DeCLUTR

The corresponding code from our paper "DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations". Do not hesitate to open an issue if you run into any trouble!


Emerging

This project helps machine learning engineers or data scientists create high-quality, general-purpose text embeddings without needing manually labeled data. You provide a large collection of unlabeled documents, and it processes them to output numerical representations (embeddings) that capture the meaning of your text. These embeddings can then be used for tasks like semantic search, clustering, or text similarity.
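As a rough illustration of the semantic search and text similarity use cases mentioned above, the sketch below ranks documents by cosine similarity to a query embedding. It does not call DeCLUTR itself: the three-dimensional vectors are made-up placeholders standing in for real embeddings, and the names `cosine_similarity`, `rank_by_similarity`, `corpus`, and `query` are hypothetical helpers chosen for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, corpus):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scores = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Placeholder vectors for illustration only; real embeddings would come
# from a trained encoder and have hundreds of dimensions.
corpus = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.2],
    "doc_c": [0.7, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(query, corpus)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

With real embeddings the same ranking step applies unchanged; only the source of the vectors differs. For large collections, a vectorized or approximate nearest-neighbor search would likely replace the pure-Python loop.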

378 stars. No commits in the last 6 months.

Use this if you need general-purpose text embeddings learned from unlabeled data for downstream natural language processing tasks and want to train your own models from scratch or fine-tune existing ones.

Not ideal if you already have labeled data for your specific task or if you only need pre-trained embeddings without custom training.

natural-language-processing unsupervised-learning text-analytics machine-learning-engineering text-representation

Stale (6 months), no package, no dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 15 / 25


Stars: 378

Forks

Language: Python

License: Apache-2.0

Higher-rated alternatives

mims-harvard/ClinVec

ClinVec: Unified Embeddings of Clinical Codes Enable Knowledge-Grounded AI in Medicine

NYUMedML/DeepEHR

Chronic Disease Prediction Using Medical Notes

mims-harvard/SHEPHERD

SHEPHERD: Few shot learning for phenotype-driven diagnosis of patients with rare genetic diseases

biocentral/biocentral_server

Compute functionality for biocentral.

nomic-ai/contrastors

Train Models Contrastively in Pytorch
