Embedding Model Tuning Tools

Tools, techniques, and frameworks for fine-tuning embedding models on domain-specific data to improve performance on downstream tasks. Does NOT include pre-trained embedding models, embedding inference/serving, or applications built on top of embeddings.

There are 48 embedding model tuning tools tracked. One scores above 50 (Established tier). The highest-rated is ContextualAI/gritlm at 54/100 with 688 stars.

Get all 48 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=embeddings&subcategory=embedding-model-tuning&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
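The same request can be made from a script. The following is a minimal sketch using only the Python standard library; the endpoint and query parameters are copied from the curl command above, while the function names `build_url` and `fetch_projects` are hypothetical helpers introduced for illustration. The response schema is not described on this page, so the sketch returns the parsed JSON without assuming any field names. Note that the curl example passes `limit=20` although 48 projects are tracked; whether the API accepts a larger limit is an assumption you would need to verify.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="embeddings", subcategory="embedding-model-tuning", limit=20):
    """Compose the request URL from the query parameters shown in the curl example."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Download and parse the JSON body.

    The response schema is not documented here, so the parsed value
    is returned as-is rather than unpacked into specific fields.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Example usage (performs a network request, counted against the daily quota):
#   data = fetch_projects(build_url())
#   print(json.dumps(data, indent=2)[:500])
```

Keeping URL construction separate from the network call makes it easy to inspect the exact request before spending one of the 100 unauthenticated requests per day.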

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | ContextualAI/gritlm | Generative Representational Instruction Tuning | 54 | Established |
| 2 | xlang-ai/instructor-embedding | [ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings | 45 | Emerging |
| 3 | liuqidong07/LLMEmb | [AAAI'25 Oral] The official implementation code of LLMEmb | 42 | Emerging |
| 4 | hpcaitech/CachedEmbedding | A memory efficient DLRM training solution using ColossalAI | 40 | Emerging |
| 5 | ritesh-modi/embedding-hallucinations | This repo shows how foundational model hallucinates and how we can fix such... | 39 | Emerging |
| 6 | ritesh-modi/fine-tuning-embeddings-template | This repo is a template to fine-tune embedding models using... | 37 | Emerging |
| 7 | lperezmo/embeddings-extraction | Scripts for reading, extracting, and organizing data from either HTML or PDF... | 36 | Emerging |
| 8 | jjcmoon/DeepSoftLog | Soft-Unification in Deep Probabilistic Logic (NeurIPS 2023) | 35 | Emerging |
| 9 | shobrook/weightgain | Train an adapter for any embedding model in under a minute | 35 | Emerging |
| 10 | jina-ai/llm-query-expansion | Query Expansion for Better Query Embedding using LLMs | 35 | Emerging |
| 11 | Benja1972/topicphrase | Simple project for extraction of key-phrases from single document based on... | 29 | Experimental |
| 12 | CodeSoul-co/THETA | LLM-adaptive embeddings (Zero-shot / LoRA) with Generative Topic Modeling &... | 28 | Experimental |
| 13 | aws-samples/finetune-bge-embeddings-blog | Code associated with the blog post titled, "Fine-Tuning BGE Embeddings Using... | 28 | Experimental |
| 14 | LivingFutureLab/UQABench | [KDD 2025] The source code for UQABench | 26 | Experimental |
| 15 | Blue16-WangFudi/DialectSense | Chinese dialect identification using audio embeddings from LLMs. | 25 | Experimental |
| 16 | shimo-lab/modelmap | Embedding language models in probability space via log-likelihood vectors | 24 | Experimental |
| 17 | csinva/fmri | Experiments with language fMRI data from Alex Huth lab. More organized repo... | 23 | Experimental |
| 18 | zh-he/Document-Based-Fine-Tuning-Tool | One-stop pipeline for building IR datasets from PDFs and fine-tuning... | 23 | Experimental |
| 19 | aws-samples/fine-tune-embedding-models-on-sagemaker | This repository contains samples for fine-tuning embedding models using... | 22 | Experimental |
| 20 | csinva/interpretable-embeddings | Interpretable text embeddings by asking LLMs yes/no questions (NeurIPS 2024) | 21 | Experimental |
| 21 | AnderssonProgramming/llm-embeddings-text-preprocessing | LLM text preprocessing and embedding pipeline implementation for the... | 21 | Experimental |
| 22 | ksm26/Embedding-Models-From-Architecture-to-Implementation | Understand and build embedding models, focusing on word and sentence... | 21 | Experimental |
| 23 | vidhiJain/SpatialEmbeddings | Learning Embeddings that Capture Spatial Semantics for Indoor Navigation,... | 21 | Experimental |
| 24 | FelipeBenavidesMz/AlphaEarth-Interpretability-Experiments | Binary classification experiments to interpret Google AlphaEarth Foundation... | 21 | Experimental |
| 25 | Jiayu7Yao/llm-classifier | Classify, cluster, and extract data using structured LLM outputs with... | 21 | Experimental |
| 26 | rag-fish/noesisnoema-pipeline | Modular pipeline for building RAG and LLM workflows in Colab, including... | 20 | Experimental |
| 27 | PetropoulakisPanagiotis/igae | State Representations as Incentives for Reinforcement Learning Agents: A... | 19 | Experimental |
| 28 | NC0DER/LMRank | LMRank: Utilizing Pre-Trained Language Models and Dependency Parsing for... | 19 | Experimental |
| 29 | sine2pi/ASR-model | ASR model | 18 | Experimental |
| 30 | meghanmane84/LLM-Manifold-Based-Compression-Techniques | Research code for LLM Compression using Functional Algorithms, exploring... | 15 | Experimental |
| 31 | rubsj/ai-contrastive-embedding-finetuning | Domain-specific embedding fine-tuning with contrastive learning and PEFT/LoRA | 13 | Experimental |
| 32 | IMSUVEN/wubba | Wubba learns layout-invariant embeddings from raw HTML using contrastive... | 13 | Experimental |
| 33 | quantumxiaol/activation_beacon | fork from... | 13 | Experimental |
| 34 | LCEmT/LCEmT | Lossless Compression Techniques for Embedding Tables in Substantial Deep... | 13 | Experimental |
| 35 | AparnaRoy76/Fine-Tune-Embedding-Model | 🚀 Generate high-quality triplet datasets for job titles & skills, and... | 13 | Experimental |
| 36 | 1kkiRen/Embeddings-Division | Python script for dividing embedding layer of LLM. | 13 | Experimental |
| 37 | Renatoelho/embeddings-consultas-similaridade | I'll show how to convert plain text into mathematical representations... | 13 | Experimental |
| 38 | daniau23/Fine_Tuning_LLMs_and_Embeddings | Exploring the fine tuning of both LLMs and Embedding models. | 13 | Experimental |
| 39 | StepanTita/space-model | Space Model framework that allows for maintaining generalizability, and... | 13 | Experimental |
| 40 | kushagraghosh/EuroSAT | Trained a ResNet50 model on the EuroSAT satellite imagery dataset w/... | 13 | Experimental |
| 41 | YoRzHe-HotaaRu/Learn-EmbedAIModel | A quick way to learn and understand what AI embedding models are about. | 12 | Experimental |
| 42 | Madhur-Chotia/LLMs-Mastery | This repo contains LLM and NLP applications starting from how tokenisers are... | 12 | Experimental |
| 43 | cestella/kaffeeklatsch | Higher Level Primitives for working with LLMs in Java | 11 | Experimental |
| 44 | mmanela/llm-embeddings | Clustering and labeling concepts using LLM Embeddings | 11 | Experimental |
| 45 | uci-cv-genelab-bps-mouse-template/mouse-bps-labeler | Use Active Learning to diversely sample the dataset and generate new labels... | 11 | Experimental |
| 46 | yhbcode000/soft-rob-embedding | Unifying the representation of robot statuses and actions with natural... | 11 | Experimental |
| 47 | mattelim/interprexis-mit-6.8610-nlp | InterpreXis: Finding Human-Interpretable Concepts Inside Contextual Word... | 10 | Experimental |
| 48 | sn2727/finetuning-embedding-models | Domain adaption for an embedding model using unsupervised and supervised... | 10 | Experimental |