Embedding Model Tuning Tools

Tools, techniques, and frameworks for fine-tuning embedding models on domain-specific data to improve performance on downstream tasks. Does NOT include pre-trained embedding models, embedding inference/serving, or applications built on top of embeddings.

48 embedding model tuning tools are tracked. One scores above 50 (Established tier). The highest-rated is ContextualAI/gritlm at 54/100 with 688 stars.

Get all 48 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=embeddings&subcategory=embedding-model-tuning&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
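
For scripted access, the same request can be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_url` and `fetch_projects` are hypothetical, and the query parameters are copied from the curl command above. The shape of the JSON response is not documented here, so the sketch returns the decoded payload as-is rather than assuming field names. Whether a larger `limit` returns all 48 projects in a single call is likewise an assumption to check against the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="embeddings", subcategory="embedding-model-tuning", limit=20):
    """Assemble the same query string the curl command uses (hypothetical helper)."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """GET the URL and decode the body as JSON.

    No assumption is made about the payload structure; inspect the
    returned object before relying on specific keys.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_projects(build_url())` issues one request against the keyless quota of 100 requests/day, so cache the result locally when iterating on analysis code.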

#	Tool	Score	Tier	Stars	Language
1	ContextualAI/gritlm Generative Representational Instruction Tuning	54	Established	688	Jupyter Notebook
2	xlang-ai/instructor-embedding [ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings	45	Emerging	2,023	Python
3	liuqidong07/LLMEmb [AAAI'25 Oral] The official implementation code of LLMEmb	42	Emerging	52	Python
4	hpcaitech/CachedEmbedding A memory efficient DLRM training solution using ColossalAI	40	Emerging	107	Python
5	ritesh-modi/embedding-hallucinations This repo shows how foundational model hallucinates and how we can fix such...	39	Emerging	9	Python
6	ritesh-modi/fine-tuning-embeddings-template This repo is a template to fine-tune embedding models using...	37	Emerging	7	Python
7	lperezmo/embeddings-extraction Scripts for reading, extracting, and organizing data from either HTML or PDF...	36	Emerging	13	Python
8	jjcmoon/DeepSoftLog Soft-Unification in Deep Probabilistic Logic (NeurIPS 2023)	35	Emerging	10	Python
9	shobrook/weightgain Train an adapter for any embedding model in under a minute	35	Emerging	129	Python
10	jina-ai/llm-query-expansion Query Expansion for Better Query Embedding using LLMs	35	Emerging	68	Python
11	Benja1972/topicphrase Simple project for extraction of key-phrases from single document based on...	29	Experimental	7	Python
12	CodeSoul-co/THETA LLM-adaptive embeddings (Zero-shot / LoRA) with Generative Topic Modeling &...	28	Experimental	11	Python
13	aws-samples/finetune-bge-embeddings-blog Code associated with the blog post titled, "Fine-Tuning BGE Embeddings Using...	28	Experimental	11	Jupyter Notebook
14	LivingFutureLab/UQABench [KDD 2025] The source code for UQABench	26	Experimental	13	Python
15	Blue16-WangFudi/DialectSense Chinese dialect identification using audio embeddings from LLMs.	25	Experimental	2	Python
16	shimo-lab/modelmap Embedding language models in probability space via log-likelihood vectors	24	Experimental	16	Jupyter Notebook
17	csinva/fmri Experiments with language fMRI data from Alex Huth lab. More organized repo...	23	Experimental	4	Jupyter Notebook
18	zh-he/Document-Based-Fine-Tuning-Tool One-stop pipeline for building IR datasets from PDFs and fine-tuning...	23	Experimental	2	Python
19	aws-samples/fine-tune-embedding-models-on-sagemaker This repository contains samples for fine-tuning embedding models using...	22	Experimental	15	Jupyter Notebook
20	csinva/interpretable-embeddings Interpretable text embeddings by asking LLMs yes/no questions (NeurIPS 2024)	21	Experimental	46	Python
21	AnderssonProgramming/llm-embeddings-text-preprocessing LLM text preprocessing and embedding pipeline implementation for the...	21	Experimental	—	Jupyter Notebook
22	ksm26/Embedding-Models-From-Architecture-to-Implementation Understand and build embedding models, focusing on word and sentence...	21	Experimental	7	Jupyter Notebook
23	vidhiJain/SpatialEmbeddings Learning Embeddings that Capture Spatial Semantics for Indoor Navigation,...	21	Experimental	9	Python
24	FelipeBenavidesMz/AlphaEarth-Interpretability-Experiments Binary classification experiments to interpret Google AlphaEarth Foundation...	21	Experimental	—	Jupyter Notebook
25	Jiayu7Yao/llm-classifier Classify, cluster, and extract data using structured LLM outputs with...	21	Experimental	—	Python
26	rag-fish/noesisnoema-pipeline Modular pipeline for building RAG and LLM workflows in Colab, including...	20	Experimental	3	Python
27	PetropoulakisPanagiotis/igae State Representations as Incentives for Reinforcement Learning Agents: A...	19	Experimental	4	Python
28	NC0DER/LMRank LMRank: Utilizing Pre-Trained Language Models and Dependency Parsing for...	19	Experimental	4	Python
29	sine2pi/ASR-model ASR model	18	Experimental	1	Python
30	meghanmane84/LLM-Manifold-Based-Compression-Techniques Research code for LLM Compression using Functional Algorithms, exploring...	15	Experimental	—	Jupyter Notebook
31	rubsj/ai-contrastive-embedding-finetuning Domain-specific embedding fine-tuning with contrastive learning and PEFT/LoRA	13	Experimental	—	HTML
32	IMSUVEN/wubba Wubba learns layout-invariant embeddings from raw HTML using contrastive...	13	Experimental	—	Python
33	quantumxiaol/activation_beacon fork from...	13	Experimental	—	Python
34	LCEmT/LCEmT Lossless Compression Techniques for Embedding Tables in Substantial Deep...	13	Experimental	—	C++
35	AparnaRoy76/Fine-Tune-Embedding-Model 🚀 Generate high-quality triplet datasets for job titles & skills, and...	13	Experimental	—	Jupyter Notebook
36	1kkiRen/Embeddings-Division Python script for dividing embedding layer of LLM.	13	Experimental	—	Python
37	Renatoelho/embeddings-consultas-similaridade I'll show how to convert plain text into mathematical representations...	13	Experimental	—	Python
38	daniau23/Fine_Tuning_LLMs_and_Embeddings Exploring the fine tuning of both LLMs and Embedding models.	13	Experimental	—	Jupyter Notebook
39	StepanTita/space-model Space Model framework that allows for maintaining generalizability, and...	13	Experimental	9	HTML
40	kushagraghosh/EuroSAT Trained a ResNet50 model on the EuroSAT satellite imagery dataset w/...	13	Experimental	—	Python
41	YoRzHe-HotaaRu/Learn-EmbedAIModel A quick way to learn and understand what AI embedding models are about.	12	Experimental	1	Python
42	Madhur-Chotia/LLMs-Mastery this repo contains LLM and NLP applications starting from how tokenisers are...	12	Experimental	5	Jupyter Notebook
43	cestella/kaffeeklatsch Higher Level Primitives for working with LLMs in Java	11	Experimental	—	Java
44	mmanela/llm-embeddings Clustering and labeling concepts using LLM Embeddings	11	Experimental	4	Python
45	uci-cv-genelab-bps-mouse-template/mouse-bps-labeler Use Active Learning to diversely sample the dataset and generate new labels...	11	Experimental	—	Python
46	yhbcode000/soft-rob-embedding Unifying the representation of robot statuses and actions with natural...	11	Experimental	—	Python
47	mattelim/interprexis-mit-6.8610-nlp InterpreXis: Finding Human-Interpretable Concepts Inside Contextual Word...	10	Experimental	2	Jupyter Notebook
48	sn2727/finetuning-embedding-models Domain adaption for an embedding model using unsupervised and supervised...	10	Experimental	2	Jupyter Notebook