Embedding Evaluation Benchmarks Embedding Tools

Tools and frameworks for evaluating, testing, and benchmarking embedding models across various dimensions (quality, stress-testing, cross-lingual performance). Does NOT include embedding generation, pre-trained models, or domain-specific embedding applications.

There are 64 embedding evaluation benchmark tools tracked. 1 scores above 70 (Verified tier). The highest-rated is embeddings-benchmark/mteb at 86/100 with 3,159 stars. 1 of the top 10 is actively maintained.

Get all 64 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=embeddings&subcategory=embedding-evaluation-benchmarks&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
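The curl request above can also be issued from Python. The following is a minimal sketch using only the standard library. The base URL and the `domain`, `subcategory`, and `limit` query parameters are taken from the curl example; the function names `build_url` and `fetch_projects` are hypothetical helpers introduced here for illustration. The shape of the JSON response and the range of accepted `limit` values are not documented on this page, so the sketch returns the decoded payload unchanged rather than assuming any field names.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the request URL with the query parameters shown in the curl example."""
    params = {
        "domain": "embeddings",
        "subcategory": "embedding-evaluation-benchmarks",
        "limit": limit,  # the example uses 20; support for larger values is an assumption
    }
    return f"{BASE_URL}?{urllib.parse.urlencode(params)}"


def fetch_projects(limit=20, timeout=30):
    """Fetch and decode the JSON payload (hypothetical helper, keyless access).

    The response schema is not documented here, so the decoded object
    is returned as-is.
    """
    with urllib.request.urlopen(build_url(limit), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (performs a network request, so it is left commented out):
# data = fetch_projects()
# print(type(data).__name__)
```

Keeping URL construction separate from the network call makes it possible to check the request locally without spending any of the 100 keyless requests per day.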

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | embeddings-benchmark/mteb | MTEB: Massive Text Embedding Benchmark | 86 | Verified |
| 2 | harmonydata/harmony | The Harmony Python library: a research tool for psychologists to harmonise... | 56 | Established |
| 3 | yannvgn/laserembeddings | LASER multilingual sentence embeddings as a pip package | 52 | Established |
| 4 | embeddings-benchmark/results | Data for the MTEB leaderboard | 51 | Established |
| 5 | Hironsan/awesome-embedding-models | A curated list of awesome embedding models tutorials, projects and communities. | 48 | Emerging |
| 6 | fresh-stack/freshstack | This repository helps you evaluate your models on the FreshStack benchmark! | 46 | Emerging |
| 7 | SeanLee97/AnglE | Train and Infer Powerful Sentence Embeddings with AnglE \| 🔥 SOTA on STS and... | 46 | Emerging |
| 8 | MilaNLProc/honest | A Python package to compute HONEST, a score to measure hurtful sentence... | 46 | Emerging |
| 9 | etalab-ia/mediatech | Collection of public datasets from the French administration, vectorized and... | 45 | Emerging |
| 10 | autonomio/signs | A suite of tools for text preparation, vectorization and processing for deep... | 44 | Emerging |
| 11 | plasticityai/magnitude | A fast, efficient universal vector embedding utility package. | 44 | Emerging |
| 12 | ricsinaruto/dialog-eval | Evaluate your dialog model with 17 metrics! (see paper) | 44 | Emerging |
| 13 | bheinzerling/bpemb | Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) | 44 | Emerging |
| 14 | flipz357/S3BERT | Semantically Structured Sentence Embeddings | 43 | Emerging |
| 15 | isaacus-dev/mleb | The code used to evaluate embedding models on the Massive Legal Embedding... | 43 | Emerging |
| 16 | MaxwellRebo/awesome-2vec | Curated list of 2vec-type embedding models | 43 | Emerging |
| 17 | wangyuxinwhy/uniem | unified embedding model | 43 | Emerging |
| 18 | IndicoDataSolutions/Enso | Enso: An Open Source Library for Benchmarking Embeddings + Transfer Learning Methods | 39 | Emerging |
| 19 | encord-team/ebind | A 5-way embedding model for text, audio, image, video, and 3D point clouds. | 38 | Emerging |
| 20 | janluke/embfile | A package for reading/writing files containing pre-trained word embeddings... | 38 | Emerging |
| 21 | DeepK/hoDMD-experiments | EigenSent: Spectral sentence embeddings using higher-order Dynamic Mode Decomposition | 36 | Emerging |
| 22 | isaacus-dev/open-australian-legal-embeddings-creator | The code used to create and update the Open Australian Legal Embeddings, the... | 36 | Emerging |
| 23 | ikergarcia1996/MetaVec | A monolingual and cross-lingual meta-embedding generation and evaluation framework | 35 | Emerging |
| 24 | vered1986/NC_embeddings | Comparison between various noun compound embeddings | 34 | Emerging |
| 25 | sberdevices/saf_vectorizers | Plugin for SmartApp Framework that performs vectorization (obtaining... | 33 | Emerging |
| 26 | EloiZ/embedding_evaluation | Evaluate your word embeddings | 32 | Emerging |
| 27 | sileod/embcomp | Composition of embeddings | 31 | Emerging |
| 28 | jfilter/hyperhyper | 🧮 Python package to construct word embeddings for small data using PMI and SVD | 31 | Emerging |
| 29 | louisbrulenaudet/tax-retrieval-benchmark | An implementation of the TaxRetrievalBenchmark task for the 🤗 Massive Text... | 31 | Emerging |
| 30 | yanaiela/easyEmbed | downloading pre-trained embedding easily and keeping only the necessary... | 30 | Emerging |
| 31 | Sandipan99/POLAR | The POLAR Framework: polar Opposites Enable Interpretability of Pre-Trained... | 30 | Emerging |
| 32 | Hanscal/textembedding | A package of algorithms commonly needed when computing text similarity | 28 | Experimental |
| 33 | neural-dialogue-metrics/EmbeddingBased | Embedding-based evaluation metrics for dialogue generation. | 27 | Experimental |
| 34 | rafalposwiata/pl-mteb | PL-MTEB: Polish Massive Text Embedding Benchmark | 27 | Experimental |
| 35 | ClimSocAna/tecb-de | German Text Embedding Clustering Benchmark | 27 | Experimental |
| 36 | eifuentes/awesome-embeddings | 🪁A curated list of awesome resources around entity embeddings | 27 | Experimental |
| 37 | AbdulSametTurkmenoglu/embedding_compare | Embedding Model Comparison for Turkish Medical Texts | 26 | Experimental |
| 38 | semvec/embedstresstest | Stress Testing Embedding Models | 26 | Experimental |
| 39 | guenthermi/table-embeddings | Tools for training schema-aware Web table embedding for unsupervised and... | 26 | Experimental |
| 40 | paithiov909/apportita | Utility for handling ‘magnitude’ pretrained word embeddings | 23 | Experimental |
| 41 | Paulescu/text-embedding-evaluation | Join 15k builders to the Real-World ML Newsletter ⬇️⬇️⬇️ | 23 | Experimental |
| 42 | MukundaKatta/EmbedBench | Embedding model comparison toolkit: benchmark TF-IDF, BoW, n-gram... | 22 | Experimental |
| 43 | TonioDominguez/dungeons_and_pythons_embeddings | A distinctive adaptation of text-based role-playing games using NLP technology... | 22 | Experimental |
| 44 | kushmadlani/embedtrics | Word embedding evaluation package for word similarity, word analogies & word... | 22 | Experimental |
| 45 | s1mb1o/epg-embedding-benchmark | Evaluating sentence embedding models for cross-lingual TV program guide... | 22 | Experimental |
| 46 | dali-does/vse-probing | Code for COLING2020 paper: Probing Multimodal Embeddings for Linguistic Properties. | 21 | Experimental |
| 47 | France-Travail/embcompare | A simple python tool for embedding comparison | 20 | Experimental |
| 48 | busycaesar/Embeddings_And_Cosine_Similarity | Code for the presentation. | 20 | Experimental |
| 49 | BYU-PCCL/regexv | Regex using word embeddings for text matching | 20 | Experimental |
| 50 | abhimishra91/corpus-creator | This tool can be used to create a word corpus from locally available... | 19 | Experimental |
| 51 | iamtatsuki05/MIREI | MIREI is a research workspace that builds encoder/decoder text-embedding... | 18 | Experimental |
| 52 | OctaviusLeo/rag-lite-tfidf-eval | AI/SWE | 17 | Experimental |
| 53 | alecokas/subword-embedding | A tool for generating sub-word (phone or grapheme) level embeddings from an... | 17 | Experimental |
| 54 | inkrement/StuffedTurkey | Distributed Embedding Aggregation | 17 | Experimental |
| 55 | aravpanwar/Embedding_Comparision | This repository provides a framework to benchmark the performance and... | 13 | Experimental |
| 56 | tahsinkoc/test-embrix-experimental | Comprehensive benchmark suite for evaluating embedding model performance... | 13 | Experimental |
| 57 | metawake/awesome-text-embeddings | A curated list of text embedding models, benchmarks, and tools for semantic... | 13 | Experimental |
| 58 | apistemic/benchmarks | LLM Benchmarks for everyday business intelligence tasks / company data:... | 12 | Experimental |
| 59 | matthieu-perso/semantic_geometry | [EMNLP Findings 2025] Semantic Geometry of Sentence Embedding | 11 | Experimental |
| 60 | lh0x00/embs | embs is a Python toolkit for retrieving documents (via Docsifer), generating... | 11 | Experimental |
| 61 | SkBlaz/core | Compressibility of document representations | 11 | Experimental |
| 62 | nicolay-r/arekit-contrib-networks | This is a networks-related contributional component [tokenizer, embeddings,... | 11 | Experimental |
| 63 | ymgw55/Norm-and-Variance | Norm of Mean Contextualized Embeddings Determines their Variance (Published... | 11 | Experimental |
| 64 | hadi-abdine/WordEmbeddingsEvalFLUE | The used code in order to perform evaluation of word embeddings on FLUE benchmark. | 10 | Experimental |