Pretrained Embedding Models (Embedding Tools)

Tools and implementations for loading, extracting, and using pretrained language model embeddings (BERT, ELMo, GloVe, RoBERTa, etc.). This category does NOT include embedding APIs, vector databases, downstream applications such as semantic search, or domain-specific embedding use cases.
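For static embeddings such as GloVe, loading and using pretrained vectors comes down to two steps: read a token-to-vector table, then pool the vectors of a text's tokens into one vector. The sketch below is a minimal illustration under stated assumptions: a GloVe-style plain-text file (one token per line followed by its space-separated components), mean pooling, and cosine similarity for comparison. The function names are hypothetical and not the API of any project listed here. The word2vec text format, which begins with a count/dimension header line, would need that first line skipped, and contextual models such as BERT or ELMo compute a vector per token in context instead.

```python
import math
from io import StringIO
from typing import Dict, Iterable, List, TextIO

# Hypothetical helpers for illustration; not the API of any listed project.


def load_glove_format(f: TextIO) -> Dict[str, List[float]]:
    """Read GloVe-style text: each line is a token followed by its vector components."""
    vectors: Dict[str, List[float]] = {}
    for line in f:
        parts = line.split()  # whitespace split; assumes tokens contain no spaces
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(v) for v in parts[1:]]
    return vectors


def embed_text(tokens: Iterable[str], vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Mean-pool the vectors of known tokens; return a zero vector if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity, defined here as 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy 2-dimensional table standing in for a real pretrained GloVe file.
toy_file = StringIO("cat 1.0 0.0\ndog 0.8 0.2\ncar 0.0 1.0\n")
vectors = load_glove_format(toy_file)
pets = embed_text(["cat", "dog"], vectors, dim=2)  # mean of the cat and dog vectors
vehicle = embed_text(["car"], vectors, dim=2)
print(pets, round(cosine(pets, vehicle), 3))
```

Unknown tokens are simply skipped during pooling; real tools often handle them differently (subword pieces, a dedicated unknown vector), so treat this as one design choice among several.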

48 pretrained embedding model tools are tracked. One scores above 70 (Verified tier): MinishLab/model2vec, the highest-rated, at 74/100 with 2,008 stars. Only one of the top 10 is actively maintained.

Get the projects as JSON (the example request below passes limit=20, so it covers only part of the 48):

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=embeddings&subcategory=pretrained-embedding-models&limit=20"

Access is open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
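The same request can be made from Python with only the standard library. This is a minimal sketch that mirrors the curl command above; the helper names are hypothetical, the response schema is not documented on this page, and authentication is omitted because the page does not say how the free key is passed.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

# Endpoint taken from the curl example; helper names below are hypothetical.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Build the query URL with the same parameters as the curl example."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_quality(domain: str, subcategory: str, limit: int = 20, timeout: float = 30.0) -> Any:
    """GET the endpoint and decode the JSON body (no API key is sent)."""
    url = build_quality_url(domain, subcategory, limit)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: fetch_quality("embeddings", "pretrained-embedding-models") returns the decoded JSON. Inspect its structure before relying on any field names.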

| # | Tool | Description | Score (/100) | Tier |
|---|------|-------------|--------------|------|
| 1 | MinishLab/model2vec | Fast State-of-the-Art Static Embeddings | 74 | Verified |
| 2 | AnswerDotAI/ModernBERT | Bringing BERT into modernity via both architecture changes and scaling | 55 | Established |
| 3 | tensorflow/hub | A library for transfer learning by reusing parts of TensorFlow models. | 51 | Established |
| 4 | Embedding/Chinese-Word-Vectors | 100+ Chinese Word Vectors (hundreds of pretrained Chinese word vectors) | 51 | Established |
| 5 | twang2218/vocab-coverage | Analysis of the Chinese-language cognitive ability of language models | 50 | Established |
| 6 | Santosh-Gupta/SpeedTorch | Library for faster pinned CPU <-> GPU transfer in PyTorch | 49 | Emerging |
| 7 | MinishLab/tokenlearn | Pre-train Static Word Embeddings | 47 | Emerging |
| 8 | AliOsm/simplerepresentations | Easy-to-use text representations extraction library based on the... | 43 | Emerging |
| 9 | pdasigi/onto-lstm | Keras implementation of ontology-aware token embeddings | 42 | Emerging |
| 10 | jasonwei20/eda_nlp | Data augmentation for NLP, presented at EMNLP 2019 | 42 | Emerging |
| 11 | setu4993/convert-labse-tf-pt | Convert LaBSE model from TF Hub to PyTorch. | 41 | Emerging |
| 12 | ltgoslo/simple_elmo | Simple library to work with pre-trained ELMo models in TensorFlow | 40 | Emerging |
| 13 | PlanTL-GOB-ES/lm-spanish | Official source for Spanish Language Models and resources made @ BSC-TEMU... | 40 | Emerging |
| 14 | YC-wind/embedding_study | Learning character vectors generated by Chinese pretrained models; testing how BERT and ELMo perform on Chinese | 37 | Emerging |
| 15 | davidberenstein1957/fast-sentence-transformers | Simpler, faster sentence-transformers | 36 | Emerging |
| 16 | Riccorl/transformers-embedder | A word-level Transformer layer based on PyTorch and 🤗 Transformers. | 36 | Emerging |
| 17 | MoleculeTransformers/smiles-featurizers | Extract Molecular SMILES embeddings from language models pre-trained with... | 36 | Emerging |
| 18 | fsxfreak/nlp-augment | A collection of utilities used in exploring data augmentation of... | 35 | Emerging |
| 19 | siddk/relation-network | TensorFlow Implementation of Relation Networks for the bAbI QA Task,... | 34 | Emerging |
| 20 | Textualization/Ropherta | Compute RoBERTa embeddings in PHP using the ONNX framework. | 33 | Emerging |
| 21 | milistu/bertdistiller | Faster, smaller BERT models in just a few lines of code. | 32 | Emerging |
| 22 | windsuzu/Joint-Semantic-Phonetic-Embedding | We use phonetics as a feature to create a joint semantic-phonetic embedding... | 32 | Emerging |
| 23 | Textualization/sentence-transphormers | Compute RoBERTa sentence embeddings in PHP using the ONNX framework | 31 | Emerging |
| 24 | WenchenLi/capricorn | NLP vocabulary builder and embedding loader | 29 | Experimental |
| 25 | SpydazWebAI-NLP/SpydazWebAI_NLP_Models | Word/Image/Audio Embedding models, Tokenizer models, Ngram language models,... | 29 | Experimental |
| 26 | ksm26/Understanding-and-Applying-Text-Embeddings | Dive into the world of text embeddings. This course will guide you through... | 28 | Experimental |
| 27 | rbitr/ferrite | Simple, lightweight transformers in Fortran | 27 | Experimental |
| 28 | greninja/NPLM | Neural network for word embeddings and language modeling | 27 | Experimental |
| 29 | sz128/pretrained_word_embeddings | How to load and aggregate pretrained word embeddings in PyTorch,... | 27 | Experimental |
| 30 | agadetsky/pytorch-definitions | [ACL 2018] Conditional Generators of Words Definitions | 26 | Experimental |
| 31 | ruanchaves/elmo | Supporting code for the paper "Portuguese Language Models and Word... | 25 | Experimental |
| 32 | rahmanidashti/pretrain-lightfm | Pre-train embeddings in the LightFM recommender system framework | 25 | Experimental |
| 33 | vliu15/elmo-kmeans | GPU-accelerated topic analysis pipeline | 25 | Experimental |
| 34 | vliu15/qanet | TensorFlow QANet with ELMo | 24 | Experimental |
| 35 | MayankSingh-coder/octopus-prime | Perceptron-based neural models with tokenization, embeddings, and a minimal... | 24 | Experimental |
| 36 | jina-ai/embedding-fingerprints | Identify which embedding model produced a vector using digit-level... | 24 | Experimental |
| 37 | EsterHlav/Quantitative-Comparison-NLP-Embeddings-from-GloVe-to-RoBERTa | Fair quantitative comparison of NLP embeddings from GloVe to RoBERTa with... | 23 | Experimental |
| 38 | rcarmo/asterisk-embedding-model | A small text embedding model for low-resource hardware | 23 | Experimental |
| 39 | HenryNdubuaku/pete | Parameter-efficient transformer embeddings replace learned embeddings with... | 22 | Experimental |
| 40 | ada-k/LanguageModels | Pretrained transformer and embedding language models | 21 | Experimental |
| 41 | dayyass/muse_tf2pt | Convert MUSE from TensorFlow to PyTorch and ONNX | 21 | Experimental |
| 42 | smpanaro/ModernBERT-AppleNeuralEngine | ModernBERT model optimized for the Apple Neural Engine. | 21 | Experimental |
| 43 | Repmak/sentenCPP | C++20 library designed to replicate the functionality and ease of use of the... | 21 | Experimental |
| 44 | dataiku/dss-plugin-nlp-embedding | Dataiku DSS plugin to extract vector embeddings from text data 👾 | 21 | Experimental |
| 45 | chmcbs/chinese-noun-embeddings | An analysis of how encoder transformer models represent Chinese nouns,... | 17 | Experimental |
| 46 | andreabac3/Word_Alignment_BERT | This project provides an API to perform word alignment | 13 | Experimental |
| 47 | Ailing-Zou/Bert-embedding | Get BERT embeddings from text | 11 | Experimental |
| 48 | mariusjohan/BertEmbeddings | Quickly generate positional embeddings using an ultra small transformer... | 11 | Experimental |