Domain-Specific Embedding Tools

Task-specific embedding models and representations trained on specialized vocabularies, domains, or linguistic phenomena (legislation, events, topics, entities, skills). Does NOT include general-purpose pre-trained embeddings, embedding infrastructure, or domain-agnostic retrieval systems.

There are 95 domain-specific embedding tools tracked. Two score 50 or higher (Established tier). The highest-rated is MilaNLProc/contextualized-topic-models at 58/100 with 1,266 stars.

Get all 95 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=embeddings&subcategory=domain-specific-embeddings&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
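The same request can be made from a script. The following is a minimal Python sketch using only the standard library. The endpoint and query parameters are copied from the curl command above; the helper names `build_quality_url` and `fetch_quality` are hypothetical, and the shape of the JSON response is not documented here, so the result is returned as parsed but uninterpreted JSON. The command shown uses `limit=20`; whether a larger `limit` returns all 95 projects is an assumption that would need checking against the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command in the listing.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain="embeddings",
                      subcategory="domain-specific-embeddings",
                      limit=20):
    """Build the request URL (hypothetical helper, not part of any official client)."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_quality(url, timeout=30):
    """Perform the GET request and parse the body as JSON.

    The response schema is not described in the listing, so no fields are assumed.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_quality(build_quality_url())` issues the same request as the curl command and counts against the daily request limit.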

| # | Tool | Description | Score | Tier |
|---|---|---|---|---|
| 1 | MilaNLProc/contextualized-topic-models | A python package to run contextualized topic modeling. CTMs combine... | 58 | Established |
| 2 | vinid/cade | Compass-aligned Distributional Embeddings. Align embeddings from different corpora | 50 | Established |
| 3 | spcl/ncc | Neural Code Comprehension: A Learnable Representation of Code Semantics | 48 | Emerging |
| 4 | criteo-research/CausE | Code for the Recsys 2018 paper entitled Causal Embeddings for Recommendation. | 48 | Emerging |
| 5 | vintasoftware/entity-embed | PyTorch library for transforming entities like companies, products, etc.... | 48 | Emerging |
| 6 | ina-foss/twembeddings | Sentence embeddings for unsupervised event detection in the Twitter stream:... | 47 | Emerging |
| 7 | bnosac/ruimtehol | R package to Embed All the Things! using StarSpace | 45 | Emerging |
| 8 | rodrigobressan/entity_embeddings_categorical | Discover relevant information about categorical data with entity embeddings... | 44 | Emerging |
| 9 | BaseModelAI/cleora | Cleora AI is a general-purpose open-source model for efficient, scalable... | 43 | Emerging |
| 10 | mop/bier | Cleaned up reference implementation of BIER: Boosting Independent Embeddings... | 41 | Emerging |
| 11 | uhh-lt/sensegram | Making sense embedding out of word embeddings using graph-based word sense induction | 40 | Emerging |
| 12 | tony-hong/event-embedding-multitask | *SEM 2018: Learning Distributed Event Representations with a Multi-Task Approach | 40 | Emerging |
| 13 | cpa-analytics/embedding-encoder | Scikit-Learn compatible transformer that turns categorical variables into... | 38 | Emerging |
| 14 | jxmorris12/cde | code for training & evaluating Contextual Document Embedding models | 38 | Emerging |
| 15 | WladimirSidorenko/SentiLex | Sentiment Lexicon Generation Suite | 37 | Emerging |
| 16 | wangjksjtu/multi-embedding-cws | Multiple Character Embeddings for Chinese Word Segmentation, ACL 2019 | 37 | Emerging |
| 17 | bnosac/ETM | Topic Modelling in Semantic Embedding Spaces | 37 | Emerging |
| 18 | dustinstoltz/CMDist | DEPRECATED - The Concept Mover's Distance Method is now available in the... | 37 | Emerging |
| 19 | lfmatosm/embedded-topic-model | A package to run embedded topic modelling with ETM. Adapted from the... | 37 | Emerging |
| 20 | milangritta/Minimalist-Location-Metonymy-Resolution | The code and data accompanying the ACL 2017 "outstanding award" publication ... | 36 | Emerging |
| 21 | y3ro/meemi | Improving cross-lingual word embeddings by meeting in the middle | 36 | Emerging |
| 22 | dkn22/embedder | Embed categorical variables via neural networks. | 34 | Emerging |
| 23 | dustinstoltz/cartography_poetics | Reproduction Repository for "Cultural Cartography with Word Embeddings" | 34 | Emerging |
| 24 | kaushalshetty/Positional-Encoding | Encoding position with the word embeddings. | 33 | Emerging |
| 25 | marziehf/TS_Embeddings | Learning topic-sensitive word embeddings | 33 | Emerging |
| 26 | oentaryorj/smu.softeng.crossact | Cross-platform activity prediction | 32 | Emerging |
| 27 | arsena-k/discourse_atoms | How are topics encoded in semantic space? Repository to accompany PNAS... | 32 | Emerging |
| 28 | vgupta123/P-SIF | Source code for our AAAI 2020 paper P-SIF: Document Embeddings using... | 32 | Emerging |
| 29 | ikergarcia1996/MVM-Embeddings | A monolingual and cross-lingual meta-embedding generation framework | 31 | Emerging |
| 30 | skesiraju/smm | Subspace multinomial model for learning document representations | 31 | Emerging |
| 31 | dwulff/embedR | Generate and analyze state-of-the-art text embeddings | 31 | Emerging |
| 32 | ltgoslo/diachronic_armed_conflicts | Diachronic armed conflicts prediction from news texts | 31 | Emerging |
| 33 | garawalid/Multilingual-Unsupervised-Embeddings | Align two embeddings (EN - FR) using MUSE (Unsupervised) | 31 | Emerging |
| 34 | harmanpreet93/poincare-embedding-using-gensim | Train poincare embedding using gensim | 30 | Emerging |
| 35 | moonlockwood/BinaryNeuralNetwork | Tiny nn for experimenting with '8-hot' binary encoded embeddings | 29 | Experimental |
| 36 | BUTSpeechFIT/BaySMM | A Bayesian Multilingual Document Model | 28 | Experimental |
| 37 | cisnlp/MEXA | 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment | 28 | Experimental |
| 38 | rug-compling/bimu | Bilingual Learning of Multi-sense Embeddings with Discrete Autoencoders | 28 | Experimental |
| 39 | arranger1044/spae | Code and supplemental material for "Sum-Product Autoencoding: Encoding and... | 28 | Experimental |
| 40 | armintabari/Emotional-Embedding | Retraining embedding models to incorporate emotional constraints. | 28 | Experimental |
| 41 | shuxiaobo/text-representation | Text representation works, such as : paper, code, review, datasets, blogs,... | 27 | Experimental |
| 42 | corradomonti/ideological-embeddings | Code and data for the CIKM2021 paper "Learning Ideological Embeddings From... | 27 | Experimental |
| 43 | MiuLab/GenDef | Probing task; contextual embeddings -> textual definitions (EMNLP19) | 27 | Experimental |
| 44 | stephantul/piecelearn | Learning BPE embeddings by first learning a segmentation model and then... | 27 | Experimental |
| 45 | zouharvi/pwesuite | Suite for phonetic word embeddings, especially their evaluation and baseline models. | 27 | Experimental |
| 46 | ChenghaoMou/embeddings | zero-vocab or low-vocab embeddings | 27 | Experimental |
| 47 | dalisson/am_softmax | This is a pytorch implementation of the am_softmax, this softmax layer... | 27 | Experimental |
| 48 | fursovia/geometric_embedding | "Zero-Training Sentence Embedding via Orthogonal Basis" paper implementation | 26 | Experimental |
| 49 | g-laz77/Cross-Lingual-Word-Embeddings | Learn a shared embedding space between words in multiple languages. | 25 | Experimental |
| 50 | rimonim/embedplyr | Tools for Working With Text Embeddings in R | 25 | Experimental |
| 51 | catalyst-cooperative/ccai-entity-matching | An exploration of generalizable approaches to unsupervised entity matching... | 25 | Experimental |
| 52 | gabmoreira/subspaces | Code for the paper Learning Visual-Semantic Subspace Representations | 25 | Experimental |
| 53 | yyaghoobzadeh/figment-multi | Multi-level Representations for Fine-Grained Typing of Knowledge Base Entities | 25 | Experimental |
| 54 | victor7246/MRF-LDA | MRF-LDA model for topic modelling | 25 | Experimental |
| 55 | junyachen/NPMM | A nonparametric model for online topic discovery with word embeddings | 24 | Experimental |
| 56 | manojsukhavasi/Unsupervised-Cross-Lingual-Embeddings | cross-lingual word embeddings with unsupervised learning | 24 | Experimental |
| 57 | pedrada88/relative | Repository to learn relation vectors from text corpora. Includes the... | 24 | Experimental |
| 58 | soliblue/Reddit-Politics | Code for a large-scale analysis of political subcommunities on Reddit,... | 23 | Experimental |
| 59 | vpuru98/Embeddings | Training Word Embeddings and using them to perform Sentiment Analysis with... | 23 | Experimental |
| 60 | JanEnglerRWTH/SensePOLAR | Code related to the project: SensePOLAR: Word sense aware interpretability... | 23 | Experimental |
| 61 | izhx/uni-rep | Code for embedding and retrieval research. | 22 | Experimental |
| 62 | kiudee/pareto-embeddings | Advanced choice modeling with multidimensional utility representations. | 21 | Experimental |
| 63 | jparkerweb/fast-topic-analysis | 🏷️ Fast Topic Analysis is a tool for analyzing text against predefined... | 21 | Experimental |
| 64 | gabmoreira/subembed | Repository for the paper: Native Logical and Hierarchical Representations... | 20 | Experimental |
| 65 | centre-for-humanities-computing/embedding-projection | This is a repository for reproducing the results of Continuous sentiment... | 20 | Experimental |
| 66 | Riccorl/sense-embedding | BabelNet (and WordNet) sense embedding trained with Word2Vec and FastText | 20 | Experimental |
| 67 | do-me/embedding-algebra | Test scripts for common word embedding falsehoods like King - Man + Woman =... | 20 | Experimental |
| 68 | r2d4/blog-embeddings | Script to generate embeddings from a blog and use GPT-3.5 to categorize the... | 20 | Experimental |
| 69 | pedrada88/rwe | Repository containing data and code of the ACL-19 paper "Relational Word Embeddings" | 19 | Experimental |
| 70 | satya77/Entity_Embedding | Reference implementation of the paper "Word Embeddings for Entity-annotated Texts" | 19 | Experimental |
| 71 | csiro-robotics/MDL | 🔥[IEEE TPAMI 2023] Official repository TPAMI 2023 paper "Exploiting Field... | 19 | Experimental |
| 72 | stannida/skill-embeddings | Repository for the Master Thesis "Encoding semantic information about skills... | 18 | Experimental |
| 73 | jdenes/TopicEmbeddings | An open-source framework to create and test document embeddings using topic models. | 18 | Experimental |
| 74 | yigitsever/Evaluating-Dictionary-Alignment | Code for the paper "Evaluating cross-lingual textual similarity on... | 18 | Experimental |
| 75 | amitkumarj441/CIKM2023_SubspaceEmbedding | Pluggable Embedding Code for our CIKM paper titled "Lightweight Adaptation... | 17 | Experimental |
| 76 | apostolidoum/modeling-behaviour-of-SoC-players | Code for my Diploma Thesis. The goal was to model the players' behavior by... | 17 | Experimental |
| 77 | thecml/neural_embedder | A small library that can encode categorical variables to entity embeddings... | 17 | Experimental |
| 78 | hanshanley/multilingual-matryoshka-news | GitHub Repo for the ACL 2025 Paper: Hierarchical Level-Wise News Article... | 14 | Experimental |
| 79 | tteofili/jtm | tool for extraction of topics from jira issues | 14 | Experimental |
| 80 | akshaychawla/Accelerated-Training-by-disentangling-neural-representations | Just a theory. | 13 | Experimental |
| 81 | slowwavesleep/FnSenseMapper | A tool to map FrameNet Lexical Units to BabelNet synsets using the distance... | 13 | Experimental |
| 82 | Develop-Packt/Deep-Learning-for-Text-embeddings | This module demonstrates the power of word embeddings and explains the... | 13 | Experimental |
| 83 | KlaraGtknst/text_topic | This repository implements a pipeline to store various data of files from a... | 13 | Experimental |
| 84 | cisnlp/ColexificationNet | Crosslingual Transfer Learning for Low-Resource Languages Based on... | 12 | Experimental |
| 85 | MarkBelford/co-association | Weighted Term Co-association approach for producing more coherent topics, a... | 11 | Experimental |
| 86 | RainBoltz/pySmore | A newly interpreted code of C++ project `SMORe`, which developed in Python... | 11 | Experimental |
| 87 | deborahdore/cross-lingual-embeddings | cross-lingual embeddings for French and Italian evaluated on machine... | 11 | Experimental |
| 88 | soheilabadifard/Query-Embeddings | This Script is part of LETOR Project | 11 | Experimental |
| 89 | milan-pavlovic-ai/representation-learning | Approaches for learning text representations | 11 | Experimental |
| 90 | francesita/CS-Embed-SemEval2020 | Code and specs for CS-Embed's entry for SemEval-2020 Task-9. We present... | 11 | Experimental |
| 91 | MichalKal99/BrainEmbeddings | Code and data for Master Thesis | 11 | Experimental |
| 92 | hyenee/Syntax-Vector-Learning-using-correspondence-for-Natural-Language-Understanding | The GitHub repository for the paper "Syntax Vector Learning using... | 11 | Experimental |
| 93 | Teemursu/reddit_neologism_semantic_change | TWEC application for studying language change | 10 | Experimental |
| 94 | rodrigolourencofarinha/AI-Entity-Matching | Leverage AI to accurately match and reconcile entities across two datasets. | 10 | Experimental |
| 95 | JavierBJ/gender-politics-twitter | Scripts used for development of my Master's Thesis "Analyzing Twitter data... | 10 | Experimental |
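
The tier labels in the listing appear to follow simple score cut-offs. The sketch below encodes the boundaries that can be read off the 95 entries: every tool scoring 50 or more is labeled Established, scores from 30 to 48 are labeled Emerging, and scores of 29 or less are labeled Experimental. These thresholds are inferred from the data, not taken from a published scoring rule. No entry scores 49, so the exact Established boundary is an assumption, and any tier above Established is not represented here. The function name `tier_for` is hypothetical.

```python
def tier_for(score: int) -> str:
    """Map a 0-100 quality score to a tier label.

    Cut-offs are inferred from the listing (an assumption, not a documented rule):
    50 and above -> Established, 30 to 49 -> Emerging, below 30 -> Experimental.
    """
    if score >= 50:
        return "Established"
    if score >= 30:
        return "Emerging"
    return "Experimental"


# Spot checks against rows from the listing.
examples = {
    "MilaNLProc/contextualized-topic-models": 58,
    "harmanpreet93/poincare-embedding-using-gensim": 30,
    "moonlockwood/BinaryNeuralNetwork": 29,
}
labels = {name: tier_for(score) for name, score in examples.items()}
```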