Embedding Clustering Tools

Tools for clustering and organizing data (text, URLs, tables, time series) using embeddings and unsupervised/semi-supervised algorithms. Includes dimensionality reduction and clustering visualization. Does NOT include general semantic search, similarity matching, or domain-specific applications (recommendation systems, RAG, etc.).

42 embedding clustering tools are tracked. Three score above 50, which places them in the Established tier. The highest-rated is TorchDR/TorchDR at 58/100 with 199 stars.

Get all 42 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=embeddings&subcategory=embedding-clustering-tools&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
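
The same request can be made from a script. The sketch below is a minimal example using only the Python standard library. It rebuilds the URL from the curl command above, with limit as a parameter. The helper names build_url and fetch_projects are hypothetical, and the shape of the returned JSON is not documented here, so the code only decodes it and makes no assumption about its fields. Whether a limit above 20 returns all 42 projects in one call is also an assumption to verify against the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example; everything else here is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the query URL with the same parameters as the curl example."""
    params = {
        "domain": "embeddings",
        "subcategory": "embedding-clustering-tools",
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(limit=20, timeout=30):
    """Fetch and decode the JSON response (hypothetical helper).

    The response schema is not described in this listing, so the decoded
    object is returned as-is for the caller to inspect.
    """
    with urlopen(build_url(limit), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage (performs a network request, counts against the daily quota):
#   data = fetch_projects()
#   print(json.dumps(data, indent=2)[:500])
```

Keeping URL construction separate from the network call makes it easy to check the query string before spending any of the 100 unauthenticated requests per day.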

# | Tool | Description | Score | Tier
1 | TorchDR/TorchDR | TorchDR - PyTorch Dimensionality Reduction | 58 | Established
2 | derrickburns/generalized-kmeans-clustering | Production-ready K-Means clustering for Apache Spark with pluggable Bregman... | 56 | Established
3 | abhilash1910/ClusterTransformer | Topic clustering library built on Transformer embeddings and cosine... | 51 | Established
4 | md-experiments/picture_text | Interactive tree-maps with SBERT & Hierarchical Clustering (HAC) | 49 | Emerging
5 | mainlp/semantic_components | Finding semantic components in your neural representations. | 42 | Emerging
6 | scientist-labs/clusterkit | High-performance UMAP dimensionality reduction for Ruby, powered by the... | 42 | Emerging
7 | nlpub/watset-java | An implementation of the Watset clustering algorithm in Java. | 39 | Emerging
8 | abojchevski/rsc | Robust Spectral Clustering. Implementation of "Robust Spectral Clustering... | 38 | Emerging
9 | kjpou1/regimetry | Unsupervised regime detection for financial time series using embeddings and... | 37 | Emerging
10 | amazon-science/supervised-intent-clustering | This is a package to fine-tune language models in order to create... | 36 | Emerging
11 | dcarpintero/taxonomy-completion | Taxonomy Completion with Embedding Quantization and an LLM-based Pipeline: A... | 33 | Emerging
12 | demegire/eksi-cluster | Tool for clustering homonymous eksisozluk.com page entries | 33 | Emerging
13 | VincentGaoHJ/Taxonomic-Relation-Identification | Awesome research paper on taxonomy (information retrieval). Study notes... | 30 | Emerging
14 | uhh-lt/Taxonomy_Refinement_Embeddings | Taxonomy refinement method to improve domain-specific taxonomy systems. | 27 | Experimental
15 | manickbhan/content-pruning-by-semantic-distance-topical-dilution | Visualize Page Embeddings for all Nodes on a Website | 26 | Experimental
16 | jacobmarks/clustering-plugin | Compute clustering on your data in a visual, intuitive way with FiftyOne and Sklearn! | 25 | Experimental
17 | molgenis/variable-taxon-mapper | A tool for mapping elements to a (biomedical) taxonomy | 23 | Experimental
18 | houshuang/limbic | Embedding, search, novelty detection, and clustering for knowledge-dense... | 23 | Experimental
19 | duanyu/embedding_application | Some applications of text embedding model, e.g., semantic retrieval and clustering. | 23 | Experimental
20 | NoYo25/ClusteringTableHeaders | This project aims at creating an RDF schema given a list of column headers... | 22 | Experimental
21 | FabienCadoret/autokluster | Auto-k spectral clustering for text embeddings | 21 | Experimental
22 | Baho73/cluster-optimization | Text embedding clustering pipeline: outlier detection (KNN + LOF +... | 21 | Experimental
23 | sahandv/science_science | A framework to analyze, visualize and predict scientific trends | 19 | Experimental
24 | esantus/Outlier_Detection | Data and code for the experiments in the Outlier Detection task proposed by... | 19 | Experimental
25 | VieVie31/TAL_synonymy | trying some stuffs about synonymy and other NLP stuffs... | 19 | Experimental
26 | amazon-science/frictional-utterances-clustering | This is a package to apply clustering algorithms to utterances, embedded... | 18 | Experimental
27 | Marta-Barea/embeddings-clustering-songs-lyrics | Analyze and group song lyrics by semantic meaning using machine learning techniques. | 17 | Experimental
28 | RubenBroekx/SemiSupervisedClustering | Cluster context-less embedded language data in a semi-supervised manner. | 17 | Experimental
29 | emrecncelik/weighted-bert | Nonofficial implementation of the paper A Text Document Clustering Method... | 17 | Experimental
30 | marsidmali/Roget-s-Thesaurus-in-the-21st-Century | An investigation into how modern machine learning techniques align with... | 17 | Experimental
31 | haschka/semantic-trees | A repository for collaboration on semantic-trees | 13 | Experimental
32 | sergeyklay/clusterium | Text Clustering Toolkit for Bayesian Nonparametric Analysis | 13 | Experimental
33 | panos-span/rogets_thesaurus | Semantic clustering and classification of Roget's Thesaurus words | 13 | Experimental
34 | Shiv33ndu/msgvault_exploration | Semantic grouping of archived emails built on top of the local email archive... | 13 | Experimental
35 | tes69ducker/Image-Clustering-ML | 🌟 Explore unsupervised image clustering with dynamic K-Means and Cosine... | 13 | Experimental
36 | Guizinx/guilhermearthursantosmachado_Valida-odemodelosdeclusteriza-o-25E4_3-_pd | Clustering of negative texts with SBERT + K-Means/DBSCAN to support moderation. | 12 | Experimental
37 | ozlerhakan/keywords_clustering | cluster text data using sentence bert | 12 | Experimental
38 | url-clusterer/white-paper | An implementation of a methodology to cluster dynamic URLs using word embeddings. | 11 | Experimental
39 | sian0x0/Roud-Song-Clusters | Lyrics clustering | 11 | Experimental
40 | arj1211/cluster-links | pipeline that extracts, cleans, embeds, and clusters web links into topical... | 11 | Experimental
41 | jacksongrove/idea-generator | Research paper: Predictive Idea Generation Through Clustering Narrow Word Embeddings | 10 | Experimental
42 | sakbarpu/Clustering_DimReduction | The implementations in this repository deal with clustering and... | 10 | Experimental