Text Clustering and Topic Modeling Transformer Models

Tools for unsupervised discovery and organization of text documents through clustering, dimensionality reduction, and topic extraction using transformer embeddings. Does NOT include supervised text classification, document retrieval/search, or general semantic similarity tasks.

There are 22 text clustering and topic modeling models tracked. One scores above 70 (Verified tier). The highest-rated is MaartenGr/BERTopic at 71/100 with 7,443 stars. One of the top 10 is actively maintained.

Get all 22 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=text-clustering-topic-modeling&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
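The same request can be made from a script. The following is a minimal sketch using only the Python standard library; it assumes the endpoint returns a JSON body, and it makes no assumption about the response schema, which is not documented here. The helper names `build_url` and `fetch_projects` are hypothetical, and the default `limit=20` is copied from the command above (whether a higher value returns all 22 projects is untested).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="transformers",
              subcategory="text-clustering-topic-modeling",
              limit=20):
    """Build the query URL; the defaults mirror the curl example."""
    query = urlencode({"domain": domain,
                       "subcategory": subcategory,
                       "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Send a GET request and decode the body as JSON.

    The response structure is an unknown here, so the decoded
    object is returned as-is rather than indexed into.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects(build_url())` performs the request and counts against the daily quota, so it is worth caching the result locally when experimenting.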

| # | Model | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | MaartenGr/BERTopic | Leveraging BERT and c-TF-IDF to create easily interpretable topics. | 71 | Verified |
| 2 | webis-de/small-text | Active Learning for Text Classification in Python | 67 | Established |
| 3 | mead-ml/mead-baseline | Deep-Learning Model Exploration and Development for NLP | 49 | Emerging |
| 4 | x-tabdeveloping/turftopic | Robust and fast topic models with sentence-transformers. | 47 | Emerging |
| 5 | HumanSignal/label-studio-transformers | Label data using HuggingFace's transformers and automatically get a... | 46 | Emerging |
| 6 | hiyouga/Dual-Contrastive-Learning | Code for our paper "Dual Contrastive Learning: Text Classification via... | 45 | Emerging |
| 7 | kmaurinjones/AllMeans | Automatic topic modelling using minimal external input and computational resources | 33 | Emerging |
| 8 | hsisaberi/single-trait-electra | A complete ELECTRA-based framework for Big Five personality trait... | 30 | Emerging |
| 9 | DarshanAdiga/idiom-principle-on-magpie-corpus | Idiom Principle on MAGPIE dataset | 30 | Emerging |
| 10 | nerdimite/bert-finetuning-webinar | Code for the FullStack AI Live Coding Series- Part 1 (CellStrat AI Lab) | 28 | Experimental |
| 11 | WeskerPRO/NLP_Project | Fine-tuning BERT and BART for sentiment analysis, paraphrase detection, and... | 26 | Experimental |
| 12 | ai-center-kth/cuBERT-source-code-clustering | Fine-tuning cuBERT embeddings for clustering source code by functionality | 25 | Experimental |
| 13 | ia-labo/French-News-Clustering | Text classification and clustering using transformers and Denstream. | 24 | Experimental |
| 14 | LennartKeller/DeepTextClustering | Deep text clustering with language models | 24 | Experimental |
| 15 | fork123aniket/Zero-Shot-Question-Answering | Implementation of Zero-Shot Question Answering in PyTorch | 18 | Experimental |
| 16 | nolnolon/User-Clustering-with-BERT-Models | User Clustering Pipelines with BERT Models on Long and Heterogeneous Tweets... | 17 | Experimental |
| 17 | tre-systems/cefr-workshop | Educational workshop for NLP engineers. Fine-tuning DeBERTa-v3 for CEFR... | 17 | Experimental |
| 18 | mgiorgi13/MITopics | Topic detection to identify the main topics on MIT management papers | 13 | Experimental |
| 19 | battles5/amelia-bertino-legal-nlp | Legal Argument Mining on Italian tax-court decisions (AMELIA dataset) ... | 13 | Experimental |
| 20 | eriknovak/WAC | The Wasserstein distance-based news Article Clustering algorithm | 11 | Experimental |
| 21 | LazerLambda/udl-negation | Comparing Data-Driven Techniques for Enhancing Negation Sensitivity in... | 11 | Experimental |
| 22 | whoamimi/ClusteringInLatentSpace | Latent space experiments | 10 | Experimental |

Comparisons in this category