abhilash1910/ClusterTransformer
Topic clustering library built on Transformer embeddings and cosine similarity metrics. Compatible with all BERT base transformers from Hugging Face.
This tool helps data scientists and NLP practitioners organize unstructured text data into meaningful groups. You input a list of sentences, and it outputs a structured dataset (a dataframe) that assigns each sentence to a specific topic or cluster. This is ideal for anyone who needs to identify underlying themes in large collections of text, like customer feedback or research papers.
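The sentence-in, cluster-assignment-out flow described above can be sketched in plain Python. This is a minimal illustration of threshold-based cosine-similarity grouping, not ClusterTransformer's actual API or algorithm: the function names `cosine` and `cluster_by_threshold`, the 0.8 threshold, and the hand-written three-dimensional "embeddings" are all assumptions made for the example. In practice the vectors would come from a Transformer model, and the rows would typically be loaded into a dataframe.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cluster_by_threshold(sentences, embeddings, threshold=0.8):
    """Greedy grouping (hypothetical helper, not the library's method).

    Each sentence joins the existing cluster whose seed embedding is most
    similar to it, provided that similarity reaches `threshold`; otherwise
    the sentence starts a new cluster and becomes its seed.
    """
    seeds = []  # embedding of the first member of each cluster
    rows = []   # one record per sentence, ready for a dataframe
    for sentence, emb in zip(sentences, embeddings):
        best_id, best_sim = None, threshold
        for cluster_id, seed in enumerate(seeds):
            sim = cosine(emb, seed)
            if sim >= best_sim:
                best_id, best_sim = cluster_id, sim
        if best_id is None:
            seeds.append(emb)
            best_id = len(seeds) - 1
        rows.append({"sentence": sentence, "cluster": best_id})
    return rows


# Toy data: stand-in vectors, NOT real Transformer embeddings.
sentences = [
    "The delivery arrived two days late.",
    "Shipping took far longer than promised.",
    "The app crashes when I open settings.",
    "Settings page freezes on launch.",
]
embeddings = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.0, 0.8, 0.3],
]

rows = cluster_by_threshold(sentences, embeddings, threshold=0.8)
for row in rows:
    print(row["cluster"], row["sentence"])
```

With these toy vectors the two delivery complaints share one cluster and the two settings complaints share another. Raising the threshold produces more, tighter clusters; lowering it merges looser themes.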
No commits in the last 6 months. Available on PyPI.
Use this if you need to automatically categorize or find common themes within a collection of text data, without manually defining the categories beforehand.
Not ideal if you already have predefined categories for your text and simply need to classify new texts into those existing labels.
Stars: 44
Forks: 15
Language: Python
License: —
Category: (not listed)
Last pushed: Jun 11, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/abhilash1910/ClusterTransformer"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
TorchDR/TorchDR
TorchDR - PyTorch Dimensionality Reduction
derrickburns/generalized-kmeans-clustering
Production-ready K-Means clustering for Apache Spark with pluggable Bregman divergences (KL,...
md-experiments/picture_text
Interactive tree-maps with SBERT & Hierarchical Clustering (HAC)
mainlp/semantic_components
Finding semantic components in your neural representations.
scientist-labs/clusterkit
High-performance UMAP dimensionality reduction for Ruby, powered by the annembed Rust crate....