gagolews/clustering-data-v1
A framework for benchmarking clustering algorithms – Benchmark suite, version 1
This project provides a collection of standardized datasets for evaluating how well clustering algorithms perform. Each dataset pairs raw, unlabeled numerical data with corresponding 'true' cluster assignments, so researchers can rigorously compare the accuracy and efficiency of different clustering methods against a known reference. Data scientists and machine learning researchers use it to test and improve their clustering techniques.
No commits in the last 6 months.
Use this if you need reliable, diverse datasets with known ground-truth cluster labels to benchmark or develop new clustering algorithms.
Not ideal if you are looking for code to implement clustering algorithms or an automated tool for running benchmarks; this provides only the datasets.
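Benchmarking against ground-truth labels usually comes down to scoring how closely an algorithm's partition matches the reference one. The sketch below computes the adjusted Rand index, one common external validity measure, in pure Python. The function name `adjusted_rand_index` and the toy label vectors are illustrative assumptions, not part of this repository, and the suite itself may recommend other measures.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same points.

    Returns 1.0 for identical partitions (up to relabeling) and values
    near 0.0 for partitions that agree only as much as chance would.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")

    n = len(labels_true)
    if n < 2:
        return 1.0  # fewer than two points: partitions trivially agree

    # Contingency counts: how many points fall in each (true, pred) pair.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of point pairs placed together by both / by each partition.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2

    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (sum_cells - expected) / (max_index - expected)


# Toy example (made-up labels): same grouping, different label names.
reference = [1, 1, 1, 2, 2, 2]
predicted = [2, 2, 2, 1, 1, 1]
score = adjusted_rand_index(reference, predicted)
```

In practice, `reference` would be read from one of the suite's label files and `predicted` would come from the algorithm under test; how those files are laid out is not described on this page.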
Stars: 9
Forks: 1
Language: Jupyter Notebook
License: —
Category:
Last pushed: May 21, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/gagolews/clustering-data-v1"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
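The same request can be made from a script. This is a minimal sketch using only Python's standard library. The helper name `fetch_listing` is hypothetical, and because the response format is not documented on this page, the body is returned as plain text rather than parsed.

```python
import urllib.request

# Endpoint copied from the curl command above.
API_URL = (
    "https://pt-edge.onrender.com/api/v1/quality/"
    "ml-frameworks/gagolews/clustering-data-v1"
)


def fetch_listing(url=API_URL, timeout=10):
    """Fetch the endpoint and return the raw response body as text.

    No API key is sent; the response schema is an unknown here, so
    nothing is parsed or validated.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

Calling `print(fetch_listing())` performs the request and prints whatever the service returns. Keyless requests count against the daily limit.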
Higher-rated alternatives
scikit-learn-contrib/hdbscan
A high performance implementation of HDBSCAN clustering.
annoviko/pyclustering
pyclustering is a Python, C++ data mining library.
panagiotisanagnostou/HiPart
Hierarchical divisive clustering: algorithm execution, visualization, and interactive visualization.
erdogant/clusteval
Clusteval provides methods for unsupervised cluster validation.
mqcomplab/MDANCE
MDANCE: O(N) clustering for molecular dynamics. Process 1.5M frames in 40min. 8 specialized algorithms.