src-d/kmcuda
Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA
This tool helps scientists, marketers, and other data practitioners group large datasets into clusters and find each point's nearest neighbors. You provide large tables of numerical data; it returns a cluster assignment for every data point and identifies nearest neighbors. It is aimed at anyone who needs fast clustering and nearest-neighbor search on very large datasets.
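To make the inputs and outputs described above concrete, here is a minimal CPU-only sketch in plain Python. It is not kmcuda's API or its CUDA implementation: the function names (`kmeans`, `knn`), the evenly spaced initialization, and the toy data are assumptions made for illustration. It only shows the shape of the task: a table of numeric rows goes in, and centroids, per-row cluster assignments, and nearest-neighbor indices come out.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(samples, k, max_iter=100):
    """Plain Lloyd's K-means (illustrative; not kmcuda's GPU algorithm).

    Returns (centroids, assignments), where assignments[i] is the index
    of the centroid closest to samples[i].
    """
    n = len(samples)
    # Simple deterministic seeding for this sketch: k evenly spaced rows.
    # Real implementations typically use random or k-means++ seeding.
    centroids = [list(samples[i * n // k]) for i in range(k)]
    assignments = None
    for _ in range(max_iter):
        # Assignment step: attach every row to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: sq_dist(row, centroids[c]))
            for row in samples
        ]
        if new_assignments == assignments:
            break  # no row changed cluster, so the result is stable
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [r for r, a in zip(samples, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


def knn(samples, index, n_neighbors):
    """Brute-force nearest neighbors of samples[index], as row indices."""
    others = [j for j in range(len(samples)) if j != index]
    others.sort(key=lambda j: sq_dist(samples[index], samples[j]))
    return others[:n_neighbors]


# Toy table: two well-separated groups of 2-D points.
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
        (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centroids, assignments = kmeans(data, k=2)
neighbors_of_first = knn(data, index=0, n_neighbors=2)
```

Both functions run in time proportional to the number of rows times the number of centroids (or rows), which is the cost a GPU implementation is meant to absorb on very large tables.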
841 stars. No commits in the last 6 months.
Use this if you need to perform K-means clustering or K-nearest neighbors search on massive datasets and have access to NVIDIA GPUs for significantly faster processing.
Not ideal if you do not have an NVIDIA GPU, since its performance advantage depends on CUDA acceleration, or if your data contains many missing (NaN) values and you want to use the faster 'Yinyang' algorithm.
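Because missing values are called out as a limitation, it can help to check how many rows contain NaN before choosing an algorithm. The sketch below is a generic pre-check in plain Python, not something kmcuda provides; the helper names `rows_with_nan` and `drop_nan_rows` are hypothetical, and dropping rows is only one possible policy (imputing values is another).

```python
import math


def rows_with_nan(samples):
    """Return the indices of rows that contain at least one NaN value."""
    return [
        i for i, row in enumerate(samples)
        if any(math.isnan(v) for v in row)
    ]


def drop_nan_rows(samples):
    """Return a copy of the table without the rows that contain NaN."""
    bad = set(rows_with_nan(samples))
    return [row for i, row in enumerate(samples) if i not in bad]


# Toy table with one incomplete row.
table = [(1.0, 2.0), (float("nan"), 3.0), (4.0, 5.0)]
nan_rows = rows_with_nan(table)
clean = drop_nan_rows(table)
nan_fraction = len(nan_rows) / len(table)
```

A high `nan_fraction` suggests the dataset falls into the "not ideal" case described above.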
Stars: 841
Forks: 146
Language: Jupyter Notebook
License: not listed
Category: not listed
Last pushed: Oct 11, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/src-d/kmcuda"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
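The same request can be sketched in Python using only the standard library. The URL pattern is copied from the curl command above; reading its path segments as a category followed by owner and repository is an assumption, as are the helper names `quality_url` and `fetch_quality`. The response format is not documented here, so the sketch returns the raw body as text rather than parsing it.

```python
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(owner, repo, category="ml-frameworks"):
    """Build the endpoint URL, following the pattern of the curl example."""
    return f"{BASE}/{category}/{owner}/{repo}"


def fetch_quality(owner, repo, category="ml-frameworks", timeout=10):
    """Fetch the endpoint and return the response body as text.

    The body is returned unparsed because its format is an unknown here.
    """
    with urllib.request.urlopen(quality_url(owner, repo, category),
                                timeout=timeout) as resp:
        return resp.read().decode("utf-8")


url = quality_url("src-d", "kmcuda")
```

Calling `fetch_quality("src-d", "kmcuda")` performs the network request and counts against the daily limit.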
Related frameworks
scikit-learn-contrib/hdbscan
A high performance implementation of HDBSCAN clustering.
annoviko/pyclustering
pyclustering is a Python, C++ data mining library.
panagiotisanagnostou/HiPart
Hierarchical divisive clustering algorithm execution, visualization, and interactive visualization.
erdogant/clusteval
Clusteval provides methods for unsupervised cluster validation
mqcomplab/MDANCE
MDANCE: O(N) clustering for molecular dynamics. Process 1.5M frames in 40min. 8 specialized algorithms.