src-d/kmcuda

Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA


Established

This tool helps scientists, marketers, and other data practitioners group large datasets into clusters and find the closest data points. You provide large tables of numerical data; it returns a cluster assignment for each data point and identifies each point's nearest neighbors. It is aimed at anyone who needs fast clustering and nearest-neighbor search on very large datasets.
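To make the inputs and outputs described above concrete, here is a minimal CPU-only sketch in pure Python. It is not kmcuda's API or its GPU implementation: the function names (`kmeans`, `nearest_neighbors`), the first-k initialization, and the toy data are all assumptions chosen for illustration. It uses plain Lloyd iterations and a brute-force neighbor search to show what "cluster assignments" and "nearest neighbors" mean for a small table of numbers.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; returns (centroids, assignments).

    Hypothetical helper for illustration only, not part of kmcuda.
    """
    # Deterministic initialization for this sketch: the first k points.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its closest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


def nearest_neighbors(points, index, n_neighbors):
    """Brute-force search: indices of the n_neighbors points closest to points[index]."""
    others = [i for i in range(len(points)) if i != index]
    others.sort(key=lambda i: squared_distance(points[index], points[i]))
    return others[:n_neighbors]


# Toy table of numerical data: two well-separated groups of 2-D points.
data = [
    (0.0, 0.0), (0.1, 0.2), (0.3, 0.1),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),
]
centroids, assignments = kmeans(data, k=2)
neighbors_of_first = nearest_neighbors(data, index=0, n_neighbors=2)
```

Here `assignments` holds one cluster index per input row, and `neighbors_of_first` holds the row indices closest to the first point. Both steps cost time proportional to the number of points times the number of centroids (or neighbor candidates), which is the kind of work a GPU-based tool parallelizes for large datasets.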

841 stars. No commits in the last 6 months.

Use this if you need to perform K-means clustering or K-nearest neighbors search on massive datasets and have access to NVIDIA GPUs for significantly faster processing.

Not ideal if you do not have NVIDIA GPUs, since its core performance advantage relies on CUDA acceleration. It is also a poor fit if your data contains many missing (NaN) values and you want to use the faster 'Yinyang' algorithm.

data-mining customer-segmentation image-recognition bioinformatics pattern-recognition

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 24 / 25


Stars: 841

Forks: 146

Language: Jupyter Notebook

License: —

Related frameworks

scikit-learn-contrib/hdbscan

A high performance implementation of HDBSCAN clustering.

annoviko/pyclustering

pyclustering is a Python, C++ data mining library.

panagiotisanagnostou/HiPart

Hierarchical divisive clustering algorithm execution, visualization, and interactive visualization.

erdogant/clusteval

Clusteval provides methods for unsupervised cluster validation.

mqcomplab/MDANCE

MDANCE: O(N) clustering for molecular dynamics. Process 1.5M frames in 40min. 8 specialized algorithms.
