HPC Cluster Management ML Frameworks
Resources, guides, and tools for setting up, configuring, and managing HPC clusters and distributed computing infrastructure for ML workloads. Does NOT include general cloud computing platforms, containerization tools, or ML frameworks themselves.
There are 33 HPC cluster management frameworks tracked. One scores above 70 (Verified tier): the highest-rated, qualcomm/ai-hub-models, at 72/100 with 940 stars. One of the top 10 is actively maintained.
Get all 33 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=hpc-cluster-management&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
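The request above can be wrapped in a small script so the `limit` value is easy to change. This is a minimal sketch, not an official client: `build_url` and `fetch_projects` are hypothetical helper names, and whether the endpoint accepts a `limit` above the 20 used in the original command (for example 33, to cover every tracked project) is an assumption that has not been verified here. The response schema is not documented in this page, so the sketch saves the raw JSON without parsing it.

```shell
#!/usr/bin/env bash
# Base endpoint and query parameters are copied from the curl command above.
BASE_URL="https://pt-edge.onrender.com/api/v1/datasets/quality"

# Hypothetical helper: prints the dataset URL.
# $1 = limit (defaults to 20, the value used in the original command).
build_url() {
  printf '%s?domain=%s&subcategory=%s&limit=%s\n' \
    "$BASE_URL" "ml-frameworks" "hpc-cluster-management" "${1:-20}"
}

# Hypothetical helper: fetches the JSON for the given limit.
# -f fails on HTTP errors; -sS stays quiet but still reports errors.
fetch_projects() {
  curl -fsS "$(build_url "$1")"
}

# Example (performs a network request, so it is left commented out).
# Assumes the API honors limit=33; this is untested.
# fetch_projects 33 > hpc-cluster-management.json
```

Calling `build_url` with no argument reproduces the URL from the original command, so the script can be checked offline before any request counts against the 100 requests/day allowance.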
| # | Framework | Score | Tier |
|---|---|---|---|
| 1 | qualcomm/ai-hub-models: Qualcomm® AI Hub Models is our collection of state-of-the-art machine... | 72 | Verified |
| 2 | petuum/adaptdl: Resource-adaptive cluster scheduler for deep learning training. | | Established |
| 3 | zszazi/Deep-learning-in-cloud: List of Deep Learning Cloud Providers | | Established |
| 4 | lincc-frameworks/hyrax: Hyrax - A low-code framework for rapid experimentation with ML &... | | Established |
| 5 | intel/ai-reference-models: Intel® AI Reference Models: contains Intel optimizations for running deep... | | Established |
| 6 | openhackathons-org/gpubootcamp: This repository consists of GPU bootcamp material for HPC and AI | | Established |
| 7 | HydroRoll-Team/HydroRoll: A cross-platform, multi-task, highly customizable dice-system development framework. | | Emerging |
| 8 | HPCNow/hpcnow-labs: HPCNow! training material and hands-on sessions | | Emerging |
| 9 | pescap/EasyHPC: A practical introduction to High Performance Computing (HPC) | | Emerging |
| 10 | ray-project/ray-acm-workshop-2023: Scalable/Distributed Computer Vision with Ray | | Emerging |
| 11 | binga/cloud-gpus: This repository contains information about Cloud GPU offerings for Machine... | | Emerging |
| 12 | opencomputeproject/ocp-diag-windtunnel: Building & testing private AI on HPC. | | Emerging |
| 13 | debnsuma/ray-for-developers: A comprehensive hands-on guide to building production-grade distributed... | | Emerging |
| 14 | hkust-hpc-team/hkust-hpc: Handbook for AI / HPC users on HKUST central clusters | | Emerging |
| 15 | knagrecha/hydra: Execution framework for multi-task model parallelism. Enables the training... | | Emerging |
| 16 | onlyrobot/bray: Bray is based on Ray and outperforms Ray in practical distributed... | | Emerging |
| 17 | Roulbac/uv-func: A Python decorator to run functions in isolated virtual environments... | | Emerging |
| 18 | Skyld-Labs/ModelHunter: ModelHunter is a powerful pipeline designed to extract machine learning... | | Experimental |
| 19 | hydra-hoard/hydra: A decentralised application that creates high quality machine learning datasets | | Experimental |
| 20 | uw-mad-dash/shockwave: Artifact for "Shockwave: Fair and Efficient Cluster Scheduling for Dynamic... | | Experimental |
| 21 | jonathandinu/spark-ray-data-science: Supporting content (slides and exercises) for the Pearson video series... | | Experimental |
| 22 | parisimaa/NYU-HPC: NYU HPC user instruction | | Experimental |
| 23 | gpu-cli/zerostart: Fast cold starts for GPU Python. Streaming wheel extraction for when large... | | Experimental |
| 24 | breadboardfoundry/GPU-Infrastructure: GPU compute infrastructure for research teams running machine learning experiments. | | Experimental |
| 25 | SupreethRao99/slurmy: Template scripts and notes for using SLURM on Nvidia DGX GPU cluster | | Experimental |
| 26 | alifzl/NeSI-Project-Template: NeSI HPC DL project Scaffolding Template | | Experimental |
| 27 | smirko-dev/machine-learning-rpi: Setup ML for Raspberry Pi | | Experimental |
| 28 | erectbranch/enroot-on-slurm: Examples of using Enroot with Slurm for distributed deep learning | | Experimental |
| 29 | Adhytm/multi-gpu-debug-notes: Debugging and isolating GPU context preemption issues in heterogeneous... | | Experimental |
| 30 | RichardScottOZ/experimenta-ml-kiro: experimenta-ml for kiro-cli | | Experimental |
| 31 | settadev/setta: Streamline Python coding, configuration, UI creation, and onboarding. | | Experimental |
| 32 | Akshay3510/Hydra: 🔍 Develop advanced knowledge compilers and #SAT solvers with Hydra, a robust... | | Experimental |
| 33 | drai-inn/uoa-ai-gpu-docs: University of Auckland AI GPU cluster support pages | | Experimental |