jinbooooom/ai-infra-hpc
HPC tutorials covering collective communication (MPI, NCCL), CUDA programming, SIMD vectorization, RDMA communication, and more
This repository provides tutorials and summaries for AI infrastructure and high-performance computing (HPC), focusing on the underlying systems: multi-GPU communication, parallel computing, and optimization for training and inference. It covers topics such as CUDA programming, collective communication (MPI, NCCL), and hardware interconnects (RDMA). It is aimed at AI infrastructure engineers, HPC specialists, and deep learning researchers looking to optimize and scale their AI models.
Use this if you need to understand or implement advanced techniques for accelerating AI model training and inference on GPU clusters, focusing on low-level hardware interaction and parallel programming.
Not ideal if you are an AI practitioner looking for high-level frameworks or pre-built solutions without needing to delve into the intricate details of system architecture or GPU programming.
Stars: 321
Forks: 32
Language: Cuda
License: MIT
Category:
Last pushed: Mar 13, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jinbooooom/ai-infra-hpc"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
vllm-project/vllm-ascend
Community maintained hardware plugin for vLLM on Ascend
kvcache-ai/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
SemiAnalysisAI/InferenceX
Open Source Continuous Inference Benchmarking Qwen3.5, DeepSeek, GPTOSS - GB200 NVL72 vs MI355X...
sophgo/tpu-mlir
Machine learning compiler based on MLIR for Sophgo TPU.
uccl-project/uccl
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache...