jinbooooom/ai-infra-hpc
HPC tutorials covering collective communication (MPI, NCCL), CUDA programming, SIMD vectorization, RDMA communication, and more
This repository provides tutorials and summaries for AI infrastructure and high-performance computing (HPC), focusing on the underlying systems: multi-GPU communication, parallel computing, and optimization for training and inference. It covers topics such as CUDA programming, collective communication (MPI, NCCL), and hardware interconnects (RDMA). It is aimed at AI infrastructure engineers, HPC specialists, and deep learning researchers looking to optimize and scale their AI models.
Use this if you need to understand or implement advanced techniques for accelerating AI model training and inference on GPU clusters, focusing on low-level hardware interaction and parallel programming.
Not ideal if you are an AI practitioner looking for high-level frameworks or pre-built solutions without needing to delve into the intricate details of system architecture or GPU programming.
Stars: 321
Forks: 32
Language: Cuda
License: MIT
Category:
Last pushed: Mar 13, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jinbooooom/ai-infra-hpc"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
vllm-project/vllm-ascend
Community maintained hardware plugin for vLLM on Ascend
kvcache-ai/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
SemiAnalysisAI/InferenceX
Open Source Continuous Inference Benchmarking Qwen3.5, DeepSeek, GPTOSS - GB200 NVL72 vs MI355X...
sophgo/tpu-mlir
Machine learning compiler based on MLIR for Sophgo TPU.
uccl-project/uccl
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache...