Scottcjn/exo-cuda
Exo distributed inference with NVIDIA CUDA support via tinygrad
This project helps AI engineers and researchers run large language models on NVIDIA GPUs and across multiple servers, which matters for models that demand more compute than a single machine provides. You provide the model and data, and exo distributes the processing across the available GPUs for faster responses. It's designed for anyone who needs to deploy and scale large language model inference on NVIDIA hardware.
Use this if you need to run large language models on NVIDIA GPUs, especially when you have multiple GPUs or servers and want to combine their power for faster processing.
Not ideal if you are exclusively using Apple Silicon (MLX) or prefer not to use NVIDIA hardware for your language model inference.
Stars
43
Forks
6
Language
Python
License
GPL-3.0
Category
Last pushed
Mar 08, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/Scottcjn/exo-cuda"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
d9d-project/d9d
d9d - d[istribute]d - distributed training framework based on PyTorch that tries to be efficient...
microsoft/nnscaler
nnScaler: Compiling DNN models for Parallel Training
nirw4nna/dsc
Tensor library & inference framework for machine learning
Zzzxkxz/cuda-fp8-ampere
🚀 Accelerate FP8 GEMM tasks on RTX 3090 Ti using lightweight storage and efficient tensor cores...
Wasisange/cuda-kernels-collection
Custom CUDA kernels for optimized tensor operations in deep learning.