Zzzxkxz/cuda-fp8-ampere
🚀 Accelerate FP8 GEMM on the RTX 3090 Ti: compact FP8 storage paired with tensor-core compute for high throughput on hardware without native FP8 support.
Stars: —
Forks: —
Language: CUDA
License: MIT
Category: —
Last pushed: Mar 19, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/Zzzxkxz/cuda-fp8-ampere"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
d9d-project/d9d
d9d - d[istribute]d - distributed training framework based on PyTorch that tries to be efficient...
microsoft/nnscaler
nnScaler: Compiling DNN models for Parallel Training
Scottcjn/exo-cuda
Exo distributed inference with NVIDIA CUDA support via tinygrad
nirw4nna/dsc
Tensor library & inference framework for machine learning
Wasisange/cuda-kernels-collection
Custom CUDA kernels for optimized tensor operations in deep learning.