LLM CUDA Optimization: Transformer Models

This page tracks 21 LLM CUDA optimization projects. Two score above 50 (the established tier); the highest-rated is quic/efficient-transformers at 58/100 with 87 stars.

Get all 21 projects as JSON:

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-cuda-optimization&limit=21"

The API is open to everyone: 100 requests/day with no key required, or 1,000/day with a free key.
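The same endpoint can be called programmatically. A minimal Python sketch using only the standard library (the response schema is not documented here, so the commented-out fetch simply prints the raw JSON):

```python
import json
import urllib.parse
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain: str, subcategory: str, limit: int) -> str:
    # Encode query parameters safely instead of concatenating strings.
    params = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE}?{params}"

url = build_url("transformers", "llm-cuda-optimization", 21)
print(url)

# Uncomment to actually fetch (anonymous tier: 100 requests/day):
# with urllib.request.urlopen(url) as resp:
#     data = json.load(resp)
#     print(json.dumps(data, indent=2))
```

Passing parameters through `urlencode` keeps the request correct if a value ever contains characters that need escaping.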

| # | Model | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | quic/efficient-transformers | This library empowers users to seamlessly port pretrained models and... | 58 | Established |
| 2 | ManuelSLemos/RabbitLLM | Run 70B+ LLMs on a single 4GB GPU — no quantization required. | 52 | Established |
| 3 | alpa-projects/alpa | Training and serving large-scale neural networks with auto parallelization. | 47 | Emerging |
| 4 | arm-education/Advanced-AI-Hardware-Software-Co-Design | Hands-on course materials for ML engineers to master extreme model... | 45 | Emerging |
| 5 | IST-DASLab/marlin | FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up... | 43 | Emerging |
| 6 | deepreinforce-ai/CUDA-L2 | CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through... | 42 | Emerging |
| 7 | eqimp/hogwild_llm | Official PyTorch implementation for Hogwild! Inference: Parallel LLM... | 38 | Emerging |
| 8 | AutonomicPerfectionist/PipeInfer | PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation | 36 | Emerging |
| 9 | llcuda/llcuda | CUDA 12-first backend inference for Unsloth on Kaggle — Optimized for small... | 35 | Emerging |
| 10 | UIC-InDeXLab/RSR | An Efficient Matrix Multiplication Algorithm for Accelerating Inference in... | 35 | Emerging |
| 11 | CodingPlatelets/transformer_MM | Accelerator for LLM Based on Chisel3 | 33 | Emerging |
| 12 | Bruce-Lee-LY/cutlass_gemm | Multiple GEMM operators are constructed with cutlass to support LLM inference. | 32 | Emerging |
| 13 | smvorwerk/xlstm-cuda | CUDA implementation of Extended Long Short-Term Memory (xLSTM) with C++ and... | 32 | Emerging |
| 14 | liashchynskyi/ggufer | Convert & quantize HuggingFace models using llama.cpp on premises | 30 | Emerging |
| 15 | rockyco/estFreqOffset | LLM-Assisted FPGA Design for Carrier Frequency Offset Estimation | 30 | Emerging |
| 16 | JIA-Lab-research/Q-LLM | This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration... | 25 | Experimental |
| 17 | ccs96307/fast-llm-inference | Accelerating LLM inference with techniques like speculative decoding,... | 22 | Experimental |
| 18 | luckystar-pear/llm-compress | Compress context data to optimize memory and performance in C++ large... | 22 | Experimental |
| 19 | friedpotato04/CUDA-L2 | 🚀 Optimize Half-precision General Matrix Multiply (HGEMM) CUDA kernels using... | 21 | Experimental |
| 20 | moham94/mini-sglang | 🚀 Harness mini-SGLang to power efficient inference for Large Language Models... | 19 | Experimental |
| 21 | amai-gsu/LM-Meter | Official code repo of SEC'25 paper: LM-Meter: Unveiling Runtime Inference... | 18 | Experimental |