LLM CUDA Optimization: Transformer Models

This page tracks 21 LLM CUDA optimization projects. Two score above 50 (the established tier); the highest-rated is quic/efficient-transformers at 58/100 with 87 stars.

Get all 21 projects as JSON:

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-cuda-optimization&limit=21"

The API is open to everyone: 100 requests/day with no key required, or 1,000/day with a free key.
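The same endpoint can be called programmatically. A minimal Python sketch using only the standard library (the response schema is not documented here, so the commented-out fetch simply prints the raw JSON):

```python
import json
import urllib.parse
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain: str, subcategory: str, limit: int) -> str:
    # Encode query parameters safely instead of concatenating strings.
    params = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE}?{params}"

url = build_url("transformers", "llm-cuda-optimization", 21)
print(url)

# Uncomment to actually fetch (anonymous tier: 100 requests/day):
# with urllib.request.urlopen(url) as resp:
#     data = json.load(resp)
#     print(json.dumps(data, indent=2))
```

Passing parameters through `urlencode` keeps the request correct if a value ever contains characters that need escaping.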

| # | Model | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | quic/efficient-transformers | This library empowers users to seamlessly port pretrained models and... | 58 | Established |
| 2 | ManuelSLemos/RabbitLLM | Run 70B+ LLMs on a single 4GB GPU — no quantization required. | 52 | Established |
| 3 | alpa-projects/alpa | Training and serving large-scale neural networks with auto parallelization. | 47 | Emerging |
| 4 | arm-education/Advanced-AI-Hardware-Software-Co-Design | Hands-on course materials for ML engineers to master extreme model... | 45 | Emerging |
| 5 | IST-DASLab/marlin | FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up... | 43 | Emerging |
| 6 | deepreinforce-ai/CUDA-L2 | CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through... | 42 | Emerging |
| 7 | eqimp/hogwild_llm | Official PyTorch implementation for Hogwild! Inference: Parallel LLM... | 38 | Emerging |
| 8 | AutonomicPerfectionist/PipeInfer | PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation | 36 | Emerging |
| 9 | llcuda/llcuda | CUDA 12-first backend inference for Unsloth on Kaggle — Optimized for small... | 35 | Emerging |
| 10 | UIC-InDeXLab/RSR | An Efficient Matrix Multiplication Algorithm for Accelerating Inference in... | 35 | Emerging |
| 11 | CodingPlatelets/transformer_MM | Accelerator for LLM Based on Chisel3 | 33 | Emerging |
| 12 | Bruce-Lee-LY/cutlass_gemm | Multiple GEMM operators are constructed with cutlass to support LLM inference. | 32 | Emerging |
| 13 | smvorwerk/xlstm-cuda | CUDA implementation of Extended Long Short-Term Memory (xLSTM) with C++ and... | 32 | Emerging |
| 14 | liashchynskyi/ggufer | Convert & quantize HuggingFace models using llama.cpp on premises | 30 | Emerging |
| 15 | rockyco/estFreqOffset | LLM-Assisted FPGA Design for Carrier Frequency Offset Estimation | 30 | Emerging |
| 16 | JIA-Lab-research/Q-LLM | This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration... | 25 | Experimental |
| 17 | ccs96307/fast-llm-inference | Accelerating LLM inference with techniques like speculative decoding,... | 22 | Experimental |
| 18 | luckystar-pear/llm-compress | Compress context data to optimize memory and performance in C++ large... | 22 | Experimental |
| 19 | friedpotato04/CUDA-L2 | 🚀 Optimize Half-precision General Matrix Multiply (HGEMM) CUDA kernels using... | 21 | Experimental |
| 20 | moham94/mini-sglang | 🚀 Harness mini-SGLang to power efficient inference for Large Language Models... | 19 | Experimental |
| 21 | amai-gsu/LM-Meter | Official code repo of SEC'25 paper: LM-Meter: Unveiling Runtime Inference... | 18 | Experimental |