LLM Inference Engines

High-performance inference frameworks and engines optimized for deploying and serving LLMs efficiently across various hardware accelerators and resource-constrained devices. Does NOT include LLM training frameworks, fine-tuning tools, or application-level chatbot/UI wrappers.

There are 29 LLM inference engine tools tracked. One scores above 70 (Verified tier). The highest-rated is vllm-project/vllm-ascend at 73/100 with 1,773 stars. Six of the top 10 are actively maintained.

Get all 29 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=llm-inference-engines&limit=29"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
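A minimal sketch of working with the endpoint's JSON in Python. The actual response schema is not documented here, so the field names (`name`, `score`, `tier`) and the sample records are assumptions for illustration only:

```python
import json
from collections import Counter

# Hypothetical sample mirroring the assumed response shape;
# the real API's field names may differ.
sample = json.loads("""
[
  {"name": "vllm-project/vllm-ascend", "score": 73, "tier": "Verified"},
  {"name": "kvcache-ai/Mooncake", "score": 69, "tier": "Established"},
  {"name": "bd4sur/Nano", "score": 38, "tier": "Emerging"}
]
""")

def tier_counts(projects):
    """Count how many projects fall into each quality tier."""
    return Counter(p["tier"] for p in projects)

def top_by_score(projects, n=1):
    """Return the n highest-scoring projects, best first."""
    return sorted(projects, key=lambda p: p["score"], reverse=True)[:n]

print(tier_counts(sample))
print(top_by_score(sample)[0]["name"])
```

To run against the live endpoint, replace `sample` with the parsed body of the curl request above (e.g. via `urllib.request` or `requests`).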

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | vllm-project/vllm-ascend | Community maintained hardware plugin for vLLM on Ascend | 73 | Verified |
| 2 | SemiAnalysisAI/InferenceX | Open Source Continuous Inference Benchmarking Qwen3.5, DeepSeek, GPTOSS -... | 69 | Established |
| 3 | kvcache-ai/Mooncake | Mooncake is the serving platform for Kimi, a leading LLM service provided by... | 69 | Established |
| 4 | uccl-project/uccl | UCCL is an efficient communication library for GPUs, covering collectives,... | 68 | Established |
| 5 | sophgo/tpu-mlir | Machine learning compiler based on MLIR for Sophgo TPU. | 68 | Established |
| 6 | BBuf/how-to-optim-algorithm-in-cuda | How to optimize some algorithms in CUDA. | 55 | Established |
| 7 | RightNow-AI/picolm | Run a 1-billion parameter LLM on a $10 board with 256MB RAM | 52 | Established |
| 8 | jinbooooom/ai-infra-hpc | HPC tutorials covering collective communication (MPI, NCCL), CUDA programming, vectorized SIMD, RDMA communication, and more | 51 | Established |
| 9 | zjhellofss/KuiperLLama | Hands-on project that walks you through building, from scratch, an LLM inference framework supporting LLama2/3 and Qwen2.5; well suited for campus-recruiting and internship portfolios | 49 | Emerging |
| 10 | RayFernando1337/LLM-Calc | Instantly calculate the maximum size of quantized language models that can... | 46 | Emerging |
| 11 | erans/selfhostllm | A web-based calculator for estimating GPU memory requirements and maximum... | 42 | Emerging |
| 12 | amirgholami/ai_and_memory_wall | AI and Memory Wall | 42 | Emerging |
| 13 | bd4sur/Nano | Electronic parrot / toy language model | 38 | Emerging |
| 14 | ChiefGyk3D/FrankenLLM | Stitched-together GPUs, but it lives! Run different LLM models optimally... | 36 | Emerging |
| 15 | FilipFan/PolyEngineInfer | Run LLM inference in an Android app with llama.cpp, ExecuTorch, LiteRT,... | 34 | Emerging |
| 16 | Alex188dot/GPU-VRAM-Calculator | A simple tool to find out GPU VRAM requirements for running LLMs | 31 | Emerging |
| 17 | refinefuture-ai/refft.cpp | A new approach of running LLM/LMs' inference/training on GPU/NPU backends... | 27 | Experimental |
| 18 | hofong428/Optimizing-GPU-Kernels | LLM Serving & Inference Optimization | 25 | Experimental |
| 19 | PrajwalNeeralagi/nano-vllm | 🚀 Implement fast offline inference with Nano-vLLM, a lightweight and... | 24 | Experimental |
| 20 | George614/gpu-mem-calculator | GPU Memory Calculator for LLM Training - Calculate GPU memory requirements... | 24 | Experimental |
| 21 | darekhta/marmot | High-performance LLM inference engine in C23 with CPU and Metal backends,... | 22 | Experimental |
| 22 | r3tr056/loc-ai-ly | Locaily - Making Large Language Model Inference Accessible on Consumer Hardware | 22 | Experimental |
| 23 | manishklach/SRMIC_X1 | Analytical simulator for SRMIC, a residency-first LLM inference accelerator... | 22 | Experimental |
| 24 | simar-rekhi/triton | LLM-assisted compiler pass generation with Triton & CUDA | 22 | Experimental |
| 25 | Jugurthakebaili1/vLLM-Kunlun | 🛠 Enhance vLLM performance on Kunlun XPU with this hardware plugin, offering... | 21 | Experimental |
| 26 | elibutters/CascadeInference | Cascade-based inference for LLMs | 13 | Experimental |
| 27 | LessUp/hetero-paged-infer | PagedAttention + Continuous Batching Inference Engine Prototype (Rust):... | 13 | Experimental |
| 28 | MetaxisResearch/parallax | Distributed inference across heterogeneous hardware. | 11 | Experimental |
| 29 | rahulunair/xpu_tgi | TGI server setup for Intel Data Centre GPUs | 11 | Experimental |