sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
This project helps developers and MLOps engineers deploy and manage large language and multimodal models efficiently. Given a trained model and the available hardware, it optimizes inference to deliver faster, more cost-effective serving. It is aimed at technical professionals building and operating AI-powered applications.
24,410 stars. Used by 5 other packages. Actively maintained with 994 commits in the last 30 days. Available on PyPI.
Use this if you need to serve large language models or multimodal models with high performance, low latency, and broad hardware compatibility.
Not ideal if you are looking for an off-the-shelf AI application or a framework for training models from scratch.
Stars: 24,410
Forks: 4,799
Language: Python
License: Apache-2.0
Last pushed: Mar 13, 2026
Commits (30d): 994
Dependencies: 64
Reverse dependents: 5
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/sgl-project/sglang"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
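The curl call above can also be issued from Python. A minimal sketch, assuming only the endpoint URL shown on this page; the response schema is not documented here, so the example builds the URL and leaves the live fetch commented out:

```python
# Sketch of calling the pt-edge quality API (URL taken from this page;
# the JSON response shape is not documented here, so it is not parsed).
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL."""
    return f"{BASE}/{owner}/{repo}"

if __name__ == "__main__":
    url = quality_url("sgl-project", "sglang")
    # Live fetch (needs network; 100 requests/day without a key):
    # data = json.loads(urllib.request.urlopen(url).read())
    print(url)
```

Swapping in an API key for higher limits would depend on the service's auth scheme, which this page does not specify.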
Related models
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
alibaba/MNN
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
tensorzero/tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...
tenstorrent/tt-metal
:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.