James-QiuHaoran/LLM-serving-with-proxy-models

Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction | A tiny BERT model can tell you the verbosity of an LLM (with low latency overhead!)

Quality score: 39 / 100 (Emerging)

This project helps operations engineers optimize the performance of large language model (LLM) serving systems. Before generation begins, a lightweight proxy model predicts the length of the LLM's response to a user query; since generation time scales with output length, this lets the system handle requests more efficiently. The input is a user query, and the output is a predicted response length, which a scheduler uses to improve overall throughput and reduce wait times. It is aimed at LLM system administrators and cloud engineers who manage such services.
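
For a sense of what the proxy model does, here is a minimal sketch of length prediction with Hugging Face Transformers and PyTorch. The checkpoint path is hypothetical and the single-output regression head is an assumption; the repo's own fine-tuned model and head may differ.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical fine-tuned BERT checkpoint with a 1-output regression head
# that maps a query to a predicted response length in tokens.
CHECKPOINT = "path/to/finetuned-length-predictor"

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT, num_labels=1)
model.eval()

def predict_response_length(query: str) -> int:
    """Estimate how many tokens the LLM will generate for this query."""
    inputs = tokenizer(query, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, 1)
    return max(0, round(logits.item()))

print(predict_response_length("Explain the history of the Roman Empire."))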

No commits in the last 6 months.

Use this if you are running an LLM inference service and want to reduce user wait times and improve system efficiency without changing your core memory or cache management.

Not ideal if you are a data scientist looking for a general-purpose LLM or an application developer integrating LLMs into your product, as this focuses on infrastructure optimization.
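
To make the scheduling use case above concrete, this toy sketch orders queued requests shortest-predicted-first, so short queries are not stuck behind verbose ones. It is illustrative only; the repo's actual scheduling policy may differ.

import heapq
import itertools

_order = itertools.count()  # tie-breaker so equal predictions keep FIFO order
_queue = []

def submit(query: str, predicted_tokens: int) -> None:
    """Enqueue a request keyed by its predicted response length."""
    heapq.heappush(_queue, (predicted_tokens, next(_order), query))

def next_request() -> str:
    """Pop the request the proxy model expects to finish soonest."""
    _, _, query = heapq.heappop(_queue)
    return query

submit("Write a 2,000-word essay on scheduling.", 1800)
submit("Summarize this paragraph in one sentence.", 40)
print(next_request())  # the short summary is served first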

Tags: LLM-operations, cloud-infrastructure, system-optimization, latency-reduction, resource-scheduling
Status: Stale (6 months) · No package published · No dependents

Score breakdown:
Maintenance: 0 / 25
Adoption: 8 / 25
Maturity: 16 / 25
Community: 15 / 25

Stars: 49
Forks: 8
Language: Jupyter Notebook
License: Apache-2.0
Last pushed: Jun 01, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/James-QiuHaoran/LLM-serving-with-proxy-models"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
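
The same request from Python, assuming the requests library is installed; the response's JSON schema is not documented here, so it is simply printed.

import requests

# Same endpoint as the curl command above.
URL = ("https://pt-edge.onrender.com/api/v1/quality/transformers/"
       "James-QiuHaoran/LLM-serving-with-proxy-models")

resp = requests.get(URL, timeout=10)
resp.raise_for_status()  # raises on HTTP errors (e.g. rate limiting)
print(resp.json())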