gotzmann/booster
Booster - open accelerator for LLM models. Better inference and debugging for AI hackers
This project helps AI developers and engineers deploy large language models (LLMs) such as LLaMA and Mistral more efficiently. It loads your pre-trained LLM and configuration settings, then serves the model for faster text generation and inference, even on less powerful hardware. Use it if you're building applications that rely on LLMs and need to optimize their performance and scalability in production.
167 stars. No commits in the last 6 months.
Use this if you need to run large language models reliably and with high performance, whether on powerful GPUs or more modest CPU-only machines, without the complexities of Python dependencies.
Not ideal if you are looking for a platform to train new LLMs from scratch or a no-code solution for integrating AI into your applications.
Stars: 167
Forks: 11
Language: C++
License: —
Category: —
Last pushed: Aug 15, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/gotzmann/booster"
Open to everyone: 100 requests/day with no key required. Get a free key for 1,000/day.
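The endpoint above follows a simple path scheme, so it can also be called from code. A minimal Python sketch using only the standard library; the `quality_url` helper is a name of my own, and the JSON-response format is an assumption, not documented API behavior:

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(owner: str, repo: str) -> str:
    # Build the per-repo endpoint shown in the curl example above.
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    # Assumes the endpoint returns JSON; no key is needed
    # within the free tier of 100 requests/day.
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)


print(quality_url("gotzmann", "booster"))
```

Swap in your preferred HTTP client (e.g. `requests`) if you already depend on one; nothing here requires it.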
Higher-rated alternatives
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
alibaba/MNN
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
tensorzero/tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...