gotzmann/booster
Booster - open accelerator for LLM models. Better inference and debugging for AI hackers
This project helps AI developers and engineers deploy large language models (LLMs) such as LLaMA and Mistral more efficiently. It loads your pre-trained LLM and configuration settings, then serves the model for faster text generation and inference, even on less powerful hardware. Use it if you're building applications that rely on LLMs and need to optimize their performance and scalability in production.
167 stars. No commits in the last 6 months.
Use this if you need to run large language models reliably and with high performance, whether on powerful GPUs or more modest CPU-only machines, without the complexities of Python dependencies.
Not ideal if you are looking for a platform to train new LLMs from scratch or a no-code solution for integrating AI into your applications.
Stars: 167
Forks: 11
Language: C++
License: —
Category: —
Last pushed: Aug 15, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/gotzmann/booster"
Open to everyone: 100 requests/day with no key required. Get a free key for 1,000/day.
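The endpoint above follows a simple path scheme, so it can also be called from code. A minimal Python sketch using only the standard library; the `quality_url` helper is a name of my own, and the JSON-response format is an assumption, not documented API behavior:

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(owner: str, repo: str) -> str:
    # Build the per-repo endpoint shown in the curl example above.
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    # Assumes the endpoint returns JSON; no key is needed
    # within the free tier of 100 requests/day.
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)


print(quality_url("gotzmann", "booster"))
```

Swap in your preferred HTTP client (e.g. `requests`) if you already depend on one; nothing here requires it.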
Higher-rated alternatives
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
alibaba/MNN
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
tensorzero/tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...