brontoguana/krasis
Krasis is a hybrid LLM runtime focused on running large models efficiently on consumer-grade, VRAM-limited hardware.
Krasis helps AI practitioners run very large language models (LLMs) that are too big for typical consumer graphics cards on their existing hardware. You provide the model files, and Krasis loads and serves them for efficient text generation on one or a few NVIDIA GPUs. It targets AI developers, researchers, and data scientists who want to experiment with or deploy state-of-the-art LLMs without expensive, specialized server infrastructure.
Use this if you need to run massive language models (roughly 80B to 200B+ parameters) on a single consumer-grade NVIDIA GPU or a small cluster and want strong performance without sacrificing too much output quality.
Not ideal if you don't have an NVIDIA GPU, or if you mainly run smaller models that already fit comfortably in your GPU's VRAM.
Stars: 52
Forks: 6
Language: Rust
License: —
Category:
Last pushed: Mar 11, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/brontoguana/krasis"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
EricLBuehler/mistral.rs: Fast, flexible LLM inference
nerdai/llms-from-scratch-rs: A comprehensive Rust translation of the code from Sebastian Raschka's Build an LLM from Scratch book.
ShelbyJenkins/llm_utils: Basic LLM tools, best practices, and minimal abstraction.
Mattbusel/llm-wasm: LLM inference primitives for WebAssembly — cache, retry, routing, guards, cost tracking, templates
GoWtEm/llm-model-selector: A high-performance Rust utility that analyzes your system hardware to recommend the optimal LLM...