Michael-A-Kuykendall/shimmy
⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever.
Shimmy lets developers run large language models (LLMs) locally, without sending data to external services. Point it at a compatible model file (GGUF or SafeTensors) and it serves the model through an interface that works just like OpenAI's API. This is ideal for developers building AI-powered applications who need local control, privacy, and cost efficiency.
3,793 stars. Actively maintained with 2 commits in the last 30 days.
Use this if you are building an AI application and want to run LLMs locally, for privacy and to avoid external API costs, while reusing your existing OpenAI-compatible code and tools.
Not ideal if you don't use LLMs in your development workflow, or if your LLM infrastructure is entirely cloud-based and you have no need for local inference.
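Because shimmy speaks the OpenAI API, existing client code only needs its base URL changed. A minimal sketch of what that looks like, assuming a local shimmy instance and a hypothetical model name (the host, port, and model identifier below are illustrative assumptions, not taken from shimmy's documentation):

```python
import json

# Assumed local shimmy address -- check your shimmy config for the real port.
BASE_URL = "http://localhost:11435/v1"

def build_chat_request(model, messages):
    """Build the URL and JSON body for an OpenAI-style chat completion."""
    url = f"{BASE_URL}/chat/completions"
    payload = {"model": model, "messages": messages}
    return url, json.dumps(payload)

url, body = build_chat_request(
    "local-model",  # hypothetical model name, as discovered by shimmy
    [{"role": "user", "content": "Hello"}],
)
# Send `body` to `url` with any HTTP client, or point the official OpenAI SDK
# at BASE_URL (e.g. OpenAI(base_url=BASE_URL, api_key="unused")) and call it
# exactly as you would against the hosted API.
```

The point is that no request-shape changes are needed: the same `model`/`messages` payload your cloud code already builds is what the local server accepts.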
Stars: 3,793
Forks: 292
Language: Rust
License: Apache-2.0
Category:
Last pushed: Mar 12, 2026
Commits (30d): 2
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Michael-A-Kuykendall/shimmy"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related models
ludwig-ai/ludwig
Low-code framework for building custom LLMs, neural networks, and other AI models
withcatai/node-llama-cpp
Run AI models locally on your machine with node.js bindings for llama.cpp. Enforce a JSON schema...
mudler/LocalAI
🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and...
zhudotexe/kani
kani (カニ) is a highly hackable microframework for tool-calling language models. (NLP-OSS @ EMNLP 2023)
SciSharp/LLamaSharp
A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.