Michael-A-Kuykendall/shimmy
⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever.
Shimmy lets developers run large language models (LLMs) locally, without sending data to external services. Point it at a compatible model file (GGUF or SafeTensors) and it serves the model through an interface that works just like OpenAI's API. This is ideal for developers building AI-powered applications who need local control, privacy, and cost efficiency.
3,793 stars. Actively maintained with 2 commits in the last 30 days.
Use this if you are building an AI application and want to run LLMs locally, for privacy and to avoid external API costs, while reusing your existing OpenAI-compatible code and tools.
Not ideal if you don't use LLMs in your development workflow, or if your LLM infrastructure is entirely cloud-based and you have no need for local inference.
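Because shimmy speaks the OpenAI API, existing client code only needs its base URL changed. A minimal sketch of what that looks like, assuming a local shimmy instance and a hypothetical model name (the host, port, and model identifier below are illustrative assumptions, not taken from shimmy's documentation):

```python
import json

# Assumed local shimmy address -- check your shimmy config for the real port.
BASE_URL = "http://localhost:11435/v1"

def build_chat_request(model, messages):
    """Build the URL and JSON body for an OpenAI-style chat completion."""
    url = f"{BASE_URL}/chat/completions"
    payload = {"model": model, "messages": messages}
    return url, json.dumps(payload)

url, body = build_chat_request(
    "local-model",  # hypothetical model name, as discovered by shimmy
    [{"role": "user", "content": "Hello"}],
)
# Send `body` to `url` with any HTTP client, or point the official OpenAI SDK
# at BASE_URL (e.g. OpenAI(base_url=BASE_URL, api_key="unused")) and call it
# exactly as you would against the hosted API.
```

The point is that no request-shape changes are needed: the same `model`/`messages` payload your cloud code already builds is what the local server accepts.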
Stars: 3,793
Forks: 292
Language: Rust
License: Apache-2.0
Category:
Last pushed: Mar 12, 2026
Commits (30d): 2
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Michael-A-Kuykendall/shimmy"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related models
ludwig-ai/ludwig
Low-code framework for building custom LLMs, neural networks, and other AI models
withcatai/node-llama-cpp
Run AI models locally on your machine with node.js bindings for llama.cpp. Enforce a JSON schema...
mudler/LocalAI
🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and...
zhudotexe/kani
kani (カニ) is a highly hackable microframework for tool-calling language models. (NLP-OSS @ EMNLP 2023)
SciSharp/LLamaSharp
A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.