mostlygeek/llama-swap

Reliable model swapping for any local OpenAI/Anthropic compatible server - llama.cpp, vllm, etc

Score: 62 / 100 (Established)

This tool helps AI application developers run multiple local generative AI models on one machine efficiently. It acts as a smart proxy: it accepts OpenAI/Anthropic-compatible API requests for various AI tasks (such as text generation, image creation, or speech processing) and automatically routes each request to the matching local model server, loading or swapping models on demand. Developers building AI applications will find it useful for testing and deploying different models without manual intervention.
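The on-demand routing described above is driven by a per-model configuration file. A minimal sketch, assuming a YAML layout of the kind the project's README describes (the model names, file paths, and `${PORT}` placeholder here are illustrative, not copied from the repo):

```yaml
# Hypothetical llama-swap configuration sketch.
# Each entry maps a model name (matched against the "model" field of an
# incoming request) to the command that starts its backing server. The proxy
# starts that process on first request and swaps it out when a request for a
# different model arrives.
models:
  "llama3-8b":
    cmd: llama-server --model /models/llama3-8b.gguf --port ${PORT}
  "qwen2.5-coder":
    cmd: llama-server --model /models/qwen2.5-coder.gguf --port ${PORT}
```

A client then points its OpenAI-compatible base URL at llama-swap and selects a backend simply by setting the `model` field in the request body.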

2,775 stars. Actively maintained with 20 commits in the last 30 days.

Use this if you need to run several local generative AI models behind OpenAI- or Anthropic-compatible servers and switch between them seamlessly in your applications.

Not ideal if you only ever use a single generative AI model and don't need to swap between them or manage multiple local AI servers.

Tags: AI application development, local AI deployment, model management, generative AI, AI server orchestration
No Package · No Dependents
Maintenance: 17 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 19 / 25


Stars: 2,775
Forks: 205
Language: Go
License: MIT
Last pushed: Mar 12, 2026
Commits (30d): 20

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/mostlygeek/llama-swap"

Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000/day.