gitkaz/mlx_gguf_server

This is a FastAPI-based LLM server that loads multiple LLM models (MLX or llama.cpp) simultaneously using multiprocessing.

Score: 50 / 100 (Established)

This project helps developers serve multiple large language models (LLMs) on Apple Silicon Macs. It lets you load several MLX- or GGUF-format models at the same time and interact with them through a web API: send text prompts for completions or chat, and even transcribe audio. It is designed for developers building applications that need to use different LLMs efficiently on macOS.

Use this if you are a developer building applications on an Apple Silicon Mac and need to host and manage multiple LLM models and transcribe audio efficiently via an API.

Not ideal if you need a production-ready, highly scalable LLM serving solution for non-Apple hardware, or if you are not comfortable with API-driven interactions.
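
For illustration, here is a minimal sketch of how a client might send a prompt to a server like this over HTTP. The base URL, endpoint path, and payload fields below are assumptions made for the example, not the project's documented API; check the repository's README for the actual routes and parameters.

import requests

# Hypothetical endpoint and payload: the real routes, port, and field names
# are defined by mlx_gguf_server itself; consult its README before use.
BASE_URL = "http://localhost:8000"

payload = {
    "model": "my-mlx-model",  # assumed: identifier of a model already loaded on the server
    "prompt": "Summarize the MLX framework in one sentence.",
    "max_tokens": 128,
}

# Send the prompt and print whatever the server returns as JSON.
response = requests.post(f"{BASE_URL}/completion", json=payload, timeout=60)
response.raise_for_status()
print(response.json())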

Tags: LLM deployment, MLX framework, GGUF models, API development, Speech-to-text
No package, no dependents.

Maintenance: 13 / 25
Adoption: 6 / 25
Maturity: 16 / 25
Community: 15 / 25


Stars: 17
Forks: 4
Language: Python
License: MIT
Last pushed: Mar 27, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/gitkaz/mlx_gguf_server"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
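
The same data can be fetched from Python. A minimal sketch, assuming the endpoint returns JSON; the response schema is not documented in this listing, so the example just prints whatever comes back.

import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/gitkaz/mlx_gguf_server"

# Free tier: 100 requests/day without a key; a free key raises the limit to 1,000/day.
response = requests.get(URL, timeout=30)
response.raise_for_status()

# Response fields are not documented here, so print the raw JSON payload.
print(response.json())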