ybubnov/metalchat
Pure C++23 Llama inference for Apple Silicon chips
This is a C++ library and command-line tool that lets developers integrate Meta's Llama models directly into their applications or run them from the terminal. It is built specifically for Apple Silicon chips, enabling local inference of large language models. It targets software developers building applications for macOS or iOS, and power users who want to interact with Llama models on their Apple machines without cloud services.
Use this if you are a developer building an application for Apple Silicon and need to embed Llama language models directly on the user's device for local, private, or offline AI capabilities.
Not ideal if you are a non-programmer looking for a ready-to-use desktop application for Llama models, or if you need to deploy models on non-Apple hardware.
Stars
19
Forks
—
Language
C++
License
GPL-3.0
Category
—
Last pushed
Mar 04, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ybubnov/metalchat"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
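The endpoint above follows a predictable path pattern, so it can be scripted for any repository. A minimal sketch, assuming the path layout shown in the example (the shape of the response body is not documented here):

```shell
# Build the quality-API URL for an arbitrary repo, then fetch it with curl
# as in the example above. Base path is taken from this listing; only the
# owner/repo segments vary.
base="https://pt-edge.onrender.com/api/v1/quality/transformers"
owner="ybubnov"
repo="metalchat"
echo "${base}/${owner}/${repo}"
# -> https://pt-edge.onrender.com/api/v1/quality/transformers/ybubnov/metalchat
```

Pass the resulting URL to `curl` exactly as in the example above; within the free tier no authentication header is needed.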
Higher-rated alternatives
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
alibaba/MNN
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
tensorzero/tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...