skyzh/tiny-llm
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
This course helps systems engineers understand and implement the core components of large language model (LLM) inference serving. You'll build an efficient serving system from the ground up, going from raw model weights to generated text responses. It is aimed at systems engineers who want to dive deep into optimizing LLM deployment on Apple Silicon.
3,935 stars. Actively maintained with 4 commits in the last 30 days.
Use this if you are a systems engineer looking to gain hands-on experience and deep knowledge in building and optimizing LLM serving infrastructure, specifically on macOS environments.
Not ideal if you are an end-user simply looking to use an LLM or are a developer who prefers high-level APIs for LLM integration without understanding the underlying serving mechanics.
Stars: 3,935
Forks: 286
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 06, 2026
Commits (30d): 4
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/skyzh/tiny-llm"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
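The curl one-liner above can also be scripted. Below is a minimal Python sketch using only the standard library; the URL structure (`/api/v1/quality/{category}/{owner}/{repo}`) is taken from the example above, but the JSON response fields are not documented here, so the script simply prints whatever keys the API returns.

```python
# Sketch: fetch repo quality data from the pt-edge API (no key needed,
# 100 requests/day). Response field names are an assumption-free dump:
# we print whatever keys the JSON body contains.
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, e.g. .../quality/transformers/skyzh/tiny-llm."""
    return f"{API_BASE}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """GET the endpoint and decode the JSON body."""
    with urllib.request.urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    data = fetch_quality("transformers", "skyzh", "tiny-llm")
    for key, value in data.items():
        print(f"{key}: {value}")
```

Swapping in a different `owner`/`repo` pair queries any other listed repository the same way.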
Related projects
PaddlePaddle/FastDeploy
High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle
mlc-ai/mlc-llm
Universal LLM Deployment Engine with ML Compilation
ServerlessLLM/ServerlessLLM
Serverless LLM Serving for Everyone.
AXERA-TECH/ax-llm
Explore LLM model deployment based on AXera's AI chips
AmpereComputingAI/ampere_model_library
AML's goal is to make benchmarking of various AI architectures on Ampere CPUs a pleasurable experience :)