skyzh/tiny-llm

A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.

Quality score: 57 / 100 (Established)

This course helps systems engineers understand and implement the core components of large language model (LLM) inference serving. You'll learn how to build an efficient serving system from the ground up, taking a model's raw weights and producing generated text responses. It is aimed at systems engineers who want to dive deep into optimizing LLM deployment on Apple Silicon.

3,935 stars. Actively maintained with 4 commits in the last 30 days.

Use this if you are a systems engineer looking to gain hands-on experience and deep knowledge in building and optimizing LLM serving infrastructure, specifically on macOS environments.

Not ideal if you are an end-user simply looking to use an LLM or are a developer who prefers high-level APIs for LLM integration without understanding the underlying serving mechanics.

Tags: LLM deployment, systems engineering, machine learning infrastructure, model serving, performance optimization
No package. No dependents.
Maintenance: 13 / 25
Adoption: 10 / 25
Maturity: 15 / 25
Community: 19 / 25


Stars: 3,935
Forks: 286
Language: Python
License: Apache-2.0
Last pushed: Mar 06, 2026
Commits (30d): 4

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/skyzh/tiny-llm"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
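The endpoint follows a simple path pattern, so it is easy to call programmatically. Below is a minimal Python sketch that builds the URL and fetches the raw response; the URL structure (including the `transformers` ecosystem segment) is taken verbatim from the curl example above, and the response's JSON field names are not documented here, so this is an assumption-laden sketch rather than a definitive client:

```python
import urllib.request

# Base of the quality API, copied from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    """Build the quality-score endpoint for a repository."""
    return f"{API_BASE}/{ecosystem}/{owner}/{repo}"

def fetch_quality(ecosystem: str, owner: str, repo: str) -> bytes:
    """Fetch the raw response body (field names are undocumented here,
    so callers should inspect the JSON themselves)."""
    with urllib.request.urlopen(quality_url(ecosystem, owner, repo)) as resp:
        return resp.read()

if __name__ == "__main__":
    # Same request as the curl example.
    print(quality_url("transformers", "skyzh", "tiny-llm"))
```

Within the free tier, this can be called up to 100 times per day without a key.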