skyzh/tiny-llm
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
This course helps systems engineers understand and implement the core components of large language model (LLM) inference serving. You'll build an efficient serving system from the ground up, going from raw model weights to generated text responses. It is aimed at systems engineers who want to dive deep into optimizing LLM deployment on Apple Silicon.
3,935 stars. Actively maintained with 4 commits in the last 30 days.
Use this if you are a systems engineer looking to gain hands-on experience and deep knowledge in building and optimizing LLM serving infrastructure, specifically on macOS environments.
Not ideal if you are an end-user simply looking to use an LLM or are a developer who prefers high-level APIs for LLM integration without understanding the underlying serving mechanics.
Stars: 3,935
Forks: 286
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 06, 2026
Commits (30d): 4
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/skyzh/tiny-llm"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
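The curl one-liner above can also be scripted. Below is a minimal Python sketch using only the standard library; the URL structure (`/api/v1/quality/{category}/{owner}/{repo}`) is taken from the example above, but the JSON response fields are not documented here, so the script simply prints whatever keys the API returns.

```python
# Sketch: fetch repo quality data from the pt-edge API (no key needed,
# 100 requests/day). Response field names are an assumption-free dump:
# we print whatever keys the JSON body contains.
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, e.g. .../quality/transformers/skyzh/tiny-llm."""
    return f"{API_BASE}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """GET the endpoint and decode the JSON body."""
    with urllib.request.urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    data = fetch_quality("transformers", "skyzh", "tiny-llm")
    for key, value in data.items():
        print(f"{key}: {value}")
```

Swapping in a different `owner`/`repo` pair queries any other listed repository the same way.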
Related projects
PaddlePaddle/FastDeploy
High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle
mlc-ai/mlc-llm
Universal LLM Deployment Engine with ML Compilation
ServerlessLLM/ServerlessLLM
Serverless LLM Serving for Everyone.
AXERA-TECH/ax-llm
Explore LLM model deployment based on AXera's AI chips
AmpereComputingAI/ampere_model_library
AML's goal is to make benchmarking of various AI architectures on Ampere CPUs a pleasurable experience :)