mlc-ai/mlc-llm
Universal LLM Deployment Engine with ML Compilation
MLC LLM helps machine learning engineers deploy large language models (LLMs) efficiently across a wide range of devices and operating systems. Given a trained LLM, it compiles an optimized, high-performance build that runs natively on platforms such as web browsers, mobile devices (iOS and Android), and desktop GPUs (NVIDIA, AMD, Apple, Intel). It is aimed at ML engineers who need their models to run directly on end-user hardware, not just in the cloud.
Use this if you need to deploy a large language model directly on edge devices, mobile phones, or web browsers and want high performance with broad hardware compatibility.
Not ideal if you deploy LLMs primarily on cloud servers, or if you don't need native, optimized performance across diverse hardware.
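To make the workflow concrete, here is a minimal sketch of running a compiled model through the project's OpenAI-compatible Python API (MLCEngine), following its quick-start docs. The model ID is illustrative; prebuilt MLC weights are fetched from Hugging Face on first use.

from mlc_llm import MLCEngine

# Illustrative model ID: prebuilt 4-bit quantized Llama 3 weights in MLC format.
model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"
engine = MLCEngine(model)

# Stream a chat completion through the OpenAI-compatible interface.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)
print()

engine.terminate()  # shut down the background engine cleanly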
Stars: 22,185
Forks: 1,960
Language: Python
License: Apache-2.0
Last pushed: Mar 09, 2026
Commits (30d): 16
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/mlc-ai/mlc-llm"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
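For scripted access, a minimal sketch of the same call in Python. The endpoint is the one from the curl command above; the response schema isn't documented here, so the sketch just prints the parsed JSON, and it stays unauthenticated because how a key is passed isn't documented either.

import requests

# Same endpoint as the curl example above (100 requests/day without a key).
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/mlc-ai/mlc-llm"

resp = requests.get(url, timeout=10)
resp.raise_for_status()

# The response field names aren't documented here; inspect the JSON
# before relying on any particular key.
print(resp.json())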
Related projects
PaddlePaddle/FastDeploy
High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle
skyzh/tiny-llm
A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny...
ServerlessLLM/ServerlessLLM
Serverless LLM Serving for Everyone.
AXERA-TECH/ax-llm
Explore LLM model deployment based on AXera's AI chips
AmpereComputingAI/ampere_model_library
AML's goal is to make benchmarking of various AI architectures on Ampere CPUs a pleasurable experience :)