harleyszhang/llm_note
Notes on LLMs, covering model inference, transformer model structure, and code analysis of LLM frameworks.
This resource provides comprehensive notes and a course on building and optimizing large language model (LLM) inference frameworks. It covers everything from transformer model structures and LLM quantization to advanced inference optimization and high-performance computing using technologies like Triton and CUDA. The ideal user is a machine learning engineer or researcher focused on deploying and speeding up LLM applications.
866 stars.
Use this if you are a machine learning engineer or researcher looking to deeply understand and implement efficient, high-performance inference for large language models.
Not ideal if you are a data scientist or developer primarily interested in using existing LLM APIs or off-the-shelf libraries without delving into the underlying optimization details.
Stars: 866
Forks: 87
Language: Python
License: not specified
Category:
Last pushed: Dec 10, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/harleyszhang/llm_note"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
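The same request can be made from Python. This is a minimal sketch using only the standard library; it assumes the endpoint returns a JSON body (the listing does not document the response format) and that the path follows the `segment/owner/repo` pattern seen in the curl example. The helper names `build_quality_url` and `fetch_quality` are hypothetical, not part of any published client.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(owner, repo, segment="transformers"):
    """Build the endpoint URL; the segment/owner/repo layout is an assumption
    inferred from the single curl example."""
    return f"{BASE_URL}/{quote(segment)}/{quote(owner)}/{quote(repo)}"


def fetch_quality(owner, repo, segment="transformers", timeout=10.0):
    """GET the endpoint and decode the body as JSON (assumed response format)."""
    url = build_quality_url(owner, repo, segment)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Unauthenticated calls count against the 100 requests/day limit.
    data = fetch_quality("harleyszhang", "llm_note")
    print(json.dumps(data, indent=2))
```

How a key is passed for the higher limit is not stated in the listing, so the sketch leaves authentication out.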
Higher-rated alternatives
NX-AI/xlstm: Official repository of the xLSTM.
sinanuozdemir/oreilly-hands-on-gpt-llm: Mastering the Art of Scalable and Efficient AI Model Deployment
DashyDashOrg/pandas-llm: Pandas-LLM
wxhcore/bumblecore: An LLM training framework built from the ground up, featuring a custom BumbleBee architecture...
MiniMax-AI/MiniMax-01: The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model &...