FareedKhan-dev/qwen3-MoE-from-scratch
A Step-by-Step Implementation of Qwen 3 MoE Architecture from Scratch
This project offers a hands-on guide to building a Mixture-of-Experts (MoE) large language model, specifically the Qwen 3 architecture, from scratch. It walks from raw text through tokenization, embeddings, attention mechanisms, and MoE expert routing to next-token prediction. The ideal user is a machine learning engineer or researcher who wants to deeply understand the inner workings of state-of-the-art LLMs.
No commits in the last 6 months.
Use this if you want to understand the intricate components of a modern, efficient large language model like Qwen 3 MoE by building one yourself, focusing on the architectural details rather than just using pre-built APIs.
Not ideal if you're looking for a pre-trained model to use for inference or fine-tuning, or if you don't have a basic understanding of neural networks and Transformer architecture.
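The MoE routing step the description mentions can be sketched as a top-k softmax gate: the router scores every expert, only the k best experts run, and their outputs are mixed by normalized gate weights. This is a minimal, dependency-free illustration of the general technique; the `router` and `experts` here are toy stand-ins, not the repo's actual Qwen 3 modules.

```python
import math

def top_k_route(logits, k):
    """Select the k highest-scoring experts and softmax-normalize
    their logits so the gate weights sum to 1."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)              # subtract max for stability
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, router, k=2):
    """Run only the routed experts and mix their outputs by gate
    weight -- the sparse activation that makes MoE efficient."""
    routed = top_k_route(router(x), k)
    return sum(w * experts[i](x) for i, w in routed)

# Toy usage: four scalar "experts", a router returning fixed scores.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
router = lambda x: [0.1, 2.0, 0.5, 1.5]
y = moe_forward(1.0, experts, router, k=2)  # mixes experts 1 and 3 only
```

In a real transformer the experts are feed-forward sub-networks and the router is a learned linear layer over the token's hidden state; the sketch only shows the routing arithmetic.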
Stars
76
Forks
9
Language
Jupyter Notebook
License
MIT
Category
Last pushed
Aug 05, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/FareedKhan-dev/qwen3-MoE-from-scratch"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
EfficientMoE/MoE-Infinity
PyTorch library for cost-effective, fast and easy serving of MoE models.
raymin0223/mixture_of_recursions
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation...
AviSoori1x/makeMoE
From scratch implementation of a sparse mixture of experts language model inspired by Andrej...
thu-nics/MoA
[CoLM'25] The official implementation of the paper
jaisidhsingh/pytorch-mixtures
One-stop solutions for Mixture of Expert modules in PyTorch.