FareedKhan-dev/qwen3-MoE-from-scratch
A Step-by-Step Implementation of Qwen 3 MoE Architecture from Scratch
This project offers a hands-on guide to building a Mixture-of-Experts (MoE) large language model, specifically the Qwen 3 architecture, from scratch. It walks from raw text through tokenization, embeddings, attention mechanisms, and MoE expert routing to next-token prediction. The ideal user is a machine learning engineer or researcher who wants to deeply understand the inner workings of state-of-the-art LLMs.
No commits in the last 6 months.
Use this if you want to understand the intricate components of a modern, efficient large language model like Qwen 3 MoE by building one yourself, focusing on the architectural details rather than just using pre-built APIs.
Not ideal if you're looking for a pre-trained model to use for inference or fine-tuning, or if you don't have a basic understanding of neural networks and Transformer architecture.
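The MoE routing step the description mentions can be sketched as a top-k softmax gate: the router scores every expert, only the k best experts run, and their outputs are mixed by normalized gate weights. This is a minimal, dependency-free illustration of the general technique; the `router` and `experts` here are toy stand-ins, not the repo's actual Qwen 3 modules.

```python
import math

def top_k_route(logits, k):
    """Select the k highest-scoring experts and softmax-normalize
    their logits so the gate weights sum to 1."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)              # subtract max for stability
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, router, k=2):
    """Run only the routed experts and mix their outputs by gate
    weight -- the sparse activation that makes MoE efficient."""
    routed = top_k_route(router(x), k)
    return sum(w * experts[i](x) for i, w in routed)

# Toy usage: four scalar "experts", a router returning fixed scores.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
router = lambda x: [0.1, 2.0, 0.5, 1.5]
y = moe_forward(1.0, experts, router, k=2)  # mixes experts 1 and 3 only
```

In a real transformer the experts are feed-forward sub-networks and the router is a learned linear layer over the token's hidden state; the sketch only shows the routing arithmetic.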
Stars
76
Forks
9
Language
Jupyter Notebook
License
MIT
Category
Last pushed
Aug 05, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/FareedKhan-dev/qwen3-MoE-from-scratch"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
EfficientMoE/MoE-Infinity
PyTorch library for cost-effective, fast and easy serving of MoE models.
raymin0223/mixture_of_recursions
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation...
AviSoori1x/makeMoE
From scratch implementation of a sparse mixture of experts language model inspired by Andrej...
thu-nics/MoA
[CoLM'25] The official implementation of the paper
jaisidhsingh/pytorch-mixtures
One-stop solutions for Mixture of Expert modules in PyTorch.