FareedKhan-dev/qwen3-MoE-from-scratch

A Step-by-Step Implementation of Qwen 3 MoE Architecture from Scratch

Score: 39 / 100 (Emerging)

This project offers a hands-on guide to building a Mixture-of-Experts (MoE) large language model, specifically the Qwen 3 architecture, from scratch. It walks you from raw text through tokenization, embeddings, attention mechanisms, and MoE routing, all the way to predicting the next token. The ideal user is a machine learning engineer or researcher who wants to deeply understand the inner workings of state-of-the-art LLMs.
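The distinctive step in that pipeline is the MoE routing layer. As a rough illustration only (not the repository's code), here is a minimal top-k routing layer in PyTorch; the model dimensions, expert count, and `top_k` value are placeholder assumptions, and Qwen 3's actual gating differs in details such as normalization and load balancing.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal sketch of a top-k Mixture-of-Experts layer: a linear router
    scores each token, the top-k experts are selected per token, and their
    outputs are combined with the renormalized router weights."""

    def __init__(self, d_model=64, d_ff=128, n_experts=8, top_k=2):  # placeholder sizes
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (batch, seq, d_model)
        logits = self.router(x)                # (batch, seq, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e        # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Quick shape check with random data
moe = TopKMoE()
print(moe(torch.randn(2, 5, 64)).shape)  # torch.Size([2, 5, 64])
```

The per-expert loop keeps the sketch readable; production implementations batch tokens per expert for efficiency, which is the kind of detail the notebook walks through.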

No commits in the last 6 months.

Use this if you want to understand the intricate components of a modern, efficient large language model like Qwen 3 MoE by building one yourself, focusing on the architectural details rather than just using pre-built APIs.

Not ideal if you're looking for a pre-trained model to use for inference or fine-tuning, or if you don't have a basic understanding of neural networks and Transformer architecture.

large-language-models deep-learning-architecture natural-language-processing machine-learning-engineering AI-research
Stale (6m) · No Package · No Dependents
Maintenance 2 / 25
Adoption 9 / 25
Maturity 15 / 25
Community 13 / 25

Stars: 76
Forks: 9
Language: Jupyter Notebook
License: MIT
Last pushed: Aug 05, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/FareedKhan-dev/qwen3-MoE-from-scratch"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
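If you prefer Python over curl, a minimal standard-library sketch is shown below. The endpoint URL comes from the example above, but the shape of the JSON response is not documented here, so the sketch simply fetches it and pretty-prints whatever comes back.

```python
import json
import urllib.request

# Endpoint taken from the curl example above; response fields are not
# assumed, so the result is printed as-is.
URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "transformers/FareedKhan-dev/qwen3-MoE-from-scratch")

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)

print(json.dumps(data, indent=2))
```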