FareedKhan-dev/train-llama4
Building LLaMA 4 MoE from Scratch
This project walks you through building a simplified LLaMA 4-style language model from scratch: you start with raw text, process it into training data, and train a model that generates coherent text from a prompt. It is aimed at machine learning engineers, researchers, and advanced students who want to understand the inner workings of large language models, especially those using a Mixture-of-Experts (MoE) architecture.
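Since the Mixture-of-Experts layer is the project's central architectural idea, here is a minimal, framework-free sketch of top-k expert routing. All names (`moe_forward`, the toy experts, the router weights) are illustrative, not taken from the repo's code, and real implementations work on batched tensors with learned parameters.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, router_weights, experts, top_k=2):
    """Route input vector x to the top_k highest-scoring experts and
    return their outputs combined, weighted by renormalized router scores.
    Each expert here is assumed to map a d-dim vector to a d-dim vector."""
    # Router logits: one score per expert (dot product of router row with x).
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    # Select the top_k experts by logit.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Renormalize the selected logits so the gate weights sum to 1.
    gates = softmax([logits[i] for i in top])
    # Weighted sum of the selected experts' outputs.
    out = [0.0] * len(x)
    for g, i in zip(gates, top):
        y = experts[i](x)
        out = [o + g * yi for o, yi in zip(out, y)]
    return out
```

Only `top_k` of the experts run per token, which is what lets MoE models grow total parameter count without a proportional increase in compute per forward pass.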
No commits in the last 6 months.
Use this if you are a machine learning practitioner who wants to deeply understand the architectural components and training process of modern large language models, particularly the Mixture-of-Experts approach.
Not ideal if you are looking for a pre-trained model to use directly or a high-level library to fine-tune an existing model.
Stars: 72
Forks: 17
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Apr 15, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/FareedKhan-dev/train-llama4"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
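The same endpoint can be called from Python with the standard library. This is a sketch under assumptions: the response is taken to be JSON, and the `Authorization: Bearer` header for an API key is a guess, since the card does not document how a key is passed.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem, repo):
    """Build the quality-endpoint URL shown in the curl example,
    e.g. quality_url('transformers', 'FareedKhan-dev/train-llama4')."""
    return f"{BASE}/{ecosystem}/{repo}"

def fetch_quality(ecosystem, repo, api_key=None):
    """Fetch the quality data for a repo; returns the parsed JSON body."""
    req = urllib.request.Request(quality_url(ecosystem, repo))
    if api_key:
        # Header name is an assumption; check the API docs for the real scheme.
        req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Without a key this stays within the 100 requests/day anonymous limit; pass `api_key` once you have registered for the higher quota.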
Higher-rated alternatives
AI-Hypercomputer/maxtext
A simple, performant and scalable Jax LLM!
rasbt/reasoning-from-scratch
Implement a reasoning LLM in PyTorch from scratch, step by step
mindspore-lab/mindnlp
MindSpore + 🤗Huggingface: Run any Transformers/Diffusers model on MindSpore with seamless...
mosaicml/llm-foundry
LLM training code for Databricks foundation models
rickiepark/llm-from-scratch
Code repository for the Korean edition of *Build an LLM from Scratch* (Gilbut, 2025)