FareedKhan-dev/train-llama4
Building LLaMA 4 MoE from Scratch
This project walks you through building a simplified LLaMA 4-style language model from scratch: you start with raw text, process it into training data, and train a model that generates coherent text from a prompt. It is aimed at machine learning engineers, researchers, and advanced students who want to understand the inner workings of large language models, especially those using a Mixture-of-Experts (MoE) architecture.
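Since the Mixture-of-Experts layer is the project's central architectural idea, here is a minimal, framework-free sketch of top-k expert routing. All names (`moe_forward`, the toy experts, the router weights) are illustrative, not taken from the repo's code, and real implementations work on batched tensors with learned parameters.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, router_weights, experts, top_k=2):
    """Route input vector x to the top_k highest-scoring experts and
    return their outputs combined, weighted by renormalized router scores.
    Each expert here is assumed to map a d-dim vector to a d-dim vector."""
    # Router logits: one score per expert (dot product of router row with x).
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    # Select the top_k experts by logit.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Renormalize the selected logits so the gate weights sum to 1.
    gates = softmax([logits[i] for i in top])
    # Weighted sum of the selected experts' outputs.
    out = [0.0] * len(x)
    for g, i in zip(gates, top):
        y = experts[i](x)
        out = [o + g * yi for o, yi in zip(out, y)]
    return out
```

Only `top_k` of the experts run per token, which is what lets MoE models grow total parameter count without a proportional increase in compute per forward pass.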
No commits in the last 6 months.
Use this if you are a machine learning practitioner who wants to deeply understand the architectural components and training process of modern large language models, particularly the Mixture-of-Experts approach.
Not ideal if you are looking for a pre-trained model to use directly or a high-level library to fine-tune an existing model.
Stars: 72
Forks: 17
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Apr 15, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/FareedKhan-dev/train-llama4"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
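The same endpoint can be called from Python with the standard library. This is a sketch under assumptions: the response is taken to be JSON, and the `Authorization: Bearer` header for an API key is a guess, since the card does not document how a key is passed.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem, repo):
    """Build the quality-endpoint URL shown in the curl example,
    e.g. quality_url('transformers', 'FareedKhan-dev/train-llama4')."""
    return f"{BASE}/{ecosystem}/{repo}"

def fetch_quality(ecosystem, repo, api_key=None):
    """Fetch the quality data for a repo; returns the parsed JSON body."""
    req = urllib.request.Request(quality_url(ecosystem, repo))
    if api_key:
        # Header name is an assumption; check the API docs for the real scheme.
        req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Without a key this stays within the 100 requests/day anonymous limit; pass `api_key` once you have registered for the higher quota.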
Higher-rated alternatives
AI-Hypercomputer/maxtext
A simple, performant and scalable Jax LLM!
rasbt/reasoning-from-scratch
Implement a reasoning LLM in PyTorch from scratch, step by step
mindspore-lab/mindnlp
MindSpore + 🤗Huggingface: Run any Transformers/Diffusers model on MindSpore with seamless...
mosaicml/llm-foundry
LLM training code for Databricks foundation models
rickiepark/llm-from-scratch
Code repository for the Korean edition of *Build an LLM from Scratch* (Gilbut, 2025)