FareedKhan-dev/train-llama4

Building LLaMA 4 MoE from Scratch

Score: 45 / 100 (Emerging)

This project guides you through building a simplified LLaMA 4-style language model from scratch. You'll start with raw text, prepare it for training, and then train the model to generate new, coherent text from a given prompt. It is aimed at machine learning engineers, researchers, and advanced students who want to understand the inner workings of large language models, especially those built on a Mixture-of-Experts (MoE) architecture.
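For orientation, here is a minimal PyTorch sketch of the core MoE idea the project explores: a learned router scores a set of expert feed-forward networks per token, keeps the top-k, and mixes their outputs using the softmaxed router weights. The class and parameter names are hypothetical, not the repository's actual code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    # Illustrative top-k Mixture-of-Experts layer (hypothetical names).
    def __init__(self, d_model=64, d_ff=256, n_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # per-token expert scores
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):
        # x: (n_tokens, d_model)
        scores = self.router(x)                         # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep the top-k experts per token
        weights = F.softmax(weights, dim=-1)            # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

For example, y = MoELayer()(torch.randn(10, 64)) routes 10 token vectors through the layer; only the top-2 of 4 experts run per token, which is what lets MoE models grow parameter count without a proportional increase in compute per token.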

No commits in the last 6 months.

Use this if you are a machine learning practitioner who wants to deeply understand the architectural components and training process of modern large language models, particularly the Mixture-of-Experts approach.

Not ideal if you are looking for a pre-trained model to use directly or a high-level library to fine-tune an existing model.

large-language-models natural-language-generation model-architecture deep-learning-research mixture-of-experts
Stale (6m) · No Package · No Dependents
Maintenance: 2 / 25
Adoption: 9 / 25
Maturity: 15 / 25
Community: 19 / 25


Stars: 72
Forks: 17
Language: Jupyter Notebook
License: MIT
Last pushed: Apr 15, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/FareedKhan-dev/train-llama4"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
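If you prefer Python to curl, here is a minimal sketch of the same request, assuming the endpoint returns JSON (the response schema is not documented here):

import requests

# Same endpoint as the curl example above; no API key is required
# within the free 100 requests/day limit.
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/FareedKhan-dev/train-llama4"
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # fail loudly on HTTP errors
print(resp.json())       # response body is assumed to be JSON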