casinca/LLM-quest

Verbose implementations of LLM architectures, techniques, and research papers from scratch. DeepSeek, Qwen3..., RLHF, MoE, Multimodal...

/ 100

Emerging

This project provides detailed, from-scratch implementations of large language model (LLM) architectures and related techniques. It shows how models such as DeepSeek, Qwen3, and Gemma are built, and covers alignment methods (such as RLHF) and multimodal capabilities. It is aimed at AI researchers, machine learning engineers, and students who want to understand and experiment with the mechanics of current LLMs.

Use this if you are an AI researcher or machine learning engineer looking to deeply understand, reverse-engineer, and experiment with the internal workings of modern LLMs and their underlying techniques from first principles.

Not ideal if you are looking for an out-of-the-box LLM to use in an application, or if you need a high-level library for rapid prototyping without delving into the architectural details.

AI-research LLM-architecture machine-learning-engineering deep-learning AI-education

No Package No Dependents

Maintenance 10 / 25

Adoption 5 / 25

Maturity 16 / 25

Community 6 / 25


Language

Python

License

Apache-2.0

Higher-rated alternatives

Goekdeniz-Guelmez/mlx-lm-lora

Train Large Language Models on MLX.

uber-research/PPLM

Plug and Play Language Model implementation. Allows steering the topic and attributes of GPT-2 models.

VHellendoorn/Code-LMs

Guide to using pre-trained large language models of source code

ssbuild/chatglm_finetuning

ChatGLM-6B fine-tuning and Alpaca fine-tuning

jarobyte91/pytorch_beam_search

A lightweight implementation of Beam Search for sequence models in PyTorch.
