tomerjann/what-happens-when-you-prompt
A deep-dive reference tracing every layer of the stack when you send a prompt to an LLM chat, from keystroke to streamed token. Covers tokenization, KV cache, prefill/decode, sampling, SSE streaming, and more.
This reference guide traces a prompt sent to an LLM chat from the moment the user types a character to the final token displayed on screen. It covers each technical step along the way, including tokenization, the prefill and decode phases, and streaming. It is aimed at engineers who build or maintain applications that interact with large language models and want a deeper, production-oriented intuition.
Use this if you are an engineer working with LLMs and want to understand the complete technical stack and workflow involved in processing a prompt and generating a response.
Not ideal if you are a beginner looking for an introduction to transformers or RAG, or if you are a non-technical user curious about how LLMs work at a high level.
Stars: 9
Forks: n/a
Language: n/a
License: n/a
Category:
Last pushed: Mar 21, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/prompt-engineering/tomerjann/what-happens-when-you-prompt"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
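The same request can be made from a script. Below is a minimal Python sketch using only the standard library. The helper names `build_quality_url` and `fetch_quality` are hypothetical, the path layout (category, owner, repository) is inferred from the single example URL above, and the response format is not stated on this page, so the body is returned as raw text rather than parsed.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path copied from the curl example; everything after it is assumed
# to follow the pattern <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the request URL (hypothetical helper, layout inferred)."""
    # Percent-encode each path segment so unusual characters stay valid.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str,
                  timeout: float = 10.0) -> str:
    """Fetch the endpoint and return the raw response body as text.

    The response format is an unknown here, so no parsing is attempted.
    """
    url = build_quality_url(category, owner, repo)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `fetch_quality("prompt-engineering", "tomerjann", "what-happens-when-you-prompt")` requests the same URL as the curl command. Unauthenticated use counts against the 100 requests/day limit.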
Higher-rated alternatives
BoundaryML/baml
The AI framework that adds the engineering to prompt engineering (Python/TS/Ruby/Java/C#/Rust/Go...
deanpeters/product-manager-prompts
A repository of Generative AI prompts for product managers using agents such as ChatGPT, Claude, & Gemini
eudk/awesome-ai-tools
🔴 VERY LARGE AI TOOL LIST! 🔴 Curated list of AI Tools - Updated 2026
jujumilk3/leaked-system-prompts
Collection of leaked system prompts
legeling/PromptHub
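An open-source, fully local prompt and skill management tool that helps you manage, version-control, and reuse prompts efficiently and distribute skills in one click | An open-source, local-first AI...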