tomerjann/what-happens-when-you-prompt

A deep-dive reference tracing every layer of the stack when you send a prompt to an LLM chat, from keystroke to streamed token. Covers tokenization, KV cache, prefill/decode, sampling, SSE streaming, and more.

19 / 100 · Experimental

This reference guide traces a prompt sent to an LLM chat from the moment the user types a character to the final token displayed on screen. It walks through each technical step, including tokenization, the prefill and decode phases, and streaming. It is aimed at engineers who build or maintain applications that interact with large language models and want a deeper, production-oriented intuition.

Use this if you are an engineer working with LLMs and want to understand the complete technical stack and workflow involved in processing a prompt and generating a response.

Not ideal if you are a beginner looking for an introduction to transformers or RAG, or if you are a non-technical user curious about how LLMs work at a high level.

LLM-infrastructure prompt-engineering API-integration system-architecture AI-application-development
No License · No Package · No Dependents
Maintenance 13 / 25
Adoption 5 / 25
Maturity 1 / 25
Community 0 / 25

Stars: 9
Forks:
Language:
License:
Last pushed: Mar 21, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/prompt-engineering/tomerjann/what-happens-when-you-prompt"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
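The same request can be made from a script. The following is a minimal Python sketch using only the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical. The path layout (category, owner, repo) is inferred from the curl example, and the response is assumed to be JSON, which this page does not confirm.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL; the segment order is inferred from the curl example."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repo.

    Assumption: the body is JSON. If it is not, json.loads raises ValueError.
    """
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request and counts against the daily limit):
# data = fetch_quality("prompt-engineering", "tomerjann", "what-happens-when-you-prompt")
# print(json.dumps(data, indent=2))
```

No API key is sent here, so calls fall under the keyless 100 requests/day limit. How a key would be supplied is not stated on this page, so it is left out.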