sail-sg/Attention-Sink
[ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)
This project helps researchers and developers understand how 'attention sink' behavior emerges in large language models (LLMs) during pre-training. By providing tools to analyze factors like optimization, data, and architecture, it allows users to examine attention patterns in open-source LLMs or their own pre-trained models. The output helps machine learning researchers diagnose and interpret LLM training dynamics.
159 stars. No commits in the last 6 months.
Use this if you are a machine learning researcher or engineer who is pre-training language models or analyzing their internals and wants to investigate the attention sink phenomenon.
Not ideal if you are an end-user simply looking to apply or fine-tune existing large language models without delving into their pre-training mechanics.
Stars: 159
Forks: 5
Language: Python
License: MIT
Category:
Last pushed: Jul 08, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/sail-sg/Attention-Sink"
Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000 requests/day.
Higher-rated alternatives
ZHZisZZ/dllm
dLLM: Simple Diffusion Language Modeling
pengzhangzhi/Open-dLLM
Open diffusion language model for code generation — releasing pretraining, evaluation,...
EnnengYang/Awesome-Model-Merging-Methods-Theories-Applications
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities. ACM...
THUDM/LongWriter
[ICLR 2025] LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
AIoT-MLSys-Lab/SVD-LLM
[ICLR 2025🔥] SVD-LLM & [NAACL 2025🔥] SVD-LLM V2