jlamprou/Infini-Attention
Efficient Infinite Context Transformers with Infini-attention: PyTorch implementation + QwenMoE implementation + training script + 1M-context passkey retrieval
This project offers a specialized toolkit for researchers and practitioners working with large language models who need to process extremely long texts efficiently. Given an existing language model and training data, it produces a modified model that can understand and generate responses over much longer contexts than standard models, without prohibitive computational cost. It is aimed at those pushing the boundaries of what LLMs can do with extensive information.
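The core idea behind Infini-attention is to combine standard local attention within a segment with a compressive memory that linearly accumulates keys and values from all previous segments, so context length can grow without growing the attention cost. The sketch below is a minimal, single-head NumPy illustration of that mechanism, not this repository's actual implementation; the function name, the fixed gate `beta`, and the epsilon in the memory readout are illustrative assumptions.

```python
import numpy as np

def elu_plus_one(x):
    # sigma(x) = ELU(x) + 1, the nonlinearity commonly used for linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

def infini_attention_step(Q, K, V, M, z, beta=0.5):
    """One segment of a simplified, single-head Infini-attention.

    Q, K : (n, d_k) queries/keys for the current segment
    V    : (n, d_v) values for the current segment
    M    : (d_k, d_v) compressive memory carried across segments
    z    : (d_k,)     normalization accumulator carried across segments
    beta : fixed blend gate (learned per-head in the real model)
    """
    sQ, sK = elu_plus_one(Q), elu_plus_one(K)

    # Retrieve from the memory accumulated over previous segments
    # (epsilon keeps the first segment, with empty memory, well defined)
    A_mem = (sQ @ M) / ((sQ @ z) + 1e-6)[:, None]

    # Ordinary softmax attention within the current segment
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    A_local = w @ V

    # Blend memory readout with local attention
    A = beta * A_mem + (1.0 - beta) * A_local

    # Fold this segment's keys/values into the memory for future segments
    M = M + sK.T @ V
    z = z + sK.sum(axis=0)
    return A, M, z
```

Because `M` and `z` have fixed size regardless of how many segments have been consumed, streaming a long document through this loop keeps memory and compute per segment constant, which is what makes million-token contexts tractable.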
No commits in the last 6 months.
Use this if you are a researcher or advanced practitioner experimenting with large language models that need to process and understand very long documents or conversations, such as entire books or extensive codebases.
Not ideal if you need a production-ready solution for standard language model tasks or if you are not comfortable with experimental, research-stage code.
Stars: 86
Forks: 7
Language: Python
License: —
Category: —
Last pushed: May 09, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/jlamprou/Infini-Attention"
Open to everyone: 100 requests/day with no key needed, or get a free key for 1,000/day.
Higher-rated alternatives
fla-org/flash-linear-attention
🚀 Efficient implementations of state-of-the-art linear attention models
thu-ml/SageAttention
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x...
thu-ml/SpargeAttn
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.
fla-org/flame
🔥 A minimal training framework for scaling FLA models
foundation-model-stack/fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for...