jlamprou/Infini-Attention
Efficient Infinite Context Transformers with Infini-attention: PyTorch implementation + QwenMoE implementation + training script + 1M-context passkey retrieval
This project offers a specialized toolkit for researchers and practitioners working with large language models who need to process extremely long texts efficiently. Given an existing language model and training data, it produces a modified model that can understand and generate responses over much longer contexts than standard models, without prohibitive computational cost. It is aimed at those pushing the boundaries of what LLMs can do with extensive information.
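The core idea behind Infini-attention is to combine standard local attention within a segment with a compressive memory that linearly accumulates keys and values from all previous segments, so context length can grow without growing the attention cost. The sketch below is a minimal, single-head NumPy illustration of that mechanism, not this repository's actual implementation; the function name, the fixed gate `beta`, and the epsilon in the memory readout are illustrative assumptions.

```python
import numpy as np

def elu_plus_one(x):
    # sigma(x) = ELU(x) + 1, the nonlinearity commonly used for linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

def infini_attention_step(Q, K, V, M, z, beta=0.5):
    """One segment of a simplified, single-head Infini-attention.

    Q, K : (n, d_k) queries/keys for the current segment
    V    : (n, d_v) values for the current segment
    M    : (d_k, d_v) compressive memory carried across segments
    z    : (d_k,)     normalization accumulator carried across segments
    beta : fixed blend gate (learned per-head in the real model)
    """
    sQ, sK = elu_plus_one(Q), elu_plus_one(K)

    # Retrieve from the memory accumulated over previous segments
    # (epsilon keeps the first segment, with empty memory, well defined)
    A_mem = (sQ @ M) / ((sQ @ z) + 1e-6)[:, None]

    # Ordinary softmax attention within the current segment
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    A_local = w @ V

    # Blend memory readout with local attention
    A = beta * A_mem + (1.0 - beta) * A_local

    # Fold this segment's keys/values into the memory for future segments
    M = M + sK.T @ V
    z = z + sK.sum(axis=0)
    return A, M, z
```

Because `M` and `z` have fixed size regardless of how many segments have been consumed, streaming a long document through this loop keeps memory and compute per segment constant, which is what makes million-token contexts tractable.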
No commits in the last 6 months.
Use this if you are a researcher or advanced practitioner experimenting with large language models that need to process and understand very long documents or conversations, such as entire books or extensive codebases.
Not ideal if you need a production-ready solution for standard language model tasks or if you are not comfortable with experimental, research-stage code.
Stars: 86
Forks: 7
Language: Python
License: —
Category: —
Last pushed: May 09, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/jlamprou/Infini-Attention"
Open to everyone: 100 requests/day with no key needed, or get a free key for 1,000/day.
Higher-rated alternatives
fla-org/flash-linear-attention
🚀 Efficient implementations of state-of-the-art linear attention models
thu-ml/SageAttention
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x...
thu-ml/SpargeAttn
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.
fla-org/flame
🔥 A minimal training framework for scaling FLA models
foundation-model-stack/fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for...