Beomi/InfiniTransformer
Unofficial PyTorch/🤗Transformers (Gemma/Llama3) implementation of "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention"
This project helps AI researchers and machine learning engineers train language models such as Llama3 and Gemma with much longer context windows, extending typical lengths toward millions of tokens. Starting from existing model architectures and training data, it produces models that can understand and generate much longer texts while using less GPU memory. It is aimed at professionals building or fine-tuning large language models.
375 stars. No commits in the last 6 months.
Use this if you need to train Llama3 or Gemma models to process extremely long documents or conversations without running out of memory, or if you want to extend their contextual understanding capabilities.
Not ideal if you are working with standard context lengths or do not have access to high-end GPUs, as the full memory benefits are most apparent with very long sequences.
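The memory savings come from Infini-attention's compressive memory: instead of attending over all past keys and values, each segment folds its keys/values into a fixed-size associative matrix that later segments query. A rough, simplified sketch of that update (single head, no batching, NumPy rather than the repo's PyTorch code; all names here are illustrative, not the project's API) looks like this:

```python
import numpy as np

def elu_plus_one(x):
    # ELU(x) + 1 feature map, keeping activations positive (as in the paper)
    return np.where(x > 0, x + 1.0, np.exp(x))

def infini_memory_step(q, k, v, M, z):
    """One segment's compressive-memory retrieval and update (sketch).

    q, k: (seg_len, d_k)   segment queries/keys
    v:    (seg_len, d_v)   segment values
    M:    (d_k, d_v)       running associative memory
    z:    (d_k,)           running normalizer
    """
    sq, sk = elu_plus_one(q), elu_plus_one(k)
    # Retrieve from memory accumulated over earlier segments
    A_mem = (sq @ M) / (sq @ z[:, None] + 1e-6)
    # Fold this segment's keys/values into the fixed-size memory
    M_new = M + sk.T @ v
    z_new = z + sk.sum(axis=0)
    return A_mem, M_new, z_new
```

In the full method, `A_mem` is blended with ordinary local attention over the current segment via a learned gate, so memory cost stays constant in sequence length while the model still sees arbitrarily old context.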
Stars: 375
Forks: 33
Language: Python
License: MIT
Category:
Last pushed: Apr 23, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Beomi/InfiniTransformer"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
lucidrains/x-transformers
A concise but complete full-attention transformer with a set of promising experimental features...
kanishkamisra/minicons
Utility for behavioral and representational analyses of Language Models
lucidrains/simple-hierarchical-transformer
Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT
lucidrains/dreamer4
Implementation of Danijar's latest iteration for his Dreamer line of work
Nicolepcx/Transformers-in-Action
This is the corresponding code for the book Transformers in Action