Beomi/InfiniTransformer
Unofficial PyTorch/🤗Transformers (Gemma/Llama3) implementation of "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention"
This project helps AI researchers and machine learning engineers train language models such as Llama3 and Gemma with much longer context windows, extending typical lengths toward millions of tokens. Starting from existing model architectures and training data, it produces models that can understand and generate much longer texts while using less GPU memory. It is aimed at professionals building or fine-tuning large language models.
375 stars. No commits in the last 6 months.
Use this if you need to train Llama3 or Gemma models to process extremely long documents or conversations without running out of memory, or if you want to extend their contextual understanding capabilities.
Not ideal if you are working with standard context lengths or do not have access to high-end GPUs, as the full memory benefits are most apparent with very long sequences.
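The memory savings come from Infini-attention's compressive memory: instead of attending over all past keys and values, each segment folds its keys/values into a fixed-size associative matrix that later segments query. A rough, simplified sketch of that update (single head, no batching, NumPy rather than the repo's PyTorch code; all names here are illustrative, not the project's API) looks like this:

```python
import numpy as np

def elu_plus_one(x):
    # ELU(x) + 1 feature map, keeping activations positive (as in the paper)
    return np.where(x > 0, x + 1.0, np.exp(x))

def infini_memory_step(q, k, v, M, z):
    """One segment's compressive-memory retrieval and update (sketch).

    q, k: (seg_len, d_k)   segment queries/keys
    v:    (seg_len, d_v)   segment values
    M:    (d_k, d_v)       running associative memory
    z:    (d_k,)           running normalizer
    """
    sq, sk = elu_plus_one(q), elu_plus_one(k)
    # Retrieve from memory accumulated over earlier segments
    A_mem = (sq @ M) / (sq @ z[:, None] + 1e-6)
    # Fold this segment's keys/values into the fixed-size memory
    M_new = M + sk.T @ v
    z_new = z + sk.sum(axis=0)
    return A_mem, M_new, z_new
```

In the full method, `A_mem` is blended with ordinary local attention over the current segment via a learned gate, so memory cost stays constant in sequence length while the model still sees arbitrarily old context.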
Stars: 375
Forks: 33
Language: Python
License: MIT
Category:
Last pushed: Apr 23, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Beomi/InfiniTransformer"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
lucidrains/x-transformers
A concise but complete full-attention transformer with a set of promising experimental features...
kanishkamisra/minicons
Utility for behavioral and representational analyses of Language Models
lucidrains/simple-hierarchical-transformer
Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT
lucidrains/dreamer4
Implementation of Danijar's latest iteration for his Dreamer line of work
Nicolepcx/Transformers-in-Action
This is the corresponding code for the book Transformers in Action