tensorgi/TPA
[NeurIPS 2025 Spotlight] TPA: Tensor ProducT ATTenTion Transformer (T6) (https://arxiv.org/abs/2501.06425)
This project provides T6, a transformer model built on Tensor Product Attention (TPA), which factorizes queries, keys, and values into low-rank components to shrink the KV cache and improve performance in large language models. It includes tools for preparing large datasets such as Fineweb-Edu-100B and OpenWebText, pretraining T6, and evaluating the resulting models. It is aimed at machine learning researchers and engineers developing and optimizing large language models.
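To make the memory-saving idea concrete, here is a minimal NumPy sketch of the low-rank tensor-product factorization that TPA applies to attention activations. The shapes, rank `R`, and variable names below are illustrative assumptions, not the repo's actual API: the point is only that caching per-token factors is cheaper than caching full per-head key/value blocks.

```python
import numpy as np

# Hypothetical sizes; R is the tensor rank (an assumption for illustration).
heads, head_dim, R = 8, 64, 2
rng = np.random.default_rng(0)

# Per-token low-rank factors: one set over the head axis, one over the feature axis.
A = rng.standard_normal((R, heads))      # head-axis factors
B = rng.standard_normal((R, head_dim))   # feature-axis factors

# Reconstruct a full (heads, head_dim) key block as a sum of R outer products.
K_full = np.einsum('rh,rd->hd', A, B) / R

# Cached floats per token: factors vs. a full key block.
cached_factored = A.size + B.size        # R * (heads + head_dim)
cached_full = heads * head_dim
print(K_full.shape, cached_factored, cached_full)
```

With these toy numbers the factored cache stores 144 floats per token instead of 512, and the ratio improves as `head_dim` grows relative to the rank.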
Use this if you are developing or researching advanced transformer architectures and need to improve model performance while managing computational resources efficiently, especially when working with extensive text datasets.
Not ideal if you are looking for an off-the-shelf, ready-to-use language model for direct application without deep model development or research.
Stars: 450
Forks: 37
Language: Python
License: MIT
Category:
Last pushed: Jan 26, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/tensorgi/TPA"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related models
huggingface/text-generation-inference
Large Language Model Text Generation Inference
OpenMachine-ai/transformer-tricks
A collection of tricks and tools to speed up transformer models
poloclub/transformer-explainer
Transformer Explained Visually: Learn How LLM Transformer Models Work with Interactive Visualization
IBM/TabFormer
Code & Data for "Tabular Transformers for Modeling Multivariate Time Series" (ICASSP, 2021)
lorenzorovida/FHE-BERT-Tiny
Source code for the paper "Transformer-based Language Models and Homomorphic Encryption: an...