tensorgi/TPA
[NeurIPS 2025 Spotlight] TPA: Tensor ProducT ATTenTion Transformer (T6) (https://arxiv.org/abs/2501.06425)
This project provides T6, a transformer model built on Tensor Product Attention (TPA), which factorizes queries, keys, and values into low-rank components to shrink the KV cache and improve performance in large language models. It includes tools for preparing large datasets such as Fineweb-Edu-100B and OpenWebText, pretraining T6, and evaluating the resulting models. It is aimed at machine learning researchers and engineers developing and optimizing large language models.
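To make the memory-saving idea concrete, here is a minimal NumPy sketch of the low-rank tensor-product factorization that TPA applies to attention activations. The shapes, rank `R`, and variable names below are illustrative assumptions, not the repo's actual API: the point is only that caching per-token factors is cheaper than caching full per-head key/value blocks.

```python
import numpy as np

# Hypothetical sizes; R is the tensor rank (an assumption for illustration).
heads, head_dim, R = 8, 64, 2
rng = np.random.default_rng(0)

# Per-token low-rank factors: one set over the head axis, one over the feature axis.
A = rng.standard_normal((R, heads))      # head-axis factors
B = rng.standard_normal((R, head_dim))   # feature-axis factors

# Reconstruct a full (heads, head_dim) key block as a sum of R outer products.
K_full = np.einsum('rh,rd->hd', A, B) / R

# Cached floats per token: factors vs. a full key block.
cached_factored = A.size + B.size        # R * (heads + head_dim)
cached_full = heads * head_dim
print(K_full.shape, cached_factored, cached_full)
```

With these toy numbers the factored cache stores 144 floats per token instead of 512, and the ratio improves as `head_dim` grows relative to the rank.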
Use this if you are developing or researching advanced transformer architectures and need to improve model performance while managing computational resources efficiently, especially when working with extensive text datasets.
Not ideal if you are looking for an off-the-shelf, ready-to-use language model for direct application without deep model development or research.
Stars: 450
Forks: 37
Language: Python
License: MIT
Category:
Last pushed: Jan 26, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/tensorgi/TPA"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related models
huggingface/text-generation-inference
Large Language Model Text Generation Inference
OpenMachine-ai/transformer-tricks
A collection of tricks and tools to speed up transformer models
poloclub/transformer-explainer
Transformer Explained Visually: Learn How LLM Transformer Models Work with Interactive Visualization
IBM/TabFormer
Code & Data for "Tabular Transformers for Modeling Multivariate Time Series" (ICASSP, 2021)
lorenzorovida/FHE-BERT-Tiny
Source code for the paper "Transformer-based Language Models and Homomorphic Encryption: an...