tobna/TaylorShift
This repository contains the code for the paper "TaylorShift: Shifting the Complexity of Self-Attention from Squared to Linear (and Back) using Taylor-Softmax"
This project lets machine learning engineers and researchers build more efficient Transformer and Vision Transformer models. It provides a drop-in attention mechanism, TaylorShift, that replaces standard softmax attention and processes sequence or image data faster, especially with long inputs. You integrate it into your existing model architecture to obtain a more performant model.
Use this if you are a machine learning engineer or researcher working with Transformer or Vision Transformer models and need to reduce computational complexity or improve performance, especially with large datasets or long sequences.
Not ideal if you don't work with PyTorch for deep learning, or if you need a complete, out-of-the-box solution for a specific application.
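The core idea behind the mechanism, as described in the paper's title, is to replace exp(x) inside softmax with its second-order Taylor expansion 1 + x + x²/2, which stays strictly positive and allows the attention computation to be reordered into linear time. A minimal NumPy sketch of the direct (quadratic-time) form is below; function names are illustrative, not the repository's actual API, and the linear-time reordering from the paper is only noted in a comment.

```python
import numpy as np

def taylor_softmax(scores):
    """Row-wise Taylor-Softmax: exp(x) is replaced by its 2nd-order
    Taylor expansion 1 + x + x^2/2, which is strictly positive
    (it equals ((x + 1)^2 + 1) / 2), so normalizing is always valid."""
    num = 1.0 + scores + 0.5 * scores**2
    return num / num.sum(axis=-1, keepdims=True)

def taylor_attention(Q, K, V):
    """Direct (quadratic-time) attention using Taylor-Softmax.
    The paper's linear-time variant reorders this computation so the
    N x N score matrix is never materialized."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    return taylor_softmax(scores) @ V

# Toy example: 8 tokens, head dimension 4.
rng = np.random.default_rng(0)
N, d = 8, 4
Q, K, V = rng.normal(size=(3, N, d))
out = taylor_attention(Q, K, V)
print(out.shape)  # (8, 4)
```

Because the polynomial 1 + x + x²/2 is positive for every real x, the attention weights behave like a proper probability distribution over tokens, just as with ordinary softmax.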
Stars
13
Forks
—
Language
Python
License
MIT
Category
ml-frameworks
Last pushed
Feb 25, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/tobna/TaylorShift"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
philipperemy/keras-attention
Keras Attention Layer (Luong and Bahdanau scores).
tatp22/linformer-pytorch
My take on a practical implementation of Linformer for Pytorch.
datalogue/keras-attention
Visualizing RNNs using the attention mechanism
ematvey/hierarchical-attention-networks
Document classification with Hierarchical Attention Networks in TensorFlow. WARNING: project is...
thushv89/attention_keras
Keras Layer implementation of Attention for Sequential models