pmichel31415/are-16-heads-really-better-than-1
Code for the paper "Are Sixteen Heads Really Better than One?"
This project helps machine learning researchers probe the inner workings of Transformer models such as BERT and the Transformers used for machine translation. By systematically removing or disabling individual attention heads, it reveals how much each component contributes to performance on tasks like natural language understanding and translation. Researchers can use it to analyze model behavior and evaluate the impact of architectural choices.
175 stars. No commits in the last 6 months.
Use this if you are a machine learning researcher studying the interpretability or efficiency of Transformer-based models and want to reproduce or extend experiments on attention head ablation and pruning.
Not ideal if you are looking for a general-purpose natural language processing tool or want to train a new model from scratch without focusing on architectural analysis.
Stars: 175
Forks: 15
Language: Shell
License: MIT
Category:
Last pushed: Apr 01, 2020
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/pmichel31415/are-16-heads-really-better-than-1"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
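For programmatic use, the same request can be made from Python. This is a minimal sketch: the helper name `quality_url` and the path pattern `/api/v1/quality/<category>/<owner>/<repo>` are assumptions inferred from the example URL above, and the JSON schema of the response is not assumed.

```python
import json
import urllib.request

# Base endpoint taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a repository.

    The category/owner/repo path layout is inferred from the
    documented example URL, not from an official API reference.
    """
    return f"{BASE}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the JSON body (schema left unspecified)."""
    with urllib.request.urlopen(
        quality_url(category, owner, repo), timeout=timeout
    ) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Reconstructs the exact URL shown in the curl example.
    print(
        quality_url(
            "transformers", "pmichel31415", "are-16-heads-really-better-than-1"
        )
    )
```

No API key is attached here, matching the no-key free tier; how a key would be passed (header vs. query parameter) is not documented on this page, so it is deliberately omitted.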
Higher-rated alternatives
huggingface/transformers
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in...
kyegomez/LongNet
Implementation of plug in and play Attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens"
pbloem/former
Simple transformer implementation from scratch in pytorch. (archival, latest version on codeberg)
NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
kyegomez/SimplifiedTransformers
SimplifiedTransformer simplifies transformer block without affecting training. Skip connections,...