pmichel31415/are-16-heads-really-better-than-1
Code for the paper "Are Sixteen Heads Really Better than One?"
This project helps machine learning researchers probe the inner workings of Transformer models such as BERT and the Transformers used for machine translation. By systematically removing or disabling individual attention heads, it reveals how much each component contributes to performance on tasks like natural language understanding and translation. Researchers can use it to analyze model behavior and evaluate the impact of architectural choices.
175 stars. No commits in the last 6 months.
Use this if you are a machine learning researcher studying the interpretability or efficiency of Transformer-based models and want to reproduce or extend experiments on attention head ablation and pruning.
Not ideal if you are looking for a general-purpose natural language processing tool or want to train a new model from scratch without focusing on architectural analysis.
Stars: 175
Forks: 15
Language: Shell
License: MIT
Category:
Last pushed: Apr 01, 2020
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/pmichel31415/are-16-heads-really-better-than-1"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
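For programmatic use, the same request can be made from Python. This is a minimal sketch: the helper name `quality_url` and the path pattern `/api/v1/quality/<category>/<owner>/<repo>` are assumptions inferred from the example URL above, and the JSON schema of the response is not assumed.

```python
import json
import urllib.request

# Base endpoint taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a repository.

    The category/owner/repo path layout is inferred from the
    documented example URL, not from an official API reference.
    """
    return f"{BASE}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the JSON body (schema left unspecified)."""
    with urllib.request.urlopen(
        quality_url(category, owner, repo), timeout=timeout
    ) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Reconstructs the exact URL shown in the curl example.
    print(
        quality_url(
            "transformers", "pmichel31415", "are-16-heads-really-better-than-1"
        )
    )
```

No API key is attached here, matching the no-key free tier; how a key would be passed (header vs. query parameter) is not documented on this page, so it is deliberately omitted.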
Higher-rated alternatives
huggingface/transformers
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in...
kyegomez/LongNet
Implementation of plug in and play Attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens"
pbloem/former
Simple transformer implementation from scratch in pytorch. (archival, latest version on codeberg)
NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
kyegomez/SimplifiedTransformers
SimplifiedTransformer simplifies transformer block without affecting training. Skip connections,...