pmichel31415/are-16-heads-really-better-than-1

Code for the paper "Are Sixteen Heads Really Better than One?"

Score: 38 / 100 (Emerging)

This project helps machine learning researchers understand the inner workings of Transformer-based models such as BERT and machine-translation Transformers. By systematically removing or disabling individual attention heads, it shows how much each component contributes to overall performance on tasks such as natural language understanding and translation. Researchers can use it to analyze model behavior and evaluate the impact of architectural choices.
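
As a rough illustration of the idea (this is not the repository's code; names such as MaskedMultiHeadSelfAttention and head_gate are purely illustrative), here is a minimal PyTorch sketch of head masking: each attention head's output is scaled by a per-head gate, and setting a gate to zero ablates that head.

import torch
import torch.nn as nn

class MaskedMultiHeadSelfAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # One gate per head: 1.0 keeps the head, 0.0 ablates it.
        self.register_buffer("head_gate", torch.ones(n_heads))

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        # Reshape each projection to (batch, heads, time, d_head).
        def split(z):
            return z.view(b, t, self.n_heads, self.d_head).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)
        att = (q @ k.transpose(-2, -1)) / self.d_head ** 0.5
        ctx = att.softmax(dim=-1) @ v                  # (b, heads, t, d_head)
        ctx = ctx * self.head_gate.view(1, -1, 1, 1)   # zero out ablated heads
        return self.out(ctx.transpose(1, 2).reshape(b, t, d))

mha = MaskedMultiHeadSelfAttention(d_model=64, n_heads=16)
x = torch.randn(2, 10, 64)
baseline = mha(x)
mha.head_gate[3] = 0.0          # ablate head 3, then re-run and compare
ablated = mha(x)
print((baseline - ablated).abs().max())

In experiments like the paper's, the drop in task performance after such an ablation is what indicates how much the model relies on that head.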

175 stars. No commits in the last 6 months.

Use this if you are a machine learning researcher studying the interpretability or efficiency of Transformer-based models and want to reproduce or extend experiments on attention head ablation and pruning.

Not ideal if you are looking for a general-purpose natural language processing tool or want to train a new model from scratch without focusing on architectural analysis.

natural-language-processing machine-translation model-interpretability deep-learning-research transformer-models
Flags: Stale (6m) · No Package · No Dependents
Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 12 / 25

Stars: 175
Forks: 15
Language: Shell
License: MIT
Last pushed: Apr 01, 2020
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/pmichel31415/are-16-heads-really-better-than-1"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
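
To consume the endpoint programmatically, here is a minimal Python sketch of the same request. It assumes only that the endpoint returns JSON; the response schema is not documented on this page, so no specific field names are assumed.

import json
import urllib.request

URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "transformers/pmichel31415/are-16-heads-really-better-than-1")

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)          # parse the JSON body

print(json.dumps(data, indent=2))   # inspect the available fields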