lengstrom/flashback

A FlashAttention backwards-over-backwards βš‘πŸ”™πŸ”™

Score: 20 / 100 (Experimental)

This project helps machine learning engineers and researchers accelerate advanced training techniques for attention-based models such as Transformers. It provides optimized components for the 'backwards-over-backwards' pass: differentiating through the attention backward pass to obtain second-order gradients (gradients of gradients). Input consists of the attention tensors (Query, Key, Value), and the output is memory-efficient second-order gradients, enabling faster research in areas like meta-learning and hyperparameter optimization.
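For readers new to the term, here is a minimal sketch of what a backwards-over-backwards pass computes, written with plain PyTorch autograd. It is a reference illustration only: flashback's own kernels and API are not shown, and the shapes and toy loss are arbitrary.

# Reference sketch of a "backwards-over-backwards" pass using plain
# PyTorch autograd. Shows WHAT is computed, not this repo's optimized
# kernels; all shapes and the toy loss are illustrative.
import torch

torch.manual_seed(0)
B, H, S, D = 2, 4, 16, 8  # batch, heads, sequence length, head dim
q = torch.randn(B, H, S, D, requires_grad=True)
k = torch.randn(B, H, S, D, requires_grad=True)
v = torch.randn(B, H, S, D, requires_grad=True)

# Standard scaled dot-product attention.
scores = q @ k.transpose(-2, -1) / D**0.5
out = torch.softmax(scores, dim=-1) @ v

# First backward: gradients of a scalar loss w.r.t. Q, K, V,
# keeping the graph so we can differentiate again.
loss = out.sum()
dq, dk, dv = torch.autograd.grad(loss, (q, k, v), create_graph=True)

# Backward-over-backward: differentiate a function of the first-order
# gradients (here their squared norm, as in gradient penalties or
# meta-learning objectives) w.r.t. the inputs once more.
grad_norm = dq.pow(2).sum() + dk.pow(2).sum() + dv.pow(2).sum()
ddq, ddk, ddv = torch.autograd.grad(grad_norm, (q, k, v))
print(ddq.shape)  # torch.Size([2, 4, 16, 8])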

No commits in the last 6 months.

Use this if you are a machine learning researcher or engineer experimenting with meta-learning, hyperparameter optimization, or architecture search, and need to compute higher-order gradients for attention-based models more efficiently.
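As a concrete instance of the hyperparameter-optimization use case, the sketch below differentiates a validation loss through one inner SGD step to get a hypergradient for the learning rate. It again uses plain PyTorch autograd with illustrative names and toy losses; flashback would accelerate the attention portion of such second-order passes.

# Illustrative hypergradient: d(validation loss)/d(learning rate)
# through one differentiable SGD step. Toy linear model; not this
# repo's API.
import torch

torch.manual_seed(0)
w = torch.randn(8, 8, requires_grad=True)    # model parameter
lr = torch.tensor(0.1, requires_grad=True)   # hyperparameter
x_train, x_val = torch.randn(4, 8), torch.randn(4, 8)

train_loss = (x_train @ w).pow(2).mean()
(g,) = torch.autograd.grad(train_loss, w, create_graph=True)

w_new = w - lr * g                           # one differentiable SGD step
val_loss = (x_val @ w_new).pow(2).mean()

# The hypergradient flows back through g, i.e. through a second-order term.
(hypergrad,) = torch.autograd.grad(val_loss, lr)
print(hypergrad)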

Not ideal if you only need standard first-order gradients or are working with models that don't extensively use attention mechanisms, as the specialized optimizations may not provide significant benefits.

deep-learning-research attention-mechanisms meta-learning hyperparameter-optimization gradient-descent-optimization
No License · Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 8 / 25
Community 7 / 25


Stars: 10
Forks: 1
Language: Jupyter Notebook
License: None
Last pushed: Mar 28, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/lengstrom/flashback"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
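The same endpoint can also be queried from Python using only the standard library. The response schema is not documented on this page, so this sketch just pretty-prints whatever JSON comes back.

# Fetch the quality report for this repo; schema unknown, so print raw JSON.
import json
import urllib.request

url = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/lengstrom/flashback"
with urllib.request.urlopen(url) as resp:
    data = json.load(resp)
print(json.dumps(data, indent=2))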