fla-org/flame
🔥 A minimal training framework for scaling FLA models
This project provides a training framework for creating highly efficient large language models, specifically those using Flash Linear Attention (FLA). It takes raw text datasets, like the FineWeb-Edu corpus, and outputs a trained language model ready for use in various applications. It's designed for machine learning researchers and engineers focused on developing custom, performant language models.
Use this if you are building and training your own large language models with a focus on high efficiency and scalability, especially when working with massive text datasets.
Not ideal if you're looking to simply fine-tune existing, pre-trained models or if you don't need to train models from scratch on large-scale datasets.
Stars: 355
Forks: 58
Language: Python
License: MIT
Category:
Last pushed: Nov 15, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/fla-org/flame"
Open to everyone: 100 requests/day with no API key. A free key raises the limit to 1,000 requests/day.
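For programmatic use, the curl call above can be sketched in Python with the standard library. This is a minimal sketch: the response schema is not documented here, and the `Authorization: Bearer` header name for the optional API key is an assumption, so check the API's own docs before relying on it.

```python
import json
import urllib.request

# Base endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL."""
    return f"{BASE_URL}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str, api_key: str = "") -> dict:
    """Fetch quality data for a repo; a key (header name assumed) raises the rate limit."""
    req = urllib.request.Request(quality_url(owner, repo))
    if api_key:
        # Hypothetical header; the service may expect a different scheme.
        req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `fetch_quality("fla-org", "flame")` would request the same URL shown in the curl command.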
Related models
fla-org/flash-linear-attention
🚀 Efficient implementations of state-of-the-art linear attention models
thu-ml/SageAttention
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x...
thu-ml/SpargeAttn
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.
foundation-model-stack/fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for...
NX-AI/mlstm_kernels
Tiled Flash Linear Attention library for fast and efficient mLSTM Kernels.