hao-ai-lab/DistCA
Efficient Long-context Language Model Training by Core Attention Disaggregation
This system helps AI researchers and deep learning engineers train large language models (LLMs) more efficiently on very long input sequences. By disaggregating the core attention computation from the rest of the training pipeline, it balances work across GPUs and produces a faster, more scalable training process, letting you build more capable long-context models without excessive hardware or time.
Use this if you are training large language models with extremely long input contexts and are struggling with slow training times, workload imbalances across GPUs, or high communication overhead.
Not ideal if you are working with shorter context lengths or do not require highly distributed training across many GPUs; in those cases the system's added complexity is unlikely to yield significant benefits.
Stars: 93
Forks: 7
Language: Python
License: —
Category: —
Last pushed: Mar 05, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/hao-ai-lab/DistCA"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
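For scripted access, the endpoint above can be called from Python as well as curl. A minimal sketch, assuming only the URL pattern shown on this page; the JSON field names returned by the API are not documented here, so the fetch helper simply decodes whatever the server sends.

```python
# Sketch of calling the quality API programmatically.
# Only the URL pattern is taken from this page; the response
# schema is an assumption and is returned as an untyped dict.
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(ecosystem: str, repo: str) -> str:
    """Build the API URL for a repo, e.g. ('transformers', 'hao-ai-lab/DistCA')."""
    return f"{API_BASE}/{ecosystem}/{repo}"


def fetch_quality(ecosystem: str, repo: str) -> dict:
    """Fetch and decode the JSON response (requires network access;
    rate-limited to 100 requests/day without an API key)."""
    with urllib.request.urlopen(quality_url(ecosystem, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(quality_url("transformers", "hao-ai-lab/DistCA"))
```

With a free API key, the same pattern applies at 1,000 requests/day; how the key is passed (header vs. query parameter) is not specified on this page.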
Higher-rated alternatives
ZHZisZZ/dllm
dLLM: Simple Diffusion Language Modeling
pengzhangzhi/Open-dLLM
Open diffusion language model for code generation — releasing pretraining, evaluation,...
EnnengYang/Awesome-Model-Merging-Methods-Theories-Applications
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities. ACM...
THUDM/LongWriter
[ICLR 2025] LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
AIoT-MLSys-Lab/SVD-LLM
[ICLR 2025🔥] SVD-LLM & [NAACL 2025🔥] SVD-LLM V2