LINs-lab/DeFT
[ICLR 2025] DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference
DeFT helps AI developers and researchers run large language models (LLMs) faster and more efficiently on complex, multi-step tasks. It takes tree-structured input prompts, common in scenarios like multi-step reasoning or speculative decoding, and processes them with optimized memory usage on GPUs, significantly reducing processing time and improving GPU utilization for these advanced LLM applications.
No commits in the last 6 months.
Use this if you are a developer or researcher building and deploying LLM applications that involve complex, tree-structured interactions and you need to optimize their speed and hardware efficiency.
Not ideal if you are a general LLM user or a developer working with simple, sequential prompts, as the benefits are specific to tree-structured inference.
Stars: 50
Forks: 2
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Jun 17, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/LINs-lab/DeFT"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
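For scripted access, the same endpoint can be called from Python. This is a minimal sketch, assuming the endpoint returns JSON; the response schema and the `X-API-Key` header name are assumptions, not documented here.

```python
# Fetch repo quality data from the pt-edge API (sketch; schema assumed).
import json
import urllib.request
from typing import Optional

BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality-API URL for a given GitHub owner/repo."""
    return f"{BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, api_key: Optional[str] = None) -> dict:
    """Fetch quality data; an optional key raises the daily request limit.

    NOTE: the 'X-API-Key' header name is an assumption for illustration.
    """
    req = urllib.request.Request(quality_url(owner, repo))
    if api_key:
        req.add_header("X-API-Key", api_key)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    # Same request as the curl example above, pretty-printed.
    print(json.dumps(fetch_quality("LINs-lab", "DeFT"), indent=2))
```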
Higher-rated alternatives
Tencent/AngelSlim
Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.
nebuly-ai/optimate
A collection of libraries to optimise AI model performance
antgroup/glake
GLake: optimizing GPU memory management and IO transmission.
kyo-takano/chinchilla
A toolkit for scaling law research ⚖
liyucheng09/Selective_Context
Compress your input to ChatGPT or other LLMs, to let them process 2x more content and save 40%...