LINs-lab/DeFT
[ICLR 2025] DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference
DeFT helps AI developers and researchers run large language models (LLMs) faster and more efficiently on complex, multi-step tasks. It takes tree-structured input prompts, common in scenarios like multi-step reasoning or speculative decoding, and processes them with optimized memory usage on GPUs, significantly reducing processing time and improving GPU utilization for these advanced LLM applications.
No commits in the last 6 months.
Use this if you are a developer or researcher building and deploying LLM applications that involve complex, tree-structured interactions and you need to optimize their speed and hardware efficiency.
Not ideal if you are a general LLM user or a developer working with simple, sequential prompts, as the benefits are specific to tree-structured inference.
Stars: 50
Forks: 2
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Jun 17, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/LINs-lab/DeFT"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
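For scripted access, the same endpoint can be called from Python. This is a minimal sketch, assuming the endpoint returns JSON; the response schema and the `X-API-Key` header name are assumptions, not documented here.

```python
# Fetch repo quality data from the pt-edge API (sketch; schema assumed).
import json
import urllib.request
from typing import Optional

BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality-API URL for a given GitHub owner/repo."""
    return f"{BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, api_key: Optional[str] = None) -> dict:
    """Fetch quality data; an optional key raises the daily request limit.

    NOTE: the 'X-API-Key' header name is an assumption for illustration.
    """
    req = urllib.request.Request(quality_url(owner, repo))
    if api_key:
        req.add_header("X-API-Key", api_key)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    # Same request as the curl example above, pretty-printed.
    print(json.dumps(fetch_quality("LINs-lab", "DeFT"), indent=2))
```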
Higher-rated alternatives
Tencent/AngelSlim
Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.
nebuly-ai/optimate
A collection of libraries to optimise AI model performance
antgroup/glake
GLake: optimizing GPU memory management and IO transmission.
kyo-takano/chinchilla
A toolkit for scaling law research ⚖
liyucheng09/Selective_Context
Compress your input to ChatGPT or other LLMs, to let them process 2x more content and save 40%...