LINs-lab/DeFT

[ICLR 2025] DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference

Score: 31 / 100 (Emerging)

DeFT helps AI developers and researchers make large language models (LLMs) run faster and more efficiently on complex, multi-step tasks. It takes tree-structured input prompts, common in scenarios like multi-step reasoning or speculative decoding, and processes them with optimized memory access on GPUs. The result is significantly reduced decoding time and improved GPU utilization for these advanced LLM applications.

No commits in the last 6 months.

Use this if you are a developer or researcher building and deploying LLM applications that involve complex, tree-structured interactions and you need to optimize their speed and hardware efficiency.

Not ideal if you are a general LLM user or a developer working with simple, sequential prompts, as the benefits are specific to tree-structured inference.

Tags: LLM-inference, GPU-optimization, model-deployment, AI-research, reasoning-tasks
Flags: Stale (6m), No Package, No Dependents
Maintenance 2 / 25
Adoption 8 / 25
Maturity 16 / 25
Community 5 / 25


Stars: 50
Forks: 2
Language: Jupyter Notebook
License: MIT
Last pushed: Jun 17, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/LINs-lab/DeFT"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
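For programmatic use, the curl call above can be wrapped in a small helper. This is a minimal sketch: the URL pattern is taken from the curl example, while the response being JSON (and its field names) is an assumption not confirmed by this page.

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-score endpoint URL for a GitHub repo."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the quality report (requires network access;
    assumes the endpoint returns JSON)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

print(quality_url("LINs-lab", "DeFT"))
# → https://pt-edge.onrender.com/api/v1/quality/llm-tools/LINs-lab/DeFT
```

Requests beyond the free daily quota would need an API key; how the key is passed (header vs. query parameter) is not documented here, so it is left out of the sketch.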