amazon-science/llm-rank-pruning
LLM-Rank: a graph-theoretical approach to structured pruning of large language models, based on weighted PageRank centrality, as introduced in the related paper.
This tool helps AI researchers and machine learning engineers reduce the size and computational cost of large language models (LLMs) without significantly losing performance. It takes a pre-trained LLM and calibration data, then identifies and removes less important parts of the model. The output is a smaller, more efficient pruned LLM.
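To make the idea of ranking model components by weighted PageRank centrality concrete, here is a minimal sketch in plain Python. It is not the repository's implementation. It assumes that prunable units (for example, neurons) are graph nodes and that non-negative edge weights describe connection strength; how LLM-Rank actually builds its graph from the model and calibration data is not described in this listing. The function names `weighted_pagerank` and `prune_lowest` are hypothetical.

```python
def weighted_pagerank(weights, damping=0.85, iters=100, tol=1e-10):
    """Power-iteration PageRank on a dense weighted adjacency matrix.

    weights[i][j] is the non-negative edge weight from node i to node j.
    (Assumption for this sketch: nodes are prunable units of a model.)
    """
    n = len(weights)
    out_sum = [sum(row) for row in weights]
    rank = [1.0 / n] * n
    for _ in range(iters):
        # Rank held by nodes without outgoing weight is spread uniformly.
        dangling = sum(rank[i] for i in range(n) if out_sum[i] == 0)
        new_rank = []
        for j in range(n):
            # Each node passes rank to its targets in proportion to edge weight.
            inflow = sum(
                rank[i] * weights[i][j] / out_sum[i]
                for i in range(n)
                if out_sum[i] > 0
            )
            new_rank.append((1 - damping) / n + damping * (inflow + dangling / n))
        delta = sum(abs(a - b) for a, b in zip(rank, new_rank))
        rank = new_rank
        if delta < tol:
            break
    return rank


def prune_lowest(scores, fraction):
    """Return the sorted indices kept after dropping the lowest-scoring fraction."""
    n_drop = int(len(scores) * fraction)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(order[n_drop:])


# Toy graph: node 3 has no incoming edges, so it receives the lowest score.
toy_weights = [
    [0, 1, 4, 0],
    [0, 0, 3, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
]
scores = weighted_pagerank(toy_weights)
keep = prune_lowest(scores, fraction=0.25)  # keeps nodes 0, 1 and 2
```

In a real structured-pruning setting, the kept indices would select which rows or columns of a weight matrix survive; that mapping is left out here because it depends on the model architecture.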
No commits in the last 6 months.
Use this if you need to deploy large language models on resource-constrained devices or reduce their operational cost while maintaining strong performance.
Not ideal if you are looking for a tool to train LLMs from scratch or fine-tune them on new data, as this focuses solely on post-training model compression.
Stars: 8
Forks: 3
Language: Python
License: Apache-2.0
Category:
Last pushed: Nov 29, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/amazon-science/llm-rank-pruning"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
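The same request can be made from Python using only the standard library. This is a sketch under assumptions: the `category/owner/repo` path pattern is inferred from the single URL shown above, the response is assumed to be JSON (the listing does not document its format), and the helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL (path pattern inferred from one example)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the record for one repository.

    Assumption: the endpoint returns a JSON body; adjust the parsing
    if the actual response format differs.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("llm-tools", "amazon-science", "llm-rank-pruning")` would issue the same GET request as the curl command and counts against the daily request limit.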
Higher-rated alternatives
Tencent/AngelSlim
Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.
nebuly-ai/optimate
A collection of libraries to optimise AI model performances
antgroup/glake
GLake: optimizing GPU memory management and IO transmission.
kyo-takano/chinchilla
A toolkit for scaling law research ⚖
liyucheng09/Selective_Context
Compress your input to ChatGPT or other LLMs, to let them process 2x more content and save 40%...