LLM-Pruner and LLM-Shearing
These are **competitors** — both implement structural pruning to reduce LLM size and latency, but LLM-Pruner offers a general, task-agnostic pruning framework applicable to many architectures, while LLM-Shearing pairs targeted structured pruning with continued pre-training, optimized for LLaMA models.
About LLM-Pruner
horseee/LLM-Pruner
[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.
This project helps machine learning engineers and researchers reduce the size of large language models (LLMs) such as Llama, BLOOM, and Vicuna. It takes an existing LLM as input and prunes structurally unnecessary components while aiming to preserve the model's multi-task abilities. The output is a smaller, more efficient LLM that requires fewer computational resources, enabling easier deployment and faster inference.
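The core idea behind structural pruning is to remove whole units (rows, columns, attention heads) rather than individual weights, so the resulting model is genuinely smaller and faster without sparse-kernel support. A minimal sketch of this idea for a two-layer MLP follows; the function name, the L2-norm importance score, and the shapes are illustrative assumptions, not either repo's actual API.

```python
import numpy as np

def prune_hidden_units(w_in, w_out, keep_ratio):
    """Drop the hidden units of a 2-layer MLP with the smallest importance.

    w_in:  (hidden, d_in)  weights producing the hidden activations
    w_out: (d_out, hidden) weights consuming them
    keep_ratio: fraction of hidden units to keep

    Illustrative only: real methods (e.g. LLM-Pruner) use gradient-based
    importance estimates, not a plain L2 norm.
    """
    hidden = w_in.shape[0]
    n_keep = max(1, int(hidden * keep_ratio))
    # Importance score per hidden unit: L2 norm of its incoming weights.
    scores = np.linalg.norm(w_in, axis=1)
    # Indices of the strongest units, restored to original order.
    keep = np.sort(np.argsort(scores)[-n_keep:])
    # Removing a hidden unit deletes a row of w_in AND a column of w_out,
    # which is why paired layers must be pruned together.
    return w_in[keep], w_out[:, keep]

rng = np.random.default_rng(0)
w_in = rng.standard_normal((8, 4))
w_out = rng.standard_normal((3, 8))
p_in, p_out = prune_hidden_units(w_in, w_out, keep_ratio=0.5)
print(p_in.shape, p_out.shape)  # (4, 4) (3, 4)
```

Note that both weight matrices shrink consistently: the pruned network still composes, just with half the hidden width.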
About LLM-Shearing
princeton-nlp/LLM-Shearing
[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
This project provides methods and pre-trained models for efficiently creating smaller, specialized large language models (LLMs). By "shearing" (pruning) an existing large model and then continuing pre-training, you can significantly reduce its size and the compute needed compared with training a small model from scratch. It's aimed at AI/ML researchers and practitioners who want to develop cost-effective, high-performing small LLMs from larger base models.