LLM-Shearing and LLaMA-Pruning
These projects are alternatives: both implement structured pruning to reduce LLaMA model size and inference latency. LLM-Shearing is the more established academic solution (published at ICLR 2024, with roughly 10x more GitHub stars), while LLaMA-Pruning offers an alternative implementation of similar structural pruning techniques.
About LLM-Shearing
princeton-nlp/LLM-Shearing
[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
This project provides methods and pre-trained models for efficiently creating smaller, specialized large language models (LLMs). By 'shearing' or pruning an existing large model, you can significantly reduce its size and the computational resources needed for pre-training. It's ideal for AI/ML researchers and practitioners who want to develop cost-effective, high-performing small LLMs from larger base models.
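To make the idea of structured pruning concrete, here is a minimal sketch of one common variant: magnitude-based removal of whole output channels (rows) from a weight matrix. This is an illustrative toy, not the actual method used by LLM-Shearing or LLaMA-Pruning (which prune coordinated structures such as attention heads, hidden dimensions, and layers, and in LLM-Shearing's case learn the pruning masks); the function name and the L2-norm criterion are assumptions for the example.

```python
import numpy as np

def prune_rows(weight: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Structured pruning sketch: keep only the output channels (rows)
    with the largest L2 norms, shrinking the layer's actual shape
    rather than just zeroing entries (unlike unstructured pruning)."""
    n_keep = max(1, int(weight.shape[0] * keep_ratio))
    norms = np.linalg.norm(weight, axis=1)           # importance score per row
    keep = np.sort(np.argsort(norms)[-n_keep:])      # top-k rows, original order
    return weight[keep]

# Toy 8x4 "layer": pruning to 50% keeps the 4 highest-norm rows,
# so the dense matrix genuinely becomes 4x4-sized work at inference time.
w = np.arange(32, dtype=float).reshape(8, 4)
pruned = prune_rows(w, 0.5)
print(pruned.shape)  # (4, 4)
```

Because entire rows are removed, the pruned matrix is physically smaller, which is what yields real latency and memory savings; a full pipeline would also slice the matching input dimension of the next layer to keep shapes consistent.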
About LLaMA-Pruning
horseee/LLaMA-Pruning
Structural Pruning for LLaMA