visresearch/SDMPrune

The official implementation of "SDMPrune: Self-Distillation MLP Pruning for Efficient Large Language Models"

Quality score: 23 / 100 (Experimental)

This project helps machine learning engineers and researchers optimize large language models (LLMs) for efficiency. It takes an existing LLM, like LLaMA3.2-1B, and significantly reduces its size by pruning less critical parts, specifically the MLP modules. The output is a more compact LLM that retains strong performance on various natural language understanding tasks, making it suitable for deployment in resource-constrained environments.
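To make the idea of pruning MLP modules concrete, here is a minimal, generic sketch of structured hidden-neuron pruning for a two-layer MLP block. It scores neurons by plain weight magnitude; SDMPrune's actual criterion is derived from a self-distillation objective (gradient-based), which this toy example does not implement, and the function and variable names are illustrative only.

```python
import numpy as np

def prune_mlp_hidden(w_up, w_down, keep_ratio):
    """Drop the least important hidden neurons of a two-layer MLP block.

    w_up:   (hidden, d_model) weights projecting into the hidden layer
    w_down: (d_model, hidden) weights projecting back out
    Importance here is simple weight magnitude, NOT SDMPrune's
    self-distillation criterion.
    """
    hidden = w_up.shape[0]
    keep = max(1, int(hidden * keep_ratio))
    # A hidden neuron contributes through both its incoming and outgoing
    # weights, so score it by the product of the two norms.
    scores = np.linalg.norm(w_up, axis=1) * np.linalg.norm(w_down, axis=0)
    idx = np.sort(np.argsort(scores)[-keep:])  # indices of neurons to keep
    return w_up[idx], w_down[:, idx]

rng = np.random.default_rng(0)
w_up, w_down = rng.normal(size=(64, 16)), rng.normal(size=(16, 64))
w_up_p, w_down_p = prune_mlp_hidden(w_up, w_down, keep_ratio=0.5)
print(w_up_p.shape, w_down_p.shape)  # (32, 16) (16, 32)
```

Pruning rows of the up-projection and the matching columns of the down-projection keeps the block's input/output dimensions intact, so the pruned block drops into the model without further changes.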

No commits in the last 6 months.

Use this if you need to deploy large language models more efficiently on devices with limited memory or computational power, while maintaining high performance on tasks like question answering or commonsense reasoning.

Not ideal if your primary goal is to enhance the model's performance beyond its original capabilities rather than to optimize its size and efficiency.

Tags: LLM deployment, model compression, edge AI, NLP, efficiency, resource optimization
Status: Stale (6 months), No Package, No Dependents

Score breakdown:
Maintenance: 2 / 25
Adoption: 6 / 25
Maturity: 15 / 25
Community: 0 / 25


Stars: 21
Forks:
Language: Python
License:
Last pushed: Jun 11, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/visresearch/SDMPrune"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
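The same request can be built programmatically. The sketch below assumes the endpoint path follows a {ecosystem}/{owner}/{repo} layout, which is inferred from the single example URL above rather than from documented API behavior.

```python
from urllib.parse import quote

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    # The {ecosystem}/{owner}/{repo} path layout is an assumption inferred
    # from the example curl command, not documented API behavior.
    return f"{API_BASE}/{quote(ecosystem)}/{quote(owner)}/{quote(repo)}"

url = quality_url("transformers", "visresearch", "SDMPrune")
print(url)
# Fetch with e.g. urllib.request.urlopen(url) and parse the JSON response.
```

Quoting each path segment guards against owner or repo names containing characters that are not URL-safe.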