visresearch/SDMPrune

The official implementation of "SDMPrune: Self-Distillation MLP Pruning for Efficient Large Language Models"

Quality score: 23 / 100 (Experimental)

This project helps machine learning engineers and researchers optimize large language models (LLMs) for efficiency. It takes an existing LLM, like LLaMA3.2-1B, and significantly reduces its size by pruning less critical parts, specifically the MLP modules. The output is a more compact LLM that retains strong performance on various natural language understanding tasks, making it suitable for deployment in resource-constrained environments.
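To make the idea of pruning MLP modules concrete, here is a minimal, generic sketch of structured hidden-neuron pruning for a two-layer MLP block. It scores neurons by plain weight magnitude; SDMPrune's actual criterion is derived from a self-distillation objective (gradient-based), which this toy example does not implement, and the function and variable names are illustrative only.

```python
import numpy as np

def prune_mlp_hidden(w_up, w_down, keep_ratio):
    """Drop the least important hidden neurons of a two-layer MLP block.

    w_up:   (hidden, d_model) weights projecting into the hidden layer
    w_down: (d_model, hidden) weights projecting back out
    Importance here is simple weight magnitude, NOT SDMPrune's
    self-distillation criterion.
    """
    hidden = w_up.shape[0]
    keep = max(1, int(hidden * keep_ratio))
    # A hidden neuron contributes through both its incoming and outgoing
    # weights, so score it by the product of the two norms.
    scores = np.linalg.norm(w_up, axis=1) * np.linalg.norm(w_down, axis=0)
    idx = np.sort(np.argsort(scores)[-keep:])  # indices of neurons to keep
    return w_up[idx], w_down[:, idx]

rng = np.random.default_rng(0)
w_up, w_down = rng.normal(size=(64, 16)), rng.normal(size=(16, 64))
w_up_p, w_down_p = prune_mlp_hidden(w_up, w_down, keep_ratio=0.5)
print(w_up_p.shape, w_down_p.shape)  # (32, 16) (16, 32)
```

Pruning rows of the up-projection and the matching columns of the down-projection keeps the block's input/output dimensions intact, so the pruned block drops into the model without further changes.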

No commits in the last 6 months.

Use this if you need to deploy large language models more efficiently on devices with limited memory or computational power, while maintaining high performance on tasks like question answering or commonsense reasoning.

Not ideal if your primary goal is to enhance the model's performance beyond its original capabilities rather than to optimize its size and efficiency.

Tags: LLM deployment, model compression, edge AI, NLP, efficiency, resource optimization
Status: Stale (6 months), No Package, No Dependents

Score breakdown:
Maintenance: 2 / 25
Adoption: 6 / 25
Maturity: 15 / 25
Community: 0 / 25


Stars: 21
Forks:
Language: Python
License:
Last pushed: Jun 11, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/visresearch/SDMPrune"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
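The same request can be built programmatically. The sketch below assumes the endpoint path follows a {ecosystem}/{owner}/{repo} layout, which is inferred from the single example URL above rather than from documented API behavior.

```python
from urllib.parse import quote

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    # The {ecosystem}/{owner}/{repo} path layout is an assumption inferred
    # from the example curl command, not documented API behavior.
    return f"{API_BASE}/{quote(ecosystem)}/{quote(owner)}/{quote(repo)}"

url = quality_url("transformers", "visresearch", "SDMPrune")
print(url)
# Fetch with e.g. urllib.request.urlopen(url) and parse the JSON response.
```

Quoting each path segment guards against owner or repo names containing characters that are not URL-safe.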