locuslab/wanda
A simple and effective LLM pruning approach.
This project helps machine learning engineers and researchers make large language models (LLMs) more efficient by reducing their size without significantly losing performance. It takes a pre-trained LLM and a desired sparsity level as input, then outputs a smaller, pruned version of the model that is faster and uses less memory. This is ideal for those deploying LLMs in resource-constrained environments.
854 stars. No commits in the last 6 months.
Use this if you need to reduce the size and computational demands of large language models like LLaMA or OPT for deployment or research.
Not ideal if you are looking for methods to train LLMs from scratch or fine-tune them for specific tasks without focusing on model compression.
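Wanda scores each weight by its magnitude times the norm of the corresponding input activation, then zeros the lowest-scoring weights within each output row. A minimal NumPy sketch of that per-layer metric follows; the function name, shapes, and calibration setup are illustrative, not the repo's actual API:

```python
import numpy as np

def wanda_prune_layer(W, X, sparsity=0.5):
    """Prune a linear layer's weights with the Wanda metric (sketch).

    Each weight W[i, j] is scored as |W[i, j]| * ||X[:, j]||_2, i.e.
    weight magnitude times the L2 norm of the matching input feature
    over a calibration set; the lowest-scoring fraction of weights in
    each output row is set to zero.

    W: (out_features, in_features) weight matrix
    X: (n_samples, in_features) calibration activations
    """
    # Per-input-feature activation norm over the calibration samples
    act_norm = np.linalg.norm(X, axis=0)          # (in_features,)
    score = np.abs(W) * act_norm                  # broadcasts over rows
    k = int(W.shape[1] * sparsity)                # weights to drop per row
    # Indices of the k lowest scores within each row
    prune_idx = np.argsort(score, axis=1)[:, :k]
    W_pruned = W.copy()
    np.put_along_axis(W_pruned, prune_idx, 0.0, axis=1)
    return W_pruned
```

Comparing scores within each output row (rather than globally) is what lets this run without retraining: every output neuron keeps its strongest inputs, so no row is pruned away entirely.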
Stars: 854
Forks: 124
Language: Python
License: MIT
Category:
Last pushed: Aug 09, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/locuslab/wanda"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
ModelTC/LightCompress
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs,...
p-e-w/heretic
Fully automatic censorship removal for language models
Orion-zhen/abliteration
Make abliterated models with transformers, easy and fast
YerbaPage/LongCodeZip
LongCodeZip: Compress Long Context for Code Language Models [ASE2025]
tommasomncttn/mergenetic
Flexible library for merging large language models (LLMs) via evolutionary optimization (ACL 2025 Demo).