locuslab/wanda
A simple and effective LLM pruning approach.
This project helps machine learning engineers and researchers make large language models (LLMs) more efficient by reducing their size without significantly losing performance. It takes a pre-trained LLM and a desired sparsity level as input, then outputs a smaller, pruned version of the model that is faster and uses less memory. This is ideal for those deploying LLMs in resource-constrained environments.
854 stars. No commits in the last 6 months.
Use this if you need to reduce the size and computational demands of large language models like LLaMA or OPT for deployment or research.
Not ideal if you are looking for methods to train LLMs from scratch or fine-tune them for specific tasks without focusing on model compression.
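Wanda scores each weight by its magnitude times the norm of the corresponding input activation, then zeros the lowest-scoring weights within each output row. A minimal NumPy sketch of that per-layer metric follows; the function name, shapes, and calibration setup are illustrative, not the repo's actual API:

```python
import numpy as np

def wanda_prune_layer(W, X, sparsity=0.5):
    """Prune a linear layer's weights with the Wanda metric (sketch).

    Each weight W[i, j] is scored as |W[i, j]| * ||X[:, j]||_2, i.e.
    weight magnitude times the L2 norm of the matching input feature
    over a calibration set; the lowest-scoring fraction of weights in
    each output row is set to zero.

    W: (out_features, in_features) weight matrix
    X: (n_samples, in_features) calibration activations
    """
    # Per-input-feature activation norm over the calibration samples
    act_norm = np.linalg.norm(X, axis=0)          # (in_features,)
    score = np.abs(W) * act_norm                  # broadcasts over rows
    k = int(W.shape[1] * sparsity)                # weights to drop per row
    # Indices of the k lowest scores within each row
    prune_idx = np.argsort(score, axis=1)[:, :k]
    W_pruned = W.copy()
    np.put_along_axis(W_pruned, prune_idx, 0.0, axis=1)
    return W_pruned
```

Comparing scores within each output row (rather than globally) is what lets this run without retraining: every output neuron keeps its strongest inputs, so no row is pruned away entirely.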
Stars: 854
Forks: 124
Language: Python
License: MIT
Category:
Last pushed: Aug 09, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/locuslab/wanda"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
ModelTC/LightCompress
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs,...
p-e-w/heretic
Fully automatic censorship removal for language models
Orion-zhen/abliteration
Make abliterated models with transformers, easy and fast
YerbaPage/LongCodeZip
LongCodeZip: Compress Long Context for Code Language Models [ASE2025]
tommasomncttn/mergenetic
Flexible library for merging large language models (LLMs) via evolutionary optimization (ACL 2025 Demo).