locuslab/wanda

A simple and effective LLM pruning approach.

47 / 100 (Emerging)

This project helps machine learning engineers and researchers make large language models (LLMs) more efficient by pruning their weights with minimal loss in performance. Given a pre-trained LLM and a target sparsity level, it outputs a smaller, pruned model that runs faster and uses less memory. This makes it well suited to deploying LLMs in resource-constrained environments.
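
Under the hood, Wanda scores each weight by its magnitude multiplied by the L2 norm of the corresponding input activation, then zeroes the lowest-scoring weights within each output row. A minimal PyTorch sketch of that criterion follows; the function name and tensor shapes are illustrative, not the repository's API:

import torch

def prune_wanda_style(weight, act_norm, sparsity=0.5):
    # weight:   (out_features, in_features) linear-layer weight matrix
    # act_norm: (in_features,) L2 norm of each input feature, measured
    #           over a small calibration set
    # Wanda importance score: |W_ij| * ||X_j||_2, broadcast across rows
    metric = weight.abs() * act_norm.unsqueeze(0)
    # number of weights to zero out in each output row
    k = int(weight.shape[1] * sparsity)
    # indices of the k lowest-scoring weights per row
    _, idx = torch.topk(metric, k, dim=1, largest=False)
    mask = torch.zeros_like(weight, dtype=torch.bool)
    mask.scatter_(1, idx, True)
    return weight.masked_fill(mask, 0.0)

The actual repository applies this layer by layer over calibration data and also supports structured (e.g. 2:4) sparsity patterns; the sketch only shows the unstructured scoring rule.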

854 stars. No commits in the last 6 months.

Use this if you need to reduce the size and computational demands of large language models like LLaMA or OPT for deployment or research.

Not ideal if you are looking for methods to train LLMs from scratch or fine-tune them for specific tasks without focusing on model compression.

large-language-models model-compression deep-learning-deployment natural-language-processing machine-learning-research
Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 21 / 25

The overall score (47 / 100) is the sum of these four sub-scores.


Stars: 854
Forks: 124
Language: Python
License: MIT
Last pushed: Aug 09, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/locuslab/wanda"

Open to everyone: 100 requests/day with no API key. Get a free key for 1,000 requests/day.
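
For scripted access, the same endpoint can be called from Python. A minimal sketch of the curl command above, assuming the endpoint returns a JSON body:

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/transformers/locuslab/wanda"
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # surface HTTP errors (e.g. rate limiting)
data = resp.json()       # assumes a JSON response body
print(data)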