BaiTheBest/SparseLLM
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
This project helps machine learning researchers and engineers make Large Language Models (LLMs) such as OPT and LLaMA-2 smaller and faster. By pruning connections (weights) in these models, it reduces their memory footprint and lets them run more efficiently on hardware. You provide an existing LLM and a target sparsity level, and it outputs a more compact, pruned version of that model.
No commits in the last 6 months.
Use this if you are a machine learning researcher or engineer looking to optimize the computational efficiency and memory footprint of large language models for deployment or experimentation.
Not ideal if you need a quick, one-shot pruning solution, or if you have very limited GPU memory and want to prune models larger than LLaMA-2-7B without reducing the calibration data size.
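To give a rough feel for what "pruning to a target sparsity level" means, here is a minimal magnitude-pruning sketch in plain Python. This is an illustration only: it zeroes the smallest-magnitude weights of a single matrix, whereas SparseLLM's contribution is a *global* formulation that decides which weights to remove jointly across layers.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of entries in a 2D weight matrix.

    weights:  list of lists of floats (a toy stand-in for a layer's weight matrix)
    sparsity: fraction of entries to set to zero, in [0, 1]

    Note: this is local magnitude pruning for illustration, not SparseLLM's
    global pruning algorithm.
    """
    # Sort all absolute values to find the pruning threshold.
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * sparsity)  # number of entries to remove
    if k == 0:
        return [row[:] for row in weights]
    threshold = flat[k - 1]
    # Zero every weight at or below the threshold.
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


# Example: prune a tiny 2x2 "layer" to 50% sparsity.
pruned = magnitude_prune([[0.1, -0.5], [2.0, -0.02]], 0.5)
# The two smallest-magnitude entries (0.1 and -0.02) are zeroed;
# the larger weights (-0.5 and 2.0) survive.
```

In a real pipeline the same idea is applied to `torch` tensors per layer; SparseLLM instead couples these per-layer decisions through a global objective, which is why it needs calibration data and more GPU memory.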
Stars: 67
Forks: 10
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 27, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/BaiTheBest/SparseLLM"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
Higher-rated alternatives
ModelTC/LightCompress: [EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs,...
p-e-w/heretic: Fully automatic censorship removal for language models
Orion-zhen/abliteration: Make abliterated models with transformers, easy and fast
YerbaPage/LongCodeZip: Compress long context for code language models [ASE 2025]
locuslab/wanda: A simple and effective LLM pruning approach.