jordddan/Pruning-LLMs
The framework to prune LLMs to any size and any config.
This framework helps machine learning practitioners shrink large language models (LLMs) without a significant loss of capability. It takes a pre-trained Transformer-based LLM and a custom, smaller target configuration, and outputs a compact model that is faster to run and easier to fine-tune for specific tasks. This is ideal for scientists, engineers, or product managers who need to deploy powerful LLMs on limited computational resources.
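The core idea of pruning a model down to a smaller configuration can be illustrated with a minimal, dependency-free sketch. This is a generic structured-pruning illustration, not this repository's actual API: it shrinks one feed-forward layer by keeping only the intermediate neurons whose up-projection weights have the largest norms.

```python
# Hedged sketch of structured pruning (not this repo's real interface):
# shrink a feed-forward layer's intermediate dimension to a target size
# by keeping the neurons with the largest up-projection weight norms.

def row_norm(row):
    """Euclidean norm of one weight row (one intermediate neuron)."""
    return sum(x * x for x in row) ** 0.5

def prune_ffn(w_up, w_down, target):
    """w_up: intermediate x hidden; w_down: hidden x intermediate.
    Keep the `target` intermediate neurons with the largest norms,
    slicing both matrices consistently so the layer still composes."""
    ranked = sorted(range(len(w_up)), key=lambda i: row_norm(w_up[i]), reverse=True)
    keep = sorted(ranked[:target])          # indices of surviving neurons
    new_up = [w_up[i] for i in keep]        # drop pruned rows
    new_down = [[row[i] for i in keep] for row in w_down]  # drop matching columns
    return new_up, new_down

# Toy layer: intermediate size 6, hidden size 2, pruned down to 3 neurons.
w_up = [[3.0, 0.0], [1.0, 0.0], [2.0, 0.0], [0.5, 0.0], [4.0, 0.0], [0.1, 0.0]]
w_down = [[0, 1, 2, 3, 4, 5], [10, 11, 12, 13, 14, 15]]
new_up, new_down = prune_ffn(w_up, w_down, target=3)
```

A real pruner must do this consistently across every layer (attention heads, embeddings, layer norms) so the pruned weights still match the smaller configuration, which is the bookkeeping such a framework automates.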
No commits in the last 6 months.
Use this if you need to create a smaller, more efficient version of an existing large language model for deployment or specialized fine-tuning, especially when working with limited computing resources.
Not ideal if you are looking for a simple, out-of-the-box solution for general LLM use without requiring custom architectural modifications or re-training.
Stars
95
Forks
3
Language
Python
License
Apache-2.0
Last pushed
Mar 01, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/jordddan/Pruning-LLMs"
Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000 requests/day.
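The same request can be made from Python. This is a minimal sketch that builds the endpoint URL shown in the curl command above; the response fields and the `ecosystem` path segment (`transformers` here) are assumptions taken from that example, not a documented API schema.

```python
# Minimal sketch: query the (assumed) pt-edge quality API from Python.
# The URL shape mirrors the curl example above; response JSON fields
# are not documented here, so the result is returned as a plain dict.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    """Build the API URL for one repository."""
    return f"{BASE}/{ecosystem}/{owner}/{repo}"

def fetch_quality(ecosystem: str, owner: str, repo: str) -> dict:
    """GET the quality record and decode the JSON body (no key needed
    up to 100 requests/day, per the note above)."""
    with urllib.request.urlopen(quality_url(ecosystem, owner, repo)) as resp:
        return json.load(resp)

# Matches the curl command above (URL construction only; no request sent):
url = quality_url("transformers", "jordddan", "Pruning-LLMs")
```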
Higher-rated alternatives
ModelTC/LightCompress
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs,...
p-e-w/heretic
Fully automatic censorship removal for language models
Orion-zhen/abliteration
Make abliterated models with transformers, easy and fast
YerbaPage/LongCodeZip
LongCodeZip: Compress Long Context for Code Language Models [ASE2025]
locuslab/wanda
A simple and effective LLM pruning approach.