LLM Compression Optimization Tools

Tools and techniques for reducing LLM size, memory footprint, and inference latency through compression, pruning, quantization, and architectural optimization. Does NOT include general model training, fine-tuning frameworks, or inference serving infrastructure.

17 LLM compression optimization tools are tracked. One scores 70 or higher (Verified tier): Tencent/AngelSlim, the highest-rated, at 70/100 with 536 stars. One of the top 10 is actively maintained.

Get all 17 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=llm-compression-optimization&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
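The same request can be made from a script. The following is a minimal Python sketch using only the standard library; the helper names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not described on this page, the decoded JSON is returned unchanged rather than parsed into specific fields.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="llm-tools",
              subcategory="llm-compression-optimization",
              limit=20):
    """Assemble the query URL from the three parameters used in the curl call."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Send a GET request and decode the JSON body.

    The shape of the response is an unknown here, so no keys are assumed.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    data = fetch_projects(build_url())
    # Print only the start of the payload to inspect its structure.
    print(json.dumps(data, indent=2)[:500])
```

Running the script counts against the keyless quota of 100 requests/day, so inspect the payload once and cache it locally if you plan to iterate on it.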

#	Tool	Description	Score	Tier	Stars	Language
1	Tencent/AngelSlim	Model compression toolkit engineered for enhanced usability,...	70	Verified	536	Python
2	nebuly-ai/optimate	A collection of libraries to optimise AI model performances	45	Emerging	8,349	Python
3	antgroup/glake	GLake: optimizing GPU memory management and IO transmission.	42	Emerging	499	Python
4	kyo-takano/chinchilla	A toolkit for scaling law research ⚖	42	Emerging	57	Python
5	liyucheng09/Selective_Context	Compress your input to ChatGPT or other LLMs, to let them process 2x more...	40	Emerging	410	Python
6	TsingmaoAI/MI-optimize	mi-optimize is a versatile tool designed for the quantization and evaluation...	38	Emerging	25	Python
7	microsoft/only_train_once	OTOv1-v3, NeurIPS, ICLR, TMLR, DNN Training, Compression, Structured...	36	Emerging	50	Python
8	amazon-science/llm-rank-pruning	LLM-Rank: A graph theoretical approach to structured pruning of large...	34	Emerging	8	Python
9	naskio/mergeui	All-in-one UI for merged LLMs in Hugging Face	33	Emerging	25	Python
10	LINs-lab/DeFT	[ICLR 2025] DeFT: Decoding with Flash Tree-attention for Efficient...	31	Emerging	50	Jupyter Notebook
11	robtacconelli/Nacrith-GPU	Nacrith — Lossless text compression via ensemble neural arithmetic coding....	30	Emerging	17	Python
12	M9rth/heretic	🛠 Remove censorship from language models instantly using advanced...	25	Experimental	1	Python
13	talkking/PrunerGPT	[ICASSP2024] One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large...	25	Experimental	6	Python
14	louisbrulenaudet/mergeKit	Tools for merging pretrained Large Language Models and create Mixture of...	22	Experimental	8	Jupyter Notebook
15	Yvancg/optimizers	A collection of minimal, dependency-free, performance-focused utilities for...	20	Experimental	1	JavaScript
16	louisbrulenaudet/mergekit-assistant	Mergekit Assistant is a cutting-edge toolkit designed for the seamless...	17	Experimental	1	—
17	Mikola78/trinity-large-tech-report	🚀 Explore advanced sparse Mixture-of-Experts models with up to 400B...	13	Experimental	—	—