elephantmipt/compressors
A small library with distillation, quantization and pruning pipelines
This library helps machine learning practitioners optimize large deep learning models for deployment. Its distillation pipeline takes an existing, powerful model (the 'teacher') and transfers its knowledge to a smaller, faster model (the 'student'). The result is a compressed model that performs nearly as well as the original but needs fewer computational resources, which suits real-time applications and environments with limited processing power.
No commits in the last 6 months.
Use this if you need to reduce the size and computational cost of your trained deep learning models in computer vision or natural language processing without significantly sacrificing performance.
Not ideal if you are working with models that are not deep learning models, or if your primary goal is to train models from scratch rather than optimize existing ones.
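The teacher-to-student transfer described above is commonly implemented as a loss that pulls the student's softened output distribution toward the teacher's. The following is a minimal sketch of that idea in plain Python, not the compressors library's own API: the function names `log_softmax` and `distillation_loss` and the default temperature of 2.0 are assumptions chosen for illustration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_total = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_total for s in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Hypothetical helper for illustration; the T^2 factor is the usual
    convention for keeping gradient magnitudes comparable across temperatures.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]   # made-up logits from a large model
    student = [2.5, 1.5, 0.2]   # made-up logits from a small model
    print(distillation_loss(student, teacher))
```

The loss is zero when the student reproduces the teacher's logits exactly and grows as the two distributions diverge. In practice it is usually combined with the ordinary supervised loss on the true labels.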
Stars: 26
Forks: 3
Language: Python
License: MIT
Category:
Last pushed: Apr 20, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/elephantmipt/compressors"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
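The same request can be made from Python using only the standard library. This is a rough sketch under stated assumptions: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the path layout `quality/<category>/<owner>/<repo>` is inferred from the single URL shown above, and the response format is not documented here, so the body is returned as raw text (assumed UTF-8) rather than parsed.

```python
import urllib.parse
import urllib.request

# Base path taken from the curl example; anything beyond it is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(owner, repo, category="transformers"):
    """Build the endpoint URL, assuming a <category>/<owner>/<repo> layout."""
    parts = [urllib.parse.quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(owner, repo, category="transformers", timeout=10.0):
    """Fetch the raw response body as text; raises urllib.error.URLError on failure."""
    url = build_quality_url(owner, repo, category)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Each call counts toward the daily request limit.
    print(fetch_quality("elephantmipt", "compressors"))
```

Keeping URL construction separate from the network call makes it easy to check the target address without spending a request.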
Higher-rated alternatives
ModelCloud/GPTQModel
LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD...
intel/auto-round
🎯An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality...
pytorch/ao
PyTorch native quantization and sparsity for training and inference
bodaay/HuggingFaceModelDownloader
Simple go utility to download HuggingFace Models and Datasets
NVIDIA/kvpress
LLM KV cache compression made easy