Efficient-ML/Awesome-Model-Quantization
A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research, and the project is continuously being improved. PRs adding works (papers, repositories) that the repo has missed are welcome.
This collection helps AI researchers and practitioners find the latest academic papers, benchmark results, and code implementations related to model quantization. It brings together cutting-edge research, from survey papers to specific quantization techniques for large language models, providing a comprehensive overview of the field. Anyone working on making AI models smaller, faster, and more efficient for deployment will find this resource valuable.
2,333 stars. Actively maintained with 19 commits in the last 30 days.
Use this if you are researching model quantization, need to understand the current state-of-the-art, or are looking for specific techniques and their corresponding code for model compression.
Not ideal if you are an end-user simply looking to apply a pre-packaged, off-the-shelf quantized model without delving into the underlying research or code.
Stars: 2,333
Forks: 232
Language: —
License: —
Category:
Last pushed: Jan 29, 2026
Commits (30d): 19
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/Efficient-ML/Awesome-Model-Quantization"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
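For scripted use, the same request can be made from Python. The following is a minimal sketch using only the standard library. The helper names `build_quality_url` and `fetch_quality` are hypothetical, the URL layout (category, owner, repository) is inferred from the single curl example above, and the assumption that the endpoint returns JSON is not confirmed by this page. How an API key is supplied is not documented here, so the sketch covers only the keyless tier.

```python
import json
import urllib.parse
import urllib.request

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(owner, repo, category="ml-frameworks"):
    """Build the request URL (hypothetical helper).

    The category/owner/repo layout is inferred from the one curl
    example on this page, not from official API documentation.
    """
    # Percent-encode each path segment so unusual names stay valid.
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(owner, repo, category="ml-frameworks", timeout=10.0):
    """Fetch the data for one repository (hypothetical helper).

    Assumption: the response body is JSON. If it is not,
    json.loads raises a ValueError.
    """
    url = build_quality_url(owner, repo, category)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    return json.loads(body)
```

Calling `fetch_quality("Efficient-ML", "Awesome-Model-Quantization")` requests the same URL as the curl command. With the keyless limit of 100 requests/day, caching the result locally is a sensible choice for repeated runs.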
Related frameworks
open-mmlab/mmengine
OpenMMLab Foundational Library for Training Deep Learning Models
Xilinx/brevitas
Brevitas: neural network quantization in PyTorch
google/qkeras
QKeras: a quantization deep learning library for Tensorflow Keras
fastmachinelearning/qonnx
QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX
tensorflow/model-optimization
A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization...