Efficient-ML/Awesome-Model-Quantization

A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research, and we are continuously improving the project. PRs are welcome for works (papers, repositories) that the repo is missing.

55 / 100 (Established)

This collection helps AI researchers and practitioners find the latest academic papers, benchmark results, and code implementations related to model quantization. It brings together cutting-edge research, from survey papers to specific quantization techniques for large language models, providing a comprehensive overview of the field. Anyone working on making AI models smaller, faster, and more efficient for deployment will find this resource valuable.

2,333 stars. Actively maintained with 19 commits in the last 30 days.

Use this if you are researching model quantization, need to understand the current state of the art, or are looking for specific model compression techniques and their corresponding code.

Not ideal if you are an end-user simply looking to apply a pre-packaged, off-the-shelf quantized model without delving into the underlying research or code.

AI model compression, Deep learning optimization, Neural network efficiency, Large language model deployment, Machine learning research
No License, No Package, No Dependents
Maintenance 17 / 25
Adoption 10 / 25
Maturity 8 / 25
Community 20 / 25

Stars: 2,333
Forks: 232
Language: (not listed)
License: (not listed)
Last pushed: Jan 29, 2026
Commits (30d): 19

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/Efficient-ML/Awesome-Model-Quantization"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
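
The same request can be made from a script. The following is a minimal Python sketch, not an official client: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the URL layout (category, owner, repo) is inferred from the curl example above, and the response is assumed to be JSON, which this page does not confirm.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example; everything after it is
# category / owner / repo, as inferred from that single example.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Join the three path segments onto the base URL, percent-encoding each."""
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record for one repo.

    Assumption: the endpoint returns a JSON body. No API key is sent,
    so the keyless daily request limit applies.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("ml-frameworks", "Efficient-ML", "Awesome-Model-Quantization")` issues the same GET request as the curl command. With only 100 keyless requests per day, it may be worth caching the result locally rather than re-fetching it on every run.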