intel/auto-round

🎯 An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality degradation across Weight-Only Quantization, MXFP4, NVFP4, GGUF, and adaptive schemes.

Quality score: 75 / 100 (Verified)

This tool helps AI engineers and machine learning practitioners reduce the computational resources needed to run large language models (LLMs) and vision-language models (VLMs) without significantly sacrificing their performance. You feed it your large AI model, and it outputs a smaller, highly optimized version that runs faster and with less memory. It's designed for those who deploy and manage AI models in production.
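In practice, quantizing a model follows a short recipe. The sketch below is a rough illustration based on the project's documented Python API; exact class names and parameters (AutoRound, bits, group_size, sym, save_quantized) can vary between releases, and "facebook/opt-125m" is only a small placeholder model.

from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

# Load a small placeholder model; substitute your own LLM checkpoint.
model_name = "facebook/opt-125m"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Configure 4-bit weight-only quantization (typical settings shown; verify
# the arguments against the auto-round version you have installed).
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()

# Write the smaller, quantized model to disk for deployment.
autoround.save_quantized("./opt-125m-autoround", format="auto_round")

The result is a directory containing the compressed weights, which can then be loaded for inference in place of the original checkpoint.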

883 stars. Actively maintained with 85 commits in the last 30 days. Available on PyPI.

Use this if you need to deploy large AI models more efficiently, reducing their size and speeding up inference while maintaining high accuracy, especially on diverse hardware.

Not ideal if you are a data scientist primarily focused on model training and experimentation, as this tool is geared towards post-training optimization for deployment.

Tags: AI model deployment, Large Language Models (LLMs), Vision-Language Models (VLMs), AI inference optimization, Machine learning engineering
Maintenance 22 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 18 / 25


Stars: 883
Forks: 81
Language: Python
License: Apache-2.0
Last pushed: Mar 13, 2026
Commits (30d): 85
Dependencies: 8

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/intel/auto-round"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
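To call the same endpoint from Python instead of curl, a minimal sketch (assuming the endpoint returns a JSON body; the exact response schema is not shown here) could look like this:

import requests

# Same public endpoint as the curl example above; no key is needed
# within the 100 requests/day limit.
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/intel/auto-round"
response = requests.get(url, timeout=30)
response.raise_for_status()

# Assuming JSON; inspect the payload to see which fields are available.
data = response.json()
print(data)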