OpenGVLab/OmniQuant

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

Quality score: 49 / 100 (Emerging)

This project helps machine learning practitioners compress large language models (LLMs) like LLaMA and Falcon. It takes an existing, large LLM and outputs a smaller, quantized version that uses less memory and can run on less powerful hardware, including mobile phones. Data scientists and AI/ML engineers who work with LLMs and need to deploy them efficiently will find this useful.
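To illustrate the core idea of weight quantization, here is a minimal, generic round-to-nearest sketch. It is not OmniQuant's actual method (which learns clipping thresholds and equivalent transformations); the function names and the per-tensor symmetric scheme are illustrative assumptions.

```python
def quantize_weights(weights, bits=4):
    """Symmetric round-to-nearest quantization of a list of float weights.

    Generic illustration only -- not OmniQuant's learned-clipping approach.
    Returns the integer codes and the scale needed to recover real values.
    """
    qmax = 2 ** (bits - 1) - 1                    # e.g. 7 for signed 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0  # avoid zero scale
    q = [round(w / scale) for w in weights]       # map floats to small ints
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from integer codes and a scale."""
    return [v * scale for v in q]
```

Storing 4-bit codes plus one scale per tensor (or per channel) is what shrinks the model's memory footprint; the quality of the result depends on how the scale and clipping range are chosen, which is where methods like OmniQuant improve on the naive version above.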

Use this if you need to reduce the memory footprint and enable more efficient deployment of large language models on resource-constrained devices or with limited GPU memory.

Not ideal if you are working with smaller, non-LLM models or if memory footprint is not a critical concern for your deployment.

large-language-models model-compression edge-ai model-deployment resource-optimization
No package · No dependents
Maintenance: 6 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 17 / 25


Stars: 890
Forks: 76
Language: Python
License: MIT
Last pushed: Nov 26, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/OpenGVLab/OmniQuant"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
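For programmatic access, the curl call above can be reproduced from Python's standard library. This is a sketch under the assumption that the endpoint returns JSON; the response schema is not documented here, so the parsed result is treated as opaque.

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-report URL for a given GitHub owner/repo."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and parse the quality report (assumes a JSON response body)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

# Example (performs a network request):
# report = fetch_quality("OpenGVLab", "OmniQuant")
```

An API key, once obtained, would presumably be passed with the request; since the expected header or parameter name is not shown here, it is omitted.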