dropbox/hqq
Official implementation of Half-Quadratic Quantization (HQQ)
HQQ helps machine learning practitioners reduce the memory footprint and speed up inference of large AI models, such as large language models (LLMs) and computer vision models. It converts an existing model's weights into a lower-precision, more compact format without requiring any calibration data. The result is a functionally similar but smaller and faster version of the original model, ready for deployment or further training.
Use this if you need to make very large AI models run faster and consume significantly less memory on your hardware, without sacrificing too much accuracy or requiring a complex calibration process.
Not ideal if your primary goal is maximum model accuracy at any computational cost, or if you require very fine-grained control over the quantization process for highly specialized hardware.
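The calibration-free weight quantization described above can be illustrated with a toy example. This is a minimal sketch of plain round-to-nearest uniform quantization of a weight group, not the HQQ algorithm itself (HQQ additionally optimizes the zero-point and scale with a half-quadratic solver); function names and values here are illustrative only.

```python
# Toy sketch of calibration-free uniform weight quantization (NOT HQQ's
# half-quadratic solver): map a group of float weights to n-bit integer
# codes plus a per-group (scale, zero) pair, then reconstruct approximately.

def quantize_group(weights, nbits=4):
    """Map a group of float weights to nbits-bit integer codes plus (scale, zero)."""
    qmax = (1 << nbits) - 1                 # e.g. 15 for 4-bit codes
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / qmax or 1.0   # avoid zero scale for constant groups
    zero = w_min
    codes = [round((w - zero) / scale) for w in weights]
    return codes, scale, zero

def dequantize_group(codes, scale, zero):
    """Reconstruct approximate float weights from integer codes."""
    return [c * scale + zero for c in codes]

weights = [0.12, -0.55, 0.98, 0.03, -0.20, 0.47]
codes, scale, zero = quantize_group(weights, nbits=4)
approx = dequantize_group(codes, scale, zero)
max_err = max(abs(a - b) for a, b in zip(weights, approx))

assert all(0 <= c <= 15 for c in codes)     # every code fits in 4 bits
assert max_err <= scale / 2 + 1e-9          # round-to-nearest error bound
```

Storing 4-bit codes plus one scale/zero pair per group is what shrinks the model roughly 4x versus 16-bit weights; HQQ's contribution is choosing the scale and zero-point by optimization rather than min/max, which reduces the reconstruction error without any calibration data.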
Stars: 917
Forks: 89
Language: Python
License: Apache-2.0
Category:
Last pushed: Feb 26, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/dropbox/hqq"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related models
bitsandbytes-foundation/bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model...
OpenGVLab/OmniQuant
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
Hsu1023/DuQuant
[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger...
VITA-Group/Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.