dropbox/hqq
Official implementation of Half-Quadratic Quantization (HQQ)
HQQ helps machine learning practitioners reduce the memory footprint and speed up inference of large AI models, such as large language models (LLMs) and computer vision models. It converts an existing model's weights into a lower-precision, more compact format without requiring any calibration data. The result is a functionally similar but smaller and faster version of the original model, ready for deployment or further training.
Use this if you need to make very large AI models run faster and consume significantly less memory on your hardware, without sacrificing too much accuracy or requiring a complex calibration process.
Not ideal if your primary goal is maximum model accuracy at any computational cost, or if you require very fine-grained control over the quantization process for highly specialized hardware.
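The calibration-free weight quantization described above can be illustrated with a toy example. This is a minimal sketch of plain round-to-nearest uniform quantization of a weight group, not the HQQ algorithm itself (HQQ additionally optimizes the zero-point and scale with a half-quadratic solver); function names and values here are illustrative only.

```python
# Toy sketch of calibration-free uniform weight quantization (NOT HQQ's
# half-quadratic solver): map a group of float weights to n-bit integer
# codes plus a per-group (scale, zero) pair, then reconstruct approximately.

def quantize_group(weights, nbits=4):
    """Map a group of float weights to nbits-bit integer codes plus (scale, zero)."""
    qmax = (1 << nbits) - 1                 # e.g. 15 for 4-bit codes
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / qmax or 1.0   # avoid zero scale for constant groups
    zero = w_min
    codes = [round((w - zero) / scale) for w in weights]
    return codes, scale, zero

def dequantize_group(codes, scale, zero):
    """Reconstruct approximate float weights from integer codes."""
    return [c * scale + zero for c in codes]

weights = [0.12, -0.55, 0.98, 0.03, -0.20, 0.47]
codes, scale, zero = quantize_group(weights, nbits=4)
approx = dequantize_group(codes, scale, zero)
max_err = max(abs(a - b) for a, b in zip(weights, approx))

assert all(0 <= c <= 15 for c in codes)     # every code fits in 4 bits
assert max_err <= scale / 2 + 1e-9          # round-to-nearest error bound
```

Storing 4-bit codes plus one scale/zero pair per group is what shrinks the model roughly 4x versus 16-bit weights; HQQ's contribution is choosing the scale and zero-point by optimization rather than min/max, which reduces the reconstruction error without any calibration data.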
Stars: 917
Forks: 89
Language: Python
License: Apache-2.0
Category:
Last pushed: Feb 26, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/dropbox/hqq"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related models
bitsandbytes-foundation/bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model...
OpenGVLab/OmniQuant
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
Hsu1023/DuQuant
[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger...
VITA-Group/Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.