huawei-csl/SINQ

Welcome to the official repository of SINQ, a novel, fast, high-quality quantization method designed to make any Large Language Model smaller while preserving accuracy.

Quality score: 60 / 100 (Established)

This project helps machine learning engineers and MLOps professionals deploy large language models (LLMs) more efficiently. It takes an existing LLM and reduces its memory footprint without sacrificing accuracy, allowing you to run very large models on GPUs with limited memory. The output is a smaller, high-performing LLM ready for inference.

602 stars. Available on PyPI.

Use this if you need to run large language models on GPUs with limited memory or want to significantly speed up the quantization process for deployment.

Not ideal if you are working with smaller models where memory is not a constraint or if you require end-to-end training during the quantization process.
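
To see why weight quantization matters for fitting models on constrained GPUs, here is a back-of-the-envelope calculation. This is a minimal sketch assuming a 16-bit baseline and a 4-bit quantized format with one FP16 scale per group of 128 weights; the model size, bit width, and group size are illustrative assumptions, not SINQ's actual configuration:

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB for a model with n_params parameters."""
    return n_params * bits_per_weight / 8 / 1024**3

n_params = 70e9  # e.g. a 70B-parameter model (illustrative)

# FP16 baseline: 16 bits per weight.
fp16 = weight_memory_gib(n_params, 16)

# Hypothetical 4-bit format with one FP16 scale per group of 128 weights,
# adding 16 / 128 = 0.125 bits of overhead per weight.
int4 = weight_memory_gib(n_params, 4 + 16 / 128)

print(f"FP16: {fp16:.1f} GiB, 4-bit: {int4:.1f} GiB")
# FP16: 130.4 GiB, 4-bit: 33.6 GiB
```

Under these assumptions, the quantized weights fit on a single 40 GB or 48 GB accelerator, whereas the FP16 weights require a multi-GPU setup.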

Tags: LLM deployment, model optimization, GPU resource management, machine learning inference, MLOps
Maintenance: 10 / 25
Adoption: 10 / 25
Maturity: 24 / 25
Community: 16 / 25


Stars: 602
Forks: 50
Language: Python
License: Apache-2.0
Last pushed: Feb 23, 2026
Commits (30d): 0
Dependencies: 14

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/huawei-csl/SINQ"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
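
The same endpoint can also be queried from Python. A minimal sketch using the requests library; the response schema is not documented here, so this example simply pretty-prints whatever JSON the API returns:

```python
import json

import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/huawei-csl/SINQ"

# Fetch the quality data for huawei-csl/SINQ.
resp = requests.get(URL, timeout=10)
resp.raise_for_status()  # surfaces HTTP errors, e.g. if the daily rate limit is hit

# Pretty-print the JSON payload.
print(json.dumps(resp.json(), indent=2))
```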