huawei-csl/SINQ
Welcome to the official repository of SINQ, a novel, fast, high-quality quantization method designed to make any large language model smaller while preserving accuracy.
This project helps machine learning engineers and MLOps professionals deploy large language models (LLMs) more efficiently. It reduces an existing LLM's memory footprint with minimal loss of accuracy, letting you run very large models on GPUs with limited memory. The output is a smaller, high-performing LLM ready for inference.
602 stars. Available on PyPI.
Use this if you need to run large language models on GPUs with limited memory or want to significantly speed up the quantization process for deployment.
Not ideal if you are working with smaller models where memory is not a constraint or if you require end-to-end training during the quantization process.
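To make the memory-vs-accuracy trade-off concrete, here is a minimal round-to-nearest quantization sketch in plain Python. This is NOT SINQ's actual algorithm (SINQ's method is more sophisticated; see the repository for details); it only illustrates the general idea of mapping floating-point weights to low-bit integers plus a scale, which is what shrinks the memory footprint.

```python
# Conceptual round-to-nearest quantization sketch (not SINQ's method).
# Storing 4-bit integers plus one scale per group uses roughly a quarter
# of the memory of fp16 weights.

def quantize_rtn(weights, bits=4):
    """Quantize a list of floats to signed ints sharing one scale."""
    qmax = 2 ** (bits - 1) - 1                  # e.g. 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0
    # Round to nearest integer and clip to the representable range.
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from ints and the scale."""
    return [v * scale for v in q]

if __name__ == "__main__":
    w = [0.12, -0.5, 0.33, 0.07, -0.91]
    q, s = quantize_rtn(w)
    print("ints:", q, "scale:", round(s, 4))
    print("reconstructed:", [round(v, 3) for v in dequantize(q, s)])
```

The reconstruction error per weight is bounded by half the scale, which is why naive quantization degrades as the weight range grows; methods like SINQ aim to keep accuracy high where this simple scheme would not.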
Stars: 602
Forks: 50
Language: Python
License: Apache-2.0
Category:
Last pushed: Feb 23, 2026
Commits (30d): 0
Dependencies: 14
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/huawei-csl/SINQ"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
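The same endpoint can be called from Python. A minimal sketch, assuming the endpoint returns a JSON body (the response schema is not documented above, so `fetch_stats` simply returns the parsed payload rather than guessing field names):

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def tool_url(owner, repo):
    """Build the per-tool API URL from a GitHub owner/repo pair."""
    return f"{BASE}/{owner}/{repo}"

def fetch_stats(owner, repo, timeout=10):
    """Fetch and parse the JSON stats for one tool.

    Assumes a JSON response; rate limits (100 requests/day without a
    key) apply as described above.
    """
    with urllib.request.urlopen(tool_url(owner, repo), timeout=timeout) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(tool_url("huawei-csl", "SINQ"))
```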
Related tools
SILX-LABS/QUASAR-SUBNET
QUASAR is a long-context foundation model and decentralized evaluation subnet built on Bittensor,
stackblogger/bitnet.js
BitNet.Js - A Node.js implementation of the Microsoft bitnet.cpp inference framework.
m96-chan/0xBitNet
Run BitNet b1.58 ternary LLMs with WebGPU — in browsers and native apps
AnswerDotAI/cold-compress
Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking...
FMInference/H2O
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.