taishan1994/LLM-Quantization
A collection of notes and summaries on quantizing LLMs.
This is a collection of summaries and practical examples about quantizing Large Language Models (LLMs). It covers techniques for reducing the compute and memory needed to run these models without significantly degrading accuracy, and will be useful to researchers and machine learning engineers working with LLMs.
Use this if you are a machine learning engineer or researcher looking to shrink large language models and speed up their deployment and inference.
Not ideal if you are looking for an off-the-shelf software tool: this project is a compilation of research and practical notes, not a direct implementation.
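To make the core idea concrete, here is a minimal sketch (illustrative only, not code from this repo) of symmetric absmax int8 quantization, the basic float-to-integer mapping that most LLM quantization schemes build on:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric absmax quantization: map floats into int8 [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from int8 codes."""
    return q.astype(np.float32) * scale

# float32 -> int8 gives a 4x memory reduction at the cost of
# a rounding error bounded by half the quantization step (scale / 2).
w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
max_err = np.abs(w - w_hat).max()
```

Real libraries refine this with per-channel or per-block scales and outlier handling, but the storage/accuracy trade-off is the same.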
Stars: 63
Forks: 8
Language: Python
License: —
Category: —
Last pushed: Jan 08, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/taishan1994/LLM-Quantization"
Higher-rated alternatives
bitsandbytes-foundation/bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model...
dropbox/hqq
Official implementation of Half-Quadratic Quantization (HQQ)
OpenGVLab/OmniQuant
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
Hsu1023/DuQuant
[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger...