LLM Quantization Techniques for Transformer Models

There are 22 LLM quantization projects tracked. Two score above 70 (the verified tier). The highest-rated is bitsandbytes-foundation/bitsandbytes at 77/100 with 8,033 stars. Two of the top 10 are actively maintained.

Get all 22 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-quantization-techniques&limit=22"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
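A minimal sketch of consuming the endpoint above from Python. The response schema (a list of records with "model", "score", and "tier" fields) is an assumption, not documented here, so the live call is left commented out and a stub payload mirroring that assumed shape stands in.

```python
import json
# from urllib.request import urlopen
#
# Live call (assumed to return JSON; schema is a guess):
# raw = urlopen(
#     "https://pt-edge.onrender.com/api/v1/datasets/quality"
#     "?domain=transformers&subcategory=llm-quantization-techniques&limit=22"
# ).read()
# projects = json.loads(raw)

# Stub payload mirroring the assumed record shape:
projects = json.loads("""[
  {"model": "bitsandbytes-foundation/bitsandbytes", "score": 77, "tier": "Verified"},
  {"model": "dropbox/hqq", "score": 54, "tier": "Established"}
]""")

# Keep only verified-tier entries (score above 70, per the summary above).
verified = [p["model"] for p in projects if p["score"] >= 70]
```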

| # | Model | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | bitsandbytes-foundation/bitsandbytes | Accessible large language models via k-bit quantization for PyTorch. | 77 | Verified |
| 2 | intel/neural-compressor | SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity;... | 74 | Verified |
| 3 | dropbox/hqq | Official implementation of Half-Quadratic Quantization (HQQ) | 54 | Established |
| 4 | OpenGVLab/OmniQuant | [ICLR2024 spotlight] OmniQuant is a simple and powerful quantization... | 49 | Emerging |
| 5 | Hsu1023/DuQuant | [NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation... | 40 | Emerging |
| 6 | VITA-Group/Q-GaLore | Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank... | 40 | Emerging |
| 7 | Aaronhuang-778/BiLLM | [ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs | 39 | Emerging |
| 8 | taishan1994/LLM-Quantization | Notes and summaries on quantizing LLMs. | 34 | Emerging |
| 9 | GURPREETKAURJETHRA/LLaMA3-Quantization | LLaMA3-Quantization | 33 | Emerging |
| 10 | actypedef/ARCQuant | Code for the paper "ARCQuant: Boosting NVFP4 Quantization with Augmented... | 33 | Emerging |
| 11 | upunaprosk/quantized-lm-confidence | Code for the NAACL paper "When Quantization Affects Confidence of Large Language Models?" | 31 | Emerging |
| 12 | snu-mllab/GuidedQuant | Official PyTorch implementation of "GuidedQuant: Large Language Model... | 30 | Emerging |
| 13 | IST-DASLab/Quartet-II | Official code for Quartet II | 30 | Emerging |
| 14 | xvyaward/owq | Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization... | 29 | Experimental |
| 15 | amajji/LLM-Quantization-Techniques-Absmax-Zeropoint-GPTQ-GGUF | LLM quantization techniques: absmax, zero-point, GPTQ and GGUF | 22 | Experimental |
| 16 | cnygaard/glq | E8 lattice codebook quantization for LLM weights — 2/3/4 bpw with fused... | 22 | Experimental |
| 17 | NoakLiu/LLMEasyQuant | A serving system for distributed and parallel LLM quantization [Efficient ML System] | 21 | Experimental |
| 18 | elphinkuo/llamaqt.c | Clean C implementation for quantizing the Llama 2 model and running the quantized... | 20 | Experimental |
| 19 | LessUp/llm-speed | CUDA kernel library for LLM inference: FlashAttention, HGEMM, Tensor Core... | 19 | Experimental |
| 20 | akhilchibber/Llama2-Quantization | Quantization of the Llama 2 model | 17 | Experimental |
| 21 | kevbuh/bitnet | Pure PyTorch implementation of Microsoft's BitNet b1.58 2B4T | 16 | Experimental |
| 22 | actypedef/AURA | AURA: Augmented Representation for Unified Accuracy-aware Quantization | 13 | Experimental |
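Several entries above cover absmax quantization (e.g. amajji/LLM-Quantization-Techniques-Absmax-Zeropoint-GPTQ-GGUF). As a rough illustration of what these libraries do, here is a minimal pure-Python sketch of symmetric absmax INT8 quantization; it is not drawn from any listed repo's code.

```python
def absmax_quantize(weights, bits=8):
    """Symmetric (absmax) quantization: scale so the largest-magnitude
    weight maps to the edge of the signed integer range."""
    qmax = 2 ** (bits - 1) - 1                  # 127 for INT8
    absmax = max(abs(w) for w in weights)
    scale = absmax / qmax if absmax else 1.0    # guard against all-zero input
    q = [round(w / scale) for w in weights]     # integers in [-qmax, qmax]
    return q, scale

def absmax_dequantize(q, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [v * scale for v in q]

w = [0.5, -1.27, 0.03, 1.0]
q, s = absmax_quantize(w)
w_hat = absmax_dequantize(q, s)
```

Zero-point (asymmetric) quantization, also named in that entry, differs only in adding an integer offset so an asymmetric weight range uses the full integer grid.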