LLM Quantization Techniques for Transformer Models

There are 22 LLM quantization projects tracked. Two score above 70 (the verified tier). The highest-rated is bitsandbytes-foundation/bitsandbytes at 77/100 with 8,033 stars. Two of the top 10 are actively maintained.

Get all 22 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-quantization-techniques&limit=22"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
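A minimal sketch of consuming the endpoint above from Python. The response schema (a list of records with "model", "score", and "tier" fields) is an assumption, not documented here, so the live call is left commented out and a stub payload mirroring that assumed shape stands in.

```python
import json
# from urllib.request import urlopen
#
# Live call (assumed to return JSON; schema is a guess):
# raw = urlopen(
#     "https://pt-edge.onrender.com/api/v1/datasets/quality"
#     "?domain=transformers&subcategory=llm-quantization-techniques&limit=22"
# ).read()
# projects = json.loads(raw)

# Stub payload mirroring the assumed record shape:
projects = json.loads("""[
  {"model": "bitsandbytes-foundation/bitsandbytes", "score": 77, "tier": "Verified"},
  {"model": "dropbox/hqq", "score": 54, "tier": "Established"}
]""")

# Keep only verified-tier entries (score above 70, per the summary above).
verified = [p["model"] for p in projects if p["score"] >= 70]
```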

| # | Model | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | bitsandbytes-foundation/bitsandbytes | Accessible large language models via k-bit quantization for PyTorch. | 77 | Verified |
| 2 | intel/neural-compressor | SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity;... | 74 | Verified |
| 3 | dropbox/hqq | Official implementation of Half-Quadratic Quantization (HQQ) | 54 | Established |
| 4 | OpenGVLab/OmniQuant | [ICLR2024 spotlight] OmniQuant is a simple and powerful quantization... | 49 | Emerging |
| 5 | Hsu1023/DuQuant | [NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation... | 40 | Emerging |
| 6 | VITA-Group/Q-GaLore | Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank... | 40 | Emerging |
| 7 | Aaronhuang-778/BiLLM | [ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs | 39 | Emerging |
| 8 | taishan1994/LLM-Quantization | Notes and summaries on quantizing LLMs. | 34 | Emerging |
| 9 | GURPREETKAURJETHRA/LLaMA3-Quantization | LLaMA3-Quantization | 33 | Emerging |
| 10 | actypedef/ARCQuant | Code for the paper "ARCQuant: Boosting NVFP4 Quantization with Augmented... | 33 | Emerging |
| 11 | upunaprosk/quantized-lm-confidence | Code for the NAACL paper "When Quantization Affects Confidence of Large Language Models?" | 31 | Emerging |
| 12 | snu-mllab/GuidedQuant | Official PyTorch implementation of "GuidedQuant: Large Language Model... | 30 | Emerging |
| 13 | IST-DASLab/Quartet-II | Official code for Quartet II | 30 | Emerging |
| 14 | xvyaward/owq | Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization... | 29 | Experimental |
| 15 | amajji/LLM-Quantization-Techniques-Absmax-Zeropoint-GPTQ-GGUF | LLM quantization techniques: absmax, zero-point, GPTQ and GGUF | 22 | Experimental |
| 16 | cnygaard/glq | E8 lattice codebook quantization for LLM weights — 2/3/4 bpw with fused... | 22 | Experimental |
| 17 | NoakLiu/LLMEasyQuant | A serving system for distributed and parallel LLM quantization [Efficient ML System] | 21 | Experimental |
| 18 | elphinkuo/llamaqt.c | Clean C implementation for quantizing the Llama 2 model and running the quantized... | 20 | Experimental |
| 19 | LessUp/llm-speed | CUDA kernel library for LLM inference: FlashAttention, HGEMM, Tensor Core... | 19 | Experimental |
| 20 | akhilchibber/Llama2-Quantization | Quantization of the Llama 2 model | 17 | Experimental |
| 21 | kevbuh/bitnet | Pure PyTorch implementation of Microsoft's BitNet b1.58 2B4T | 16 | Experimental |
| 22 | actypedef/AURA | AURA: Augmented Representation for Unified Accuracy-aware Quantization | 13 | Experimental |
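Several entries above cover absmax quantization (e.g. amajji/LLM-Quantization-Techniques-Absmax-Zeropoint-GPTQ-GGUF). As a rough illustration of what these libraries do, here is a minimal pure-Python sketch of symmetric absmax INT8 quantization; it is not drawn from any listed repo's code.

```python
def absmax_quantize(weights, bits=8):
    """Symmetric (absmax) quantization: scale so the largest-magnitude
    weight maps to the edge of the signed integer range."""
    qmax = 2 ** (bits - 1) - 1                  # 127 for INT8
    absmax = max(abs(w) for w in weights)
    scale = absmax / qmax if absmax else 1.0    # guard against all-zero input
    q = [round(w / scale) for w in weights]     # integers in [-qmax, qmax]
    return q, scale

def absmax_dequantize(q, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [v * scale for v in q]

w = [0.5, -1.27, 0.03, 1.0]
q, s = absmax_quantize(w)
w_hat = absmax_dequantize(q, s)
```

Zero-point (asymmetric) quantization, also named in that entry, differs only in adding an integer offset so an asymmetric weight range uses the full integer grid.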