taishan1994/LLM-Quantization

Notes and summaries on quantizing LLMs.

Quality score: 34 / 100 (Emerging)

This is a collection of summaries and practical examples about quantizing Large Language Models (LLMs). It provides insights into techniques for reducing the computational resources and memory needed to run these models, without significantly impacting performance. Researchers and machine learning engineers working with LLMs will find valuable information here.
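As one concrete illustration of the core idea (a minimal sketch, not code from this repository), a weight tensor can be mapped to 8-bit integers plus a single float scale, cutting memory roughly 4x versus float32:

```python
def quantize_int8(weights):
    # Symmetric per-tensor int8 quantization: map floats in
    # [-max|w|, +max|w|] onto integers in [-127, 127].
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate float values from the stored integers.
    return [v * scale for v in q]

q, scale = quantize_int8([0.5, -1.2, 0.03, 0.9])
w_hat = dequantize(q, scale)
```

Each dequantized value lands within half a quantization step (scale / 2) of the original, which is why moderate bit-width reduction often leaves model quality largely intact. Real LLM quantization schemes refine this with per-channel scales, zero points, and calibration data.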

Use this if you are a machine learning engineer or researcher looking to optimize the deployment and inference of large language models by reducing their size and computational demands.

Not ideal if you are looking for an off-the-shelf software tool for immediate use, as this project is a compilation of research and practical notes, not a direct implementation.

Tags: LLM deployment · model optimization · deep learning · inference · computational efficiency · AI · model compression
No License · No Package · No Dependents
Maintenance: 6 / 25
Adoption: 8 / 25
Maturity: 7 / 25
Community: 13 / 25


Stars: 63
Forks: 8
Language: Python
License: none
Last pushed: Jan 08, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/taishan1994/LLM-Quantization"

Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
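The endpoint path appears to follow a `/{collection}/{owner}/{repo}` pattern, inferred from the example curl command above rather than from published API docs. A small helper (function name is hypothetical) for building such URLs for other repositories:

```python
BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(collection: str, owner: str, repo: str) -> str:
    # Assemble the quality-API URL; the path segments mirror the
    # example request shown above and are an assumed convention.
    return f"{BASE}/{collection}/{owner}/{repo}"

url = quality_url("transformers", "taishan1994", "LLM-Quantization")
```

Fetching `url` with any HTTP client then reproduces the curl request shown above.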