ModelTC/QLLM

[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models"

Quality score: 34 / 100 (Emerging)

QLLM helps machine learning engineers and researchers run large language models (LLMs) more efficiently with minimal accuracy loss. It converts an existing full-precision LLM into a smaller, faster low-bitwidth version, making it practical to deploy on less powerful hardware. This tool is aimed at those who deploy or research state-of-the-art language models.
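To illustrate the trade-off described above, here is a minimal sketch of round-to-nearest uniform 4-bit quantization. This is a generic textbook technique, not QLLM's actual algorithm (the paper introduces a more involved adaptive channel reassembly scheme); the function names are illustrative.

```python
def quantize_4bit(weights):
    """Symmetric round-to-nearest quantization to the signed 4-bit range [-8, 7].

    Generic sketch only -- not QLLM's channel-reassembly method.
    """
    scale = max(abs(w) for w in weights) / 7.0  # map the largest magnitude to +/-7
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the 4-bit integers."""
    return [x * scale for x in q]

weights = [0.31, -1.24, 0.07, 2.0, -0.5]
q, scale = quantize_4bit(weights)
approx = dequantize(q, scale)
# Each reconstructed weight lies within half a quantization step of the original,
# so the model shrinks to 4 bits per weight at a small, bounded error.
```

Storing 4-bit integers plus one scale per tensor is what yields the memory savings; the accuracy question is how well the model tolerates that bounded per-weight error.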

No commits in the last 6 months.

Use this if you need to reduce the computational resources (memory, processing power) required to run large language models while maintaining their performance.

Not ideal if you are a casual user of LLMs or don't have experience with model optimization and deployment.

Tags: large-language-models, model-optimization, deep-learning-deployment, ai-efficiency, machine-learning-research

Status: Stale (6 months), No Package, No Dependents

Score breakdown:
Maintenance: 0 / 25
Adoption: 7 / 25
Maturity: 16 / 25
Community: 11 / 25


Stars: 39
Forks: 5
Language: Python
License: Apache-2.0
Last pushed: Mar 11, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ModelTC/QLLM"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
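For scripting against the endpoint, the path in the curl example appears to follow the pattern /api/v1/quality/&lt;ecosystem&gt;/&lt;owner&gt;/&lt;repo&gt;; generalizing beyond the one URL shown is an assumption, as is the fetch helper below.

```python
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    # Assumed path pattern, generalized from the ModelTC/QLLM example above.
    return f"{BASE}/{ecosystem}/{owner}/{repo}"

def fetch_quality(ecosystem: str, owner: str, repo: str) -> bytes:
    # Live network call; counts against the 100 requests/day unauthenticated limit.
    with urllib.request.urlopen(quality_url(ecosystem, owner, repo)) as resp:
        return resp.read()

print(quality_url("transformers", "ModelTC", "QLLM"))
# -> https://pt-edge.onrender.com/api/v1/quality/transformers/ModelTC/QLLM
```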