SqueezeAILab/SqueezeLLM

[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization

Score: 41 / 100 (Emerging)

This project helps machine learning engineers and MLOps specialists deploy large language models (LLMs) more efficiently. It takes existing LLM weights (such as LLaMA, Vicuna, or Mistral) and applies post-training quantization using a dense-and-sparse decomposition: most weights are stored in a low-bit dense matrix, while a small set of outlier weights is kept at full precision in a sparse matrix. The result is an LLM that needs significantly less memory to run, while largely preserving accuracy and often improving inference speed.
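To make the dense-and-sparse idea concrete, here is a minimal NumPy sketch, not the repo's actual code: the few largest-magnitude outlier weights are pulled into a sparse full-precision matrix, and the remaining dense matrix is quantized to a handful of bits. The uniform quantizer and all names here are illustrative simplifications (SqueezeLLM itself uses sensitivity-based non-uniform quantization).

import numpy as np

def dense_and_sparse_decompose(W, outlier_frac=0.005, bits=4):
    """Split W into a full-precision sparse outlier matrix plus a low-bit dense matrix.

    Illustrative only: uses uniform quantization, whereas SqueezeLLM uses
    sensitivity-based non-uniform quantization.
    """
    W = W.astype(np.float32)
    k = max(1, int(outlier_frac * W.size))
    # The k largest-magnitude weights become sparse outliers kept at full precision.
    thresh = np.partition(np.abs(W).ravel(), -k)[-k]
    outlier_mask = np.abs(W) >= thresh
    sparse = np.where(outlier_mask, W, 0.0)
    dense = np.where(outlier_mask, 0.0, W)

    # Uniformly quantize the dense remainder to 2**bits levels, then dequantize.
    levels = 2 ** bits - 1
    lo, hi = float(dense.min()), float(dense.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    dense_hat = np.round((dense - lo) / scale) * scale + lo
    dense_hat = np.where(outlier_mask, 0.0, dense_hat)  # outliers live only in the sparse part

    return dense_hat, sparse

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256))
dense_hat, sparse = dense_and_sparse_decompose(W)
print("mean reconstruction error:", np.abs(W - (dense_hat + sparse)).mean())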

713 stars. No commits in the last 6 months.

Use this if high GPU memory requirements are blocking your large language model deployments but you still need to maintain strong model quality and performance.

Not ideal if you work with small models whose memory footprint is not a concern, or if you don't need to squeeze out maximum inference performance.

Tags: LLM deployment · model optimization · GPU efficiency · AI infrastructure · machine learning operations
Stale (6 months) · No package published · No dependents

Score breakdown:
Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 15 / 25
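The four components appear to add up to the overall score. A quick sanity check in Python, assuming a simple unweighted sum (the site's exact formula isn't documented on this page):

parts = {"maintenance": 0, "adoption": 10, "maturity": 16, "community": 15}
print(sum(parts.values()))  # -> 41, matching the 41/100 overall score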

Stars: 713
Forks: 49
Language: Python
License: MIT
Last pushed: Aug 13, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/SqueezeAILab/SqueezeLLM"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
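The same call from Python using the requests library. The response field names below are assumptions based on the stats shown above, since the schema isn't documented on this page:

import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/SqueezeAILab/SqueezeLLM"

resp = requests.get(URL, timeout=10)  # no API key needed under 100 requests/day
resp.raise_for_status()
data = resp.json()

# "score" and "stars" are hypothetical field names; inspect data.keys() for the real ones.
print(data.get("score"), data.get("stars"))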