megvii-research/IntLLaMA

IntLLaMA: A fast and light quantization solution for LLaMA

Overall quality score: 22 / 100 (Experimental)

IntLLaMA helps AI engineers and researchers reduce the memory footprint of large language models like LLaMA and speed them up without losing much accuracy. It takes a full-precision language model as input and outputs a smaller, faster, quantized version. It is aimed at machine learning practitioners who deploy or experiment with large models on hardware with limited resources.

No commits in the last 6 months.

Use this if you need to run large language models more efficiently on GPUs with less memory, making them faster and more accessible.

Not ideal if you primarily need a general-purpose fine-tuning library or are working with models other than LLaMA or ChatGLMv2.

large-language-models model-optimization edge-ai ai-deployment resource-constrained-ai
Flags: Stale (6 months) · No Package · No Dependents
Maintenance: 0 / 25
Adoption: 6 / 25
Maturity: 16 / 25
Community: 0 / 25


Stars: 18
Forks:
Language: Python
License: Apache-2.0
Last pushed: Jul 21, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/megvii-research/IntLLaMA"

Open to everyone — 100 requests/day with no key needed; get a free key for 1,000/day.
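For scripting, the same endpoint can be queried from Python. A minimal sketch using only the standard library is shown below; note that the response field names (`score`, `grade`) are assumptions for illustration — inspect a real response to confirm the actual schema:

```python
import json
import urllib.request

# Endpoint from the curl example above.
API_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/megvii-research/IntLLaMA"

def fetch_quality(url: str = API_URL) -> dict:
    # Plain GET; no API key is needed for up to 100 requests/day.
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

def summarize(report: dict) -> str:
    # NOTE: the "score" key is hypothetical; adjust to the real schema.
    score = report.get("score", "?")
    return f"quality score: {score} / 100"

# Demonstrate parsing with a stubbed response (avoids a live network call):
sample = {"score": 22, "grade": "Experimental"}
print(summarize(sample))
```

Calling `fetch_quality()` performs the live request; the stubbed `sample` dict is only there to show the parsing step offline.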