xvyaward/owq

Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models".

Score: 29 / 100 (Experimental)

This project helps machine learning engineers and researchers make large language models (LLMs) like LLaMA and BLOOM more efficient with minimal loss in quality. It takes an existing LLM and quantizes its weights to 3 or 4 bits, while preserving a small set of sensitive 'outlier' weight columns at higher precision. The output is a smaller, faster LLM that can be fine-tuned and used for inference with significantly reduced memory and computational requirements.
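The core idea described above — low-bit quantization for most weight columns, full precision for a few outlier columns — can be illustrated with a minimal sketch. This is not the OWQ implementation: OWQ selects outlier columns with a Hessian-based sensitivity metric and uses custom CUDA kernels, whereas this stand-in picks columns by magnitude and uses simple per-column min-max rounding.

```python
import numpy as np

def quantize_mixed_precision(W, bits=3, n_outlier_cols=4):
    """Sketch of outlier-aware weight quantization (illustrative, not OWQ's code).

    Quantizes most columns of W to `bits` bits using per-column min-max
    round-to-nearest, while keeping the columns with the largest norms
    (a crude proxy for OWQ's sensitivity metric) in full precision.
    Returns the dequantized matrix and the indices of the kept columns.
    """
    W = np.asarray(W, dtype=np.float32)
    # Pick "outlier" columns by largest column norm (stand-in heuristic).
    norms = np.linalg.norm(W, axis=0)
    outlier_idx = np.sort(np.argsort(norms)[-n_outlier_cols:])
    mask = np.zeros(W.shape[1], dtype=bool)
    mask[outlier_idx] = True

    levels = 2 ** bits - 1          # e.g. 7 levels above zero for 3 bits
    Wq = W.copy()
    cols = W[:, ~mask]
    lo, hi = cols.min(axis=0), cols.max(axis=0)
    scale = np.where(hi > lo, (hi - lo) / levels, 1.0)
    q = np.round((cols - lo) / scale)   # integer codes in [0, levels]
    Wq[:, ~mask] = q * scale + lo       # dequantize for comparison
    return Wq, outlier_idx
```

Each quantized column ends up with at most 2^bits distinct values, while the outlier columns pass through untouched; in a real deployment the integer codes would be packed and a fused kernel would handle the mixed-precision matmul.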

No commits in the last 6 months.

Use this if you need to run large language models on hardware with limited memory or computational power, or if you want to speed up inference and fine-tuning of LLMs while maintaining high accuracy.

Not ideal if you are working with smaller models that don't benefit as much from aggressive quantization, or if your hardware is not an NVIDIA A100/A6000/RTX 3090, as kernel performance may be suboptimal.

Topics: large-language-models, model-optimization, deep-learning-inference, fine-tuning, resource-efficiency
Badges: No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 8 / 25
Maturity 8 / 25
Community 13 / 25


Stars: 69
Forks: 8
Language: Python
License: none
Last pushed: Mar 07, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/xvyaward/owq"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.