Aaronhuang-778/BiLLM

[ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Quality score: 39/100 (Emerging)

BiLLM helps machine learning engineers and researchers reduce the memory and computational demands of large language models (LLMs). It takes existing pretrained LLMs (such as LLaMA or OPT) and compresses their weights into a roughly 1-bit format via post-training quantization, with no retraining required. The resulting models are far smaller yet retain much of their original accuracy, making them more practical to deploy on resource-constrained hardware.
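To make the core idea concrete, here is a minimal sketch of plain sign-based weight binarization with an optimal per-row scale (the XNOR-Net-style baseline). This is only the basic principle, not BiLLM's actual algorithm, which additionally splits out salient weights and uses residual binarization; the function names are illustrative.

```python
import numpy as np

def binarize_weights(W):
    """Illustrative 1-bit quantization: replace each row of W with
    sign(W) times a per-row scale alpha. For a fixed sign pattern,
    alpha = mean(|W|) minimizes the per-row L2 reconstruction error.
    (BiLLM's real method adds salient-weight handling on top of this.)"""
    signs = np.where(W >= 0, 1.0, -1.0)            # the 1-bit codes
    alpha = np.abs(W).mean(axis=1, keepdims=True)  # per-row scale factor
    return signs, alpha

def dequantize(signs, alpha):
    """Reconstruct an approximate full-precision matrix."""
    return signs * alpha

# Toy demo on a random weight matrix
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
signs, alpha = binarize_weights(W)
W_hat = dequantize(signs, alpha)
```

Storing only `signs` (1 bit per weight) plus one scale per row is what yields the large memory savings over 16-bit weights.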

228 stars. No commits in the last 6 months.

Use this if you need to significantly shrink the size and computational requirements of large language models while maintaining high accuracy, especially for deployment on resource-constrained hardware.

Not ideal if you are not working with large language models or if you require an extremely high-precision model where even minimal accuracy trade-offs are unacceptable.

large-language-models model-compression edge-ai deep-learning-optimization nlp-deployment
Stale (6 months) · No package · No dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 13 / 25


Stars: 228
Forks: 17
Language: Python
License: MIT
Last pushed: Jan 11, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Aaronhuang-778/BiLLM"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.