jy-yuan/KIVI

[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Score: 49 / 100 (Emerging)

This project helps large language model (LLM) developers and researchers deploy their models more efficiently. It quantizes the key-value (KV) cache of existing LLMs, such as Llama-2 or Mistral, down to 2 bits using an asymmetric, tuning-free scheme. The result is an LLM that runs faster, handles larger request batches, and uses significantly less memory, all without any fine-tuning.
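To make the idea concrete, here is a minimal sketch of asymmetric low-bit quantization: each group of values is mapped to the integers 0..3 (2 bits) using a per-group scale and zero-point. This is only an illustration of the general technique; it is not KIVI's actual implementation, which uses per-channel/per-token grouping and fused kernels.

```python
import numpy as np

def quantize_2bit_asymmetric(x, group_size=32):
    """Asymmetric 2-bit quantization: per-group scale and zero-point (min)."""
    x = x.reshape(-1, group_size)
    xmin = x.min(axis=1, keepdims=True)          # zero-point per group
    xmax = x.max(axis=1, keepdims=True)
    scale = (xmax - xmin) / 3.0                  # 2 bits -> 4 levels (0..3)
    scale = np.where(scale == 0, 1.0, scale)     # guard constant groups
    q = np.clip(np.round((x - xmin) / scale), 0, 3).astype(np.uint8)
    return q, scale, xmin

def dequantize(q, scale, xmin):
    """Reconstruct approximate float values from 2-bit codes."""
    return q * scale + xmin

# Toy "KV cache" tensor: 4 tokens x 64 channels of random activations.
rng = np.random.default_rng(0)
kv = rng.standard_normal((4, 64)).astype(np.float32)
q, s, z = quantize_2bit_asymmetric(kv.reshape(-1))
recon = dequantize(q, s, z).reshape(kv.shape)
err = float(np.abs(recon - kv).max())
```

Rounding to the nearest level bounds the per-element error by half a scale step, which is why grouping (smaller groups, tighter min/max ranges) matters so much at 2 bits.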


Use this if you are a machine learning engineer or researcher looking to speed up inference and reduce the memory footprint of your LLMs, especially when working with models like Llama, Falcon, or Mistral.

Not ideal if you are an end-user of an LLM and do not directly manage model deployment or infrastructure.

Tags: LLM deployment, model inference optimization, deep learning engineering, AI infrastructure, large language models
No package, no dependents
Maintenance: 6 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 17 / 25


Stars: 359
Forks: 44
Language: Python
License: MIT
Last pushed: Nov 20, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/jy-yuan/KIVI"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
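The same request can be made programmatically. The sketch below builds the endpoint URL from the path layout visible in the curl example above; the JSON response schema is not documented here, so the code just decodes and prints whatever comes back.

```python
import json
import urllib.request

# Base URL taken from the curl example; path layout (registry/owner/repo) is inferred.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(registry: str, owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a repository."""
    return f"{API_BASE}/{registry}/{owner}/{repo}"

def fetch_quality(registry: str, owner: str, repo: str) -> dict:
    """Fetch and decode the JSON payload (response fields are undocumented here)."""
    with urllib.request.urlopen(quality_url(registry, owner, repo), timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    data = fetch_quality("transformers", "jy-yuan", "KIVI")
    print(json.dumps(data, indent=2))
```

Without a key this counts against the shared 100 requests/day limit, so cache responses rather than polling.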