DRSY/EasyKV

Easy control for Key-Value Constrained Generative LLM Inference (https://arxiv.org/abs/2402.06262)


Experimental

This project helps large language models (LLMs) such as LLaMA, Llama 2, and Mistral use less memory while generating text. It lets you control the memory consumed by the key-value (KV) cache, which stores the attention keys and values of earlier tokens so they are not recomputed at every generation step. The intended user is someone deploying or managing LLMs who wants to improve performance and reduce hardware requirements for tasks such as summarization, instruction following, or information retrieval.
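To make the memory pressure concrete, here is a rough back-of-the-envelope sketch of how KV cache size grows with sequence length. It is not taken from EasyKV: the helper name `kv_cache_bytes` is hypothetical, and the model shape (32 layers, 32 KV heads, head dimension 128, 16-bit values) is an assumption that roughly matches a 7B LLaMA-style model.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Approximate KV cache size in bytes (hypothetical helper, not an EasyKV API).

    Each cached token stores one key vector and one value vector
    (hence the factor 2) per layer and per KV head.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Assumed shape: 32 layers, 32 KV heads, head_dim 128, fp16 (2 bytes per value).
full = kv_cache_bytes(32, 32, 128, seq_len=4096)    # cache every token
capped = kv_cache_bytes(32, 32, 128, seq_len=1024)  # cap the cache at 1024 tokens

print(f"full:   {full / 2**30:.1f} GiB")    # full:   2.0 GiB
print(f"capped: {capped / 2**30:.1f} GiB")  # capped: 0.5 GiB
```

Under these assumptions each cached token costs about 0.5 MiB, so the cache scales linearly with context length and batch size. Constraining the cache to a fixed token budget bounds that cost, at the price of having to choose which tokens to keep.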

No commits in the last 6 months.

Use this if you are working with large language models and need to reduce their memory footprint during text generation or when processing very long inputs.

Not ideal if you are not working with LLMs or if memory efficiency is not a primary concern for your text generation tasks.

LLM deployment, NLP model optimization, text generation, computational linguistics, AI infrastructure

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 8 / 25

Maturity 8 / 25

Community 9 / 25


Language: Python

License: none

Higher-rated alternatives

LMCache/LMCache

Supercharge Your LLM with the Fastest KV Cache Layer

Zefan-Cai/KVCache-Factory

Unified KV Cache Compression Methods for Auto-Regressive Models

dataflowr/llm_efficiency

KV Cache & LoRA for minGPT

OnlyTerp/kvtc

First open-source KVTC implementation (NVIDIA, ICLR 2026) -- 8-32x KV cache compression via PCA...

itsnamgyu/block-transformer

Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
