dipampaul17/KVSplit
Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit keys & 4-bit values, reducing memory by 59% with <1% quality loss. Includes benchmarking, visualization, and one-command setup. Optimized for M1/M2/M3 Macs with Metal support.
This project helps developers working with large language models (LLMs) on Apple Silicon Macs. It lets you run bigger models with much longer text inputs by sharply reducing the memory consumed by the model's KV cache (the attention keys and values kept in memory during inference). You supply an LLM model file and gain the ability to process longer documents or run larger models without exhausting memory, often with improved speed.
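To see where the headline 59% figure comes from, here is a back-of-the-envelope sketch. It assumes llama.cpp-style block quantization, where q8_0 costs roughly 8.5 bits per element and q4_0 roughly 4.5 (each 32-element block carries an fp16 scale); the model dimensions are illustrative, not taken from the repo.

```python
# Estimate KV cache memory: fp16 baseline vs. 8-bit keys / 4-bit values.
# Effective sizes assume block quantization with one fp16 scale per 32
# elements: q8_0 ~= 8.5 bits/element, q4_0 ~= 4.5 bits/element.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, k_bits, v_bits):
    """Total KV cache size in bytes for one sequence."""
    elems_per_tensor = n_layers * n_kv_heads * head_dim * n_ctx
    return elems_per_tensor * (k_bits + v_bits) / 8

# Illustrative 7B-class model: 32 layers, 32 KV heads, head_dim 128, 8K context.
fp16 = kv_cache_bytes(32, 32, 128, 8192, 16.0, 16.0)
mixed = kv_cache_bytes(32, 32, 128, 8192, 8.5, 4.5)  # 8-bit K, 4-bit V

print(f"fp16 KV cache:  {fp16 / 2**30:.2f} GiB")   # 4.00 GiB
print(f"K8/V4 KV cache: {mixed / 2**30:.2f} GiB")
print(f"reduction: {1 - mixed / fp16:.1%}")        # ~59%, matching the claim
```

Under these assumptions the reduction is 1 − 13/32 ≈ 59.4%, which lines up with the 59% quoted above; the exact figure depends on the quantization formats actually used.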
362 stars. No commits in the last 6 months.
Use this if you are a developer building or running LLMs on an Apple Silicon Mac and are hitting memory limits when dealing with long contexts or larger models.
Not ideal if you are not a developer, or if you are running LLMs on hardware other than Apple Silicon.
Stars: 362
Forks: 13
Language: Python
License: —
Category: —
Last pushed: May 21, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/dipampaul17/KVSplit"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
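The same endpoint can be queried programmatically. A minimal sketch using only the Python standard library; the response schema is not documented on this page, so the result is just pretty-printed rather than parsed into named fields:

```python
# Fetch the quality record for a repo from the API shown above.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def quality_url(owner: str, repo: str) -> str:
    """Build the per-repo endpoint URL."""
    return f"{BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """GET the endpoint and decode the JSON body (schema unspecified)."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=10) as resp:
        return json.load(resp)
```

Usage: `print(json.dumps(fetch_quality("dipampaul17", "KVSplit"), indent=2))`. Note the 100 requests/day limit on keyless access.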
Higher-rated alternatives
ModelEngine-Group/unified-cache-management
Persist and reuse KV cache to speed up your LLM.
reloadware/reloadium
Hot Reloading and Profiling for Python
alibaba/tair-kvcache
Alibaba Cloud's high-performance KVCache system for LLM inference, with components for global...
October2001/Awesome-KV-Cache-Compression
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
Zefan-Cai/Awesome-LLM-KV-Cache
Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.