dipampaul17/KVSplit

Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit keys & 4-bit values, reducing memory by 59% with <1% quality loss. Includes benchmarking, visualization, and one-command setup. Optimized for M1/M2/M3 Macs with Metal support.

Quality score: 37 / 100 (Emerging)

This project helps developers running large language models (LLMs) on Apple Silicon Macs. It lets you run larger models with much longer contexts by sharply reducing the memory consumed by the model's KV cache (its attention working memory). You provide an LLM model file and gain the ability to process longer documents or run larger models without running out of memory, often with improved speed.
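As a rough illustration of where the savings come from, here is a back-of-the-envelope sketch of KV cache sizing, not KVSplit's actual accounting. The model dimensions below are assumed (roughly a 7B-class model); KVSplit's reported ~59% includes quantization overhead, which is why it is a little below the raw 62.5% bit reduction.

```python
# Back-of-the-envelope KV-cache sizing. Dimensions are assumed
# (roughly a 7B-class model); the real figure reported by KVSplit
# includes quantization overhead, hence ~59% rather than 62.5%.
n_layers = 32
n_kv_heads = 32
head_dim = 128
context_len = 8192

def kv_cache_bytes(key_bits: int, value_bits: int) -> int:
    per_token = n_layers * n_kv_heads * head_dim  # elements for K (same count for V)
    return per_token * context_len * (key_bits + value_bits) // 8

fp16 = kv_cache_bytes(16, 16)   # baseline: FP16 keys and values
k8v4 = kv_cache_bytes(8, 4)     # differentiated precision: 8-bit keys, 4-bit values

print(f"FP16 KV cache:  {fp16 / 2**20:.0f} MiB")
print(f"K8/V4 KV cache: {k8v4 / 2**20:.0f} MiB")
print(f"Saved: {100 * (1 - k8v4 / fp16):.1f}%")  # ~62.5% before overhead
```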

362 stars. No commits in the last 6 months.

Use this if you are a developer building or running LLMs on an Apple Silicon Mac and are hitting memory limits when dealing with long contexts or larger models.

Not ideal if you are not a developer, or if you are running LLMs on hardware other than Apple Silicon.

Tags: LLM development, Apple Silicon optimization, AI inference, memory optimization, machine learning engineering
Stale (6 months) | No package | No dependents
Maintenance 2 / 25
Adoption 10 / 25
Maturity 15 / 25
Community 10 / 25

How are scores calculated?
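The overall rating appears to be the simple sum of the four category scores shown above; this is an observation from the numbers on this page, not documented scoring logic.

```python
# Assumption: the overall score is the sum of the four category scores,
# each out of 25, giving a total out of 100.
categories = {"Maintenance": 2, "Adoption": 10, "Maturity": 15, "Community": 10}
overall = sum(categories.values())  # 2 + 10 + 15 + 10 = 37
print(overall)                      # 37 / 100, shown as the "Emerging" tier
```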

Stars: 362
Forks: 13
Language: Python
License:
Last pushed: May 21, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/dipampaul17/KVSplit"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
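For use from a script rather than the shell, a minimal sketch of the same request in Python is shown below. The response schema is not documented here, so it simply prints whatever JSON the endpoint returns; the `requests` package is assumed to be installed.

```python
# Minimal sketch: fetch the quality data from the public endpoint above.
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/dipampaul17/KVSplit"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
print(resp.json())  # schema undocumented here; inspect the raw payload
```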