turboquant and turboquant-torch
These two projects overlap directly: both present themselves as implementations of the same TurboQuant technique for KV cache compression in LLMs.
About turboquant
OnlyTerp/turboquant
First open-source implementation of Google's TurboQuant (ICLR 2026): near-optimal KV cache compression for LLM inference. 5x compression with near-zero quality loss.
This project helps you run large language models (LLMs) more efficiently by significantly reducing the memory they need during inference. It compresses the model's internal KV cache by up to 7x while keeping response quality almost unchanged. Anyone who deploys or manages LLMs and wants to serve more users, handle longer contexts, or cut GPU costs would find this valuable.
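To see why KV cache compression matters, a back-of-envelope sizing calculation helps. The numbers below (32 layers, 8 KV heads, head dimension 128, 32K context) describe a hypothetical 7B-class model and are not taken from either repo; the point is only that the fp16 cache grows to gigabytes and a 3-bit representation shrinks it by roughly 5x:

```python
# Back-of-envelope KV cache sizing (illustrative numbers, not from either repo).
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem):
    # Factor of 2: one key tensor and one value tensor per layer.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical 7B-class model at a 32K-token context.
fp16 = kv_cache_bytes(32, 8, 128, seq_len=32_768, bytes_per_elem=2)
q3   = kv_cache_bytes(32, 8, 128, seq_len=32_768, bytes_per_elem=3 / 8)

print(f"fp16 cache:  {fp16 / 2**30:.2f} GiB")
print(f"3-bit cache: {q3 / 2**30:.2f} GiB")
print(f"ratio:       {fp16 / q3:.1f}x")
```

The ratio (16 bits / 3 bits ≈ 5.3x) ignores per-block metadata such as scales, which is why real-world figures like the "5x" in the tagline land slightly below the raw bit ratio.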
About turboquant-torch
codepawl/turboquant-torch
Unofficial PyTorch implementation of TurboQuant (Google Research, ICLR 2026). Near-optimal vector quantization for KV cache compression and vector search. 3-bit with zero accuracy loss.
This tool helps AI practitioners and researchers dramatically reduce the memory footprint of large language models (LLMs) during inference. It takes your existing PyTorch LLM and compresses its internal memory (the KV cache) or vector databases, yielding a model that uses significantly less memory with virtually no loss in accuracy. It is designed for anyone running LLMs where memory efficiency is critical.
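TurboQuant's contribution is near-optimal vector quantization; as a much simpler reference point (plain uniform 3-bit scalar quantization with per-channel scales, deliberately NOT the repo's algorithm or API), here is what a low-bit round trip over a KV-like tensor looks like mechanically:

```python
import numpy as np

# Baseline sketch: uniform 3-bit scalar quantization of a synthetic KV tensor.
# This is not TurboQuant; it only illustrates the quantize/dequantize mechanics.
rng = np.random.default_rng(0)
kv = rng.standard_normal((1024, 128)).astype(np.float32)  # (tokens, head_dim)

bits = 3
levels = 2 ** bits - 1                       # 8 levels -> integer codes 0..7
lo = kv.min(axis=0, keepdims=True)           # per-channel minimum
scale = (kv.max(axis=0, keepdims=True) - lo) / levels

codes = np.clip(np.round((kv - lo) / scale), 0, levels).astype(np.uint8)
kv_hat = codes * scale + lo                  # dequantize back to float

err = np.abs(kv - kv_hat).max()
print(f"max abs round-trip error: {err:.3f}")
```

Scalar quantization like this loses noticeably more accuracy at 3 bits than the vector-quantization approach the repos implement; the gap between the two is exactly what "near-optimal" refers to.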
Scores updated daily from GitHub, PyPI, and npm data.