MSNP1381/cache-cool
Cache-cool: a fast, flexible LLM caching proxy that reduces latency and API costs by caching repetitive calls to LLM services. Supports dynamic configuration, multiple backends (Redis, MongoDB, JSON), and schema-specific settings.
Cache-cool helps developers working with Large Language Models (LLMs) reduce cost and improve the speed of their applications. It stores previous LLM responses, so if the same question or prompt is sent again, it can return the answer instantly without contacting the LLM service. This is useful for applications that frequently interact with services like OpenAI or Claude.
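The core idea can be sketched in a few lines of Python. This is a minimal illustration of prompt-response caching, not Cache-cool's actual API: the `PromptCache` class, its in-memory dict store, and the `fake_llm` function are all hypothetical stand-ins (Cache-cool itself supports Redis, MongoDB, and JSON backends).

```python
import hashlib
import json

class PromptCache:
    """Hypothetical sketch: key each request by a hash, reuse stored responses."""

    def __init__(self):
        self._store = {}  # in-memory dict; a real backend would be Redis, MongoDB, or JSON

    def _key(self, model, prompt):
        # Canonical JSON keeps the hash stable for identical requests
        payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def get_or_call(self, model, prompt, llm_call):
        key = self._key(model, prompt)
        if key in self._store:
            return self._store[key]          # cache hit: no API request made
        response = llm_call(model, prompt)   # cache miss: contact the LLM service
        self._store[key] = response
        return response

# Stand-in for a paid LLM API call; counts how often it is actually invoked
calls = []
def fake_llm(model, prompt):
    calls.append(prompt)
    return f"answer to: {prompt}"

cache = PromptCache()
first = cache.get_or_call("gpt-4", "What is caching?", fake_llm)
second = cache.get_or_call("gpt-4", "What is caching?", fake_llm)  # served from cache
```

The second identical call never reaches `fake_llm`, which is the source of the cost and latency savings the proxy provides.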
No commits in the last 6 months.
Use this if you are a developer building an application that makes frequent, repetitive calls to LLM services and you want to reduce API costs and improve response times.
Not ideal if your application primarily involves unique, non-repetitive LLM queries, or if you are not comfortable setting up and managing a caching proxy with Docker or Python dependencies.
Stars: 29
Forks: 2
Language: Python
License: —
Category:
Last pushed: Aug 17, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/MSNP1381/cache-cool"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
ModelEngine-Group/unified-cache-management
Persist and reuse KV Cache to speedup your LLM.
reloadware/reloadium
Hot Reloading and Profiling for Python
alibaba/tair-kvcache
Alibaba Cloud's high-performance KVCache system for LLM inference, with components for global...
October2001/Awesome-KV-Cache-Compression
Must-read papers on KV Cache Compression (constantly updating).
Zefan-Cai/Awesome-LLM-KV-Cache
Awesome-LLM-KV-Cache: A curated list of Awesome LLM KV Cache Papers with Codes.