MSNP1381/cache-cool
Cache-cool: a fast, flexible LLM caching proxy that reduces latency and API costs by caching repetitive calls to LLM services. Supports dynamic configuration, multiple backends (Redis, MongoDB, JSON), and schema-specific settings.
Cache-cool helps developers working with Large Language Models (LLMs) reduce cost and improve the speed of their applications. It stores previous LLM responses, so if the same question or prompt is sent again, it can return the answer instantly without contacting the LLM service. This is useful for applications that frequently interact with services like OpenAI or Claude.
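The core idea can be sketched in a few lines of Python. This is a minimal illustration of prompt-response caching, not Cache-cool's actual API: the `PromptCache` class, its in-memory dict store, and the `fake_llm` function are all hypothetical stand-ins (Cache-cool itself supports Redis, MongoDB, and JSON backends).

```python
import hashlib
import json

class PromptCache:
    """Hypothetical sketch: key each request by a hash, reuse stored responses."""

    def __init__(self):
        self._store = {}  # in-memory dict; a real backend would be Redis, MongoDB, or JSON

    def _key(self, model, prompt):
        # Canonical JSON keeps the hash stable for identical requests
        payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def get_or_call(self, model, prompt, llm_call):
        key = self._key(model, prompt)
        if key in self._store:
            return self._store[key]          # cache hit: no API request made
        response = llm_call(model, prompt)   # cache miss: contact the LLM service
        self._store[key] = response
        return response

# Stand-in for a paid LLM API call; counts how often it is actually invoked
calls = []
def fake_llm(model, prompt):
    calls.append(prompt)
    return f"answer to: {prompt}"

cache = PromptCache()
first = cache.get_or_call("gpt-4", "What is caching?", fake_llm)
second = cache.get_or_call("gpt-4", "What is caching?", fake_llm)  # served from cache
```

The second identical call never reaches `fake_llm`, which is the source of the cost and latency savings the proxy provides.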
No commits in the last 6 months.
Use this if you are a developer building an application that makes frequent, repetitive calls to LLM services and you want to reduce API costs and improve response times.
Not ideal if your application primarily involves unique, non-repetitive LLM queries, or if you are not comfortable setting up and managing a caching proxy with Docker or Python dependencies.
Stars: 29
Forks: 2
Language: Python
License: —
Category:
Last pushed: Aug 17, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/MSNP1381/cache-cool"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
ModelEngine-Group/unified-cache-management
Persist and reuse KV Cache to speedup your LLM.
reloadware/reloadium
Hot Reloading and Profiling for Python
alibaba/tair-kvcache
Alibaba Cloud's high-performance KVCache system for LLM inference, with components for global...
October2001/Awesome-KV-Cache-Compression
Must-read papers on KV Cache Compression (constantly updating).
Zefan-Cai/Awesome-LLM-KV-Cache
Awesome-LLM-KV-Cache: A curated list of Awesome LLM KV Cache Papers with Codes.