zakariaf/RAG-Cache
High-performance LLM query cache with semantic search. Reduce API costs by 80% and cut latency from 8.5s to 1ms using Redis + the Qdrant vector DB. Multi-provider support (OpenAI, Anthropic).
This project reduces the cost and improves the speed of applications that use large language models from providers like OpenAI or Anthropic. It intercepts your application's queries and, if a semantically similar question has been asked before, returns the cached answer almost instantly. It is designed for developers building LLM-powered applications who want to optimize performance and control API expenses.
Use this if you are building an application that repeatedly queries large language models and want to save on API costs and significantly reduce response times.
Not ideal if your application primarily asks unique, never-before-seen questions where caching would offer minimal benefit.
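The semantic-lookup idea described above can be sketched in a few lines of Python. This is a minimal, self-contained illustration, not the project's implementation: it assumes a toy bag-of-words embedding and an in-memory store, whereas RAG-Cache uses real embedding models with Redis and Qdrant. All names here (`SemanticCache`, `embed`, the `0.6` threshold) are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding; a real cache would use a sentence-embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """In-memory sketch: return a stored answer when a new query is
    similar enough to a previously seen one, else signal a miss."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def get(self, query: str):
        q = embed(query)
        best_score, best_answer = 0.0, None
        for vec, answer in self.entries:
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        # Below the threshold the caller would fall through to the LLM API.
        return best_answer if best_score >= self.threshold else None

    def put(self, query: str, answer: str):
        self.entries.append((embed(query), answer))


cache = SemanticCache(threshold=0.6)
cache.put("what is the capital of france", "Paris")
hit = cache.get("capital of france")      # similar phrasing -> cache hit
miss = cache.get("how do i bake bread")   # unrelated -> miss, call the LLM
```

The design point is the similarity threshold: set it too low and unrelated queries get wrong cached answers; too high and paraphrases miss the cache, losing the cost and latency benefit.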
Stars: 11
Forks: 4
Language: Python
License: —
Category:
Last pushed: Dec 02, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/vector-db/zakariaf/RAG-Cache"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
RediSearch/RediSearch
A query and indexing engine for Redis, providing secondary indexing, full-text search, vector...
redis/redis-vl-python
Redis Vector Library (RedisVL) -- the AI-native Python client for Redis.
redis-developer/redis-ai-resources
✨ A curated list of awesome community resources, integrations, and examples of Redis in the AI ecosystem.
redis-developer/redis-product-search
Visual and semantic vector similarity with Redis Stack, FastAPI, PyTorch and Huggingface.
luyug/GradCache
Run Effective Large Batch Contrastive Learning Beyond GPU/TPU Memory Constraint