zakariaf/RAG-Cache

High-performance LLM query cache with semantic search. Reduces API costs by 80% and cuts latency from 8.5 s to 1 ms using Redis plus the Qdrant vector DB. Multi-provider support (OpenAI, Anthropic).

Quality score: 31 / 100 (Emerging)

This project reduces the cost and improves the speed of applications that use large language models from providers such as OpenAI or Anthropic. It takes your application's queries as input and, if a similar question has been asked before, returns the cached answer almost instantly. It is designed for developers building LLM-powered applications who want to optimize performance and control API expenses.

Use this if you are building an application that repeatedly queries large language models and want to save on API costs and significantly reduce response times.

Not ideal if your application primarily asks unique, never-before-seen questions where caching would offer minimal benefit.

Tags: LLM application development · API cost optimization · latency reduction · application performance · developer tools
No license · No package · No dependents
Maintenance 6 / 25
Adoption 5 / 25
Maturity 5 / 25
Community 15 / 25


Stars: 11
Forks: 4
Language: Python
License: none
Last pushed: Dec 02, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/vector-db/zakariaf/RAG-Cache"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
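The same endpoint can be called from Python using only the standard library. The URL pattern is taken from the curl example above; the response schema is not documented here, so the parsed JSON is treated as opaque.

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/vector-db"

def quality_url(owner: str, repo: str) -> str:
    # Per-repo endpoint, matching the curl example above.
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    # Anonymous access is limited to 100 requests/day; the response
    # schema is undocumented here, so treat the result as opaque JSON.
    with urllib.request.urlopen(quality_url(owner, repo), timeout=10) as resp:
        return json.load(resp)

print(quality_url("zakariaf", "RAG-Cache"))
# → https://pt-edge.onrender.com/api/v1/quality/vector-db/zakariaf/RAG-Cache
```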