codefuse-ai/ModelCache
An LLM semantic caching system that reduces response times by serving cached query-result pairs.
This system helps improve the speed and responsiveness of applications that use large language models (LLMs), such as AI chatbots or content generation tools. It works by storing common questions and their answers, so if the same or a very similar question is asked again, the system can instantly provide the cached answer instead of waiting for the LLM to generate a new one. This tool is for developers and operations teams managing LLM-powered services.
955 stars. No commits in the last 6 months.
Use this if you are running an LLM-powered application and want to reduce response times and potentially lower inference costs by reusing past LLM outputs for similar user queries.
Not ideal if your application primarily handles unique, never-repeating LLM queries where caching would offer minimal benefit.
Stars: 955
Forks: 58
Language: Python
License: —
Category:
Last pushed: Jun 30, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/codefuse-ai/ModelCache"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
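The same endpoint can be called from Python. This sketch only assumes the URL path shown in the curl example above; the response schema is not documented here, so the (commented-out) live request simply pretty-prints whatever JSON the API returns. The `quality_url` helper is a hypothetical convenience, not part of any published client.

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/embeddings"

def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL."""
    return f"{API_BASE}/{owner}/{repo}"

url = quality_url("codefuse-ai", "ModelCache")
print(url)

# Uncomment to perform the live request (100 requests/day without a key):
# with urllib.request.urlopen(url) as resp:
#     print(json.dumps(json.load(resp), indent=2))
```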