codefuse-ai/ModelCache

An LLM semantic caching system aiming to enhance user experience by reducing response time via cached query-result pairs.

Quality score: 43 / 100 (Emerging)

This system helps improve the speed and responsiveness of applications that use large language models (LLMs), such as AI chatbots or content generation tools. It works by storing common questions and their answers, so if the same or a very similar question is asked again, the system can instantly provide the cached answer instead of waiting for the LLM to generate a new one. This tool is for developers and operations teams managing LLM-powered services.
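The lookup described above can be sketched as a small semantic cache: embed each query, compare it against stored entries by cosine similarity, and return the cached answer on a close enough match. This is a minimal illustration, not ModelCache's actual implementation; the toy character-count embedding and the 0.95 threshold are assumptions standing in for a real sentence-embedding model and a tuned similarity cutoff.

```python
import math

def embed(text: str) -> list[float]:
    # Toy bag-of-characters embedding; a real system would use a
    # sentence-embedding model here.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.95):
        self.threshold = threshold      # minimum similarity to count as a hit
        self.entries = []               # list of (embedding, cached answer)

    def get(self, query: str):
        # Return the best-matching cached answer, or None on a cache miss.
        qv = embed(query)
        best_score, best_answer = 0.0, None
        for ev, answer in self.entries:
            score = cosine(qv, ev)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, query: str, answer: str):
        self.entries.append((embed(query), answer))

cache = SemanticCache()
cache.put("What is the capital of France?", "Paris")
print(cache.get("what is the capital of france"))  # near-identical query: cache hit
print(cache.get("Explain quantum entanglement"))   # unrelated query: cache miss
```

On a hit, the cached answer comes back immediately with no LLM call; on a miss, the application would call the LLM and `put` the new pair into the cache.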

955 stars. No commits in the last 6 months.

Use this if you are running an LLM-powered application and want to reduce response times and potentially lower inference costs by reusing past LLM outputs for similar user queries.

Not ideal if your application primarily handles unique, never-repeating LLM queries where caching would offer minimal benefit.

Tags: LLM operations, AI application development, system performance, API management, semantic search
Stale (6m) · No Package · No Dependents
Maintenance 2 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 15 / 25


Stars: 955
Forks: 58
Language: Python
License:
Last pushed: Jun 30, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/codefuse-ai/ModelCache"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.