codefuse-ai/ModelCache
An LLM semantic caching system that reduces response times by serving cached query-result pairs.
This system helps improve the speed and responsiveness of applications that use large language models (LLMs), such as AI chatbots or content generation tools. It works by storing common questions and their answers, so if the same or a very similar question is asked again, the system can instantly provide the cached answer instead of waiting for the LLM to generate a new one. This tool is for developers and operations teams managing LLM-powered services.
955 stars. No commits in the last 6 months.
Use this if you are running an LLM-powered application and want to reduce response times and potentially lower inference costs by reusing past LLM outputs for similar user queries.
Not ideal if your application primarily handles unique, never-repeating LLM queries where caching would offer minimal benefit.
Stars: 955
Forks: 58
Language: Python
License: —
Category:
Last pushed: Jun 30, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/codefuse-ai/ModelCache"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
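The same endpoint can be called from Python. This sketch only assumes the URL path shown in the curl example above; the response schema is not documented here, so the (commented-out) live request simply pretty-prints whatever JSON the API returns. The `quality_url` helper is a hypothetical convenience, not part of any published client.

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/embeddings"

def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL."""
    return f"{API_BASE}/{owner}/{repo}"

url = quality_url("codefuse-ai", "ModelCache")
print(url)

# Uncomment to perform the live request (100 requests/day without a key):
# with urllib.request.urlopen(url) as resp:
#     print(json.dumps(json.load(resp), indent=2))
```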