zilliztech/GPTCache

Semantic cache for LLMs. Fully integrated with LangChain and llama_index.

Quality score: 56/100 (Established)

This tool helps developers and architects cut API costs and improve response times in applications that use large language models (LLMs) such as ChatGPT. By caching previous LLM queries and their responses, it can answer identical or semantically similar questions from the cache instead of re-querying the model. It suits anyone building applications that interact with LLMs frequently.
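
A minimal sketch of the drop-in usage, based on the project's documented quickstart (import paths and signatures may differ between GPTCache releases). The OpenAI adapter mirrors the openai client, so a repeated question is answered from the cache instead of triggering a fresh API call.

# Minimal GPTCache quickstart, adapted from the project's documented usage.
# Import paths and signatures may vary between GPTCache releases.
from gptcache import cache
from gptcache.adapter import openai  # drop-in replacement for the openai client

cache.init()            # defaults to exact-match caching
cache.set_openai_key()  # reads OPENAI_API_KEY from the environment

# The first call goes to the OpenAI API; an identical question afterwards
# is served from the cache.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is semantic caching?"}],
)
print(response["choices"][0]["message"]["content"])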

7,963 stars. No commits in the last 6 months. Available on PyPI.

Use this if you are developing an application that repeatedly asks similar questions to LLMs and want to reduce API costs and improve response times.

Not ideal if your application primarily uses LLMs for unique, never-before-seen queries where caching offers no benefit.
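
For the "similar questions" case above, the cache can be configured with an embedding model, a vector store, and a similarity evaluator so that near-duplicate prompts also produce cache hits. A sketch based on the project's documented similar-search setup; module paths may vary by release.

# Semantic-similarity caching: near-duplicate questions also hit the cache.
# Sketch based on GPTCache's documented setup; details may vary by release.
from gptcache import cache
from gptcache.adapter import openai
from gptcache.embedding import Onnx
from gptcache.manager import CacheBase, VectorBase, get_data_manager
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

onnx = Onnx()  # local ONNX model that embeds the query text
data_manager = get_data_manager(
    CacheBase("sqlite"),                            # scalar store for responses
    VectorBase("faiss", dimension=onnx.dimension),  # vector index for embeddings
)
cache.init(
    embedding_func=onnx.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(),  # distance-based hit test
)
cache.set_openai_key()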

Tags: LLM-application-development, API-cost-optimization, application-performance, AI-integration, backend-engineering
Stale: 6 months
Maintenance: 2/25
Adoption: 10/25
Maturity: 25/25
Community: 19/25


Stars: 7,963
Forks: 570
Language: Python
License: MIT
Last pushed: Jul 11, 2025
Commits (30d): 0
Dependencies: 3

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/zilliztech/GPTCache"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
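
The same endpoint can be called from Python. A sketch assuming the endpoint returns JSON; the response schema is not documented here, so the example simply prints whatever comes back.

# Fetch the quality data from Python; assumes the endpoint returns JSON.
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/embeddings/zilliztech/GPTCache"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
print(resp.json())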