zilliztech/GPTCache

Semantic cache for LLMs. Fully integrated with LangChain and llama_index.

Quality score: 56/100 (Established)

This tool helps developers and architects cut API costs and improve response times in applications that use large language models (LLMs) such as ChatGPT. By caching previous LLM queries and their responses, it can answer identical or semantically similar questions from the cache instead of re-querying the model. It suits anyone building applications that interact with LLMs frequently.
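
A minimal sketch of the drop-in usage, based on the project's documented quickstart (import paths and signatures may differ between GPTCache releases). The OpenAI adapter mirrors the openai client, so a repeated question is answered from the cache instead of triggering a fresh API call.

# Minimal GPTCache quickstart, adapted from the project's documented usage.
# Import paths and signatures may vary between GPTCache releases.
from gptcache import cache
from gptcache.adapter import openai  # drop-in replacement for the openai client

cache.init()            # defaults to exact-match caching
cache.set_openai_key()  # reads OPENAI_API_KEY from the environment

# The first call goes to the OpenAI API; an identical question afterwards
# is served from the cache.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is semantic caching?"}],
)
print(response["choices"][0]["message"]["content"])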

7,963 stars. No commits in the last 6 months. Available on PyPI.

Use this if you are developing an application that repeatedly asks similar questions to LLMs and want to reduce API costs and improve response times.

Not ideal if your application primarily uses LLMs for unique, never-before-seen queries where caching offers no benefit.
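
For the "similar questions" case above, the cache can be configured with an embedding model, a vector store, and a similarity evaluator so that near-duplicate prompts also produce cache hits. A sketch based on the project's documented similar-search setup; module paths may vary by release.

# Semantic-similarity caching: near-duplicate questions also hit the cache.
# Sketch based on GPTCache's documented setup; details may vary by release.
from gptcache import cache
from gptcache.adapter import openai
from gptcache.embedding import Onnx
from gptcache.manager import CacheBase, VectorBase, get_data_manager
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

onnx = Onnx()  # local ONNX model that embeds the query text
data_manager = get_data_manager(
    CacheBase("sqlite"),                            # scalar store for responses
    VectorBase("faiss", dimension=onnx.dimension),  # vector index for embeddings
)
cache.init(
    embedding_func=onnx.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(),  # distance-based hit test
)
cache.set_openai_key()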

Tags: LLM-application-development, API-cost-optimization, application-performance, AI-integration, backend-engineering
Stale: 6 months
Maintenance: 2/25
Adoption: 10/25
Maturity: 25/25
Community: 19/25


Stars: 7,963
Forks: 570
Language: Python
License: MIT
Last pushed: Jul 11, 2025
Commits (30d): 0
Dependencies: 3

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/zilliztech/GPTCache"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
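
The same endpoint can be called from Python. A sketch assuming the endpoint returns JSON; the response schema is not documented here, so the example simply prints whatever comes back.

# Fetch the quality data from Python; assumes the endpoint returns JSON.
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/embeddings/zilliztech/GPTCache"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
print(resp.json())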