LMCache and llm_efficiency
LMCache is a production-ready KV cache optimization system that plugs into existing LLM serving stacks, while llm_efficiency is an educational implementation that demonstrates KV caching concepts inside a minimal GPT architecture. They are best understood as a practical tool versus an educational reference, not as true competitors or complements.
About LMCache
LMCache/LMCache
Supercharge Your LLM with the Fastest KV Cache Layer
When you're running Large Language Models (LLMs) and notice slow responses or high compute costs, especially with long prompts or repeated questions, LMCache can help. It stores the KV cache of reusable portions of your inputs so the LLM doesn't have to re-process the same text, significantly speeding up response times and reducing GPU usage. This is ideal for infrastructure engineers, MLOps specialists, or anyone managing LLM deployments who wants to optimize performance and cost.
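The core idea behind any KV cache is the same: during autoregressive decoding, the keys and values of past tokens never change, so they can be computed once and reused. Below is a minimal NumPy sketch of that idea; the class name, shapes, and single-head design are illustrative assumptions for this page, not LMCache's actual API (LMCache operates at the serving layer, e.g. alongside vLLM, rather than inside the attention code).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class CachedSelfAttention:
    """Toy single-head attention that caches K and V across decode steps.

    Illustrative only: real systems cache per layer, per head, per request.
    """
    def __init__(self, d_model, seed=0):
        rng = np.random.default_rng(seed)
        self.Wq = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
        self.Wk = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
        self.Wv = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
        self.k_cache = []  # one (d_model,) key vector per past token
        self.v_cache = []  # one (d_model,) value vector per past token

    def step(self, x):
        """Process ONE new token vector x, reusing cached K/V for past tokens."""
        q = x @ self.Wq
        # Only the new token's key and value are projected; everything
        # already in the cache is reused as-is.
        self.k_cache.append(x @ self.Wk)
        self.v_cache.append(x @ self.Wv)
        K = np.stack(self.k_cache)  # (t, d_model)
        V = np.stack(self.v_cache)  # (t, d_model)
        attn = softmax(q @ K.T / np.sqrt(len(q)))
        return attn @ V

d = 8
attn = CachedSelfAttention(d)
tokens = np.random.default_rng(1).normal(size=(3, d))
outs = [attn.step(tok) for tok in tokens]
```

With the cache, generating t tokens costs one K/V projection per step instead of re-projecting the entire prefix every step; that per-step saving is what cache-aware serving layers like LMCache scale up across requests and machines.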
About llm_efficiency
dataflowr/llm_efficiency
KV Cache & LoRA for minGPT
This project helps developers working with Large Language Models (LLMs) to make their models run faster and fine-tune more efficiently. It provides implementations of KV Caching to speed up how LLMs generate text, and LoRA (Low-Rank Adaptation) to reduce the cost of adapting pre-trained models to new tasks. If you're building or customizing LLMs, you can use these techniques to optimize performance and resource use.
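LoRA's trick is to freeze the pretrained weight matrix and learn only a low-rank update. The NumPy sketch below shows the shape of that idea; the class name, initializations, and scaling follow the common LoRA convention but are assumptions for illustration, not code from the llm_efficiency repo.

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer plus a trainable low-rank delta (B after A)."""
    def __init__(self, d_in, d_out, rank=2, alpha=4, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(d_in, d_out))        # pretrained weight: frozen
        self.A = rng.normal(size=(d_in, rank)) * 0.01  # trainable, small random init
        self.B = np.zeros((rank, d_out))               # trainable, zero init
        self.scale = alpha / rank                      # common LoRA scaling factor

    def forward(self, x):
        # Fine-tuning updates only A and B: d_in*rank + rank*d_out parameters
        # instead of all d_in*d_out entries of W.
        return x @ self.W + self.scale * (x @ self.A @ self.B)

layer = LoRALinear(16, 16, rank=2)
x = np.ones((1, 16))
y = layer.forward(x)
```

Because B starts at zero, the adapted layer initially behaves exactly like the frozen pretrained layer, and training moves it away from that baseline using only the small A and B matrices (64 parameters here versus 256 in W).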