Scottcjn/ram-coffers
RAM Coffers: Conditional Memory via NUMA-Distributed Weight Banking - O(1) lookup routing for LLM inference (Dec 16, 2025 - predates DeepSeek Engram by 27 days)
This project speeds up Large Language Model (LLM) inference on specific hardware. Given an LLM query, it routes to the required model weights in NUMA-distributed memory banks with O(1) lookups, delivering rapid token generation. It is aimed at developers building or optimizing LLM serving infrastructure, particularly on older but still powerful IBM POWER8 systems.
Use this if you are a developer looking to maximize LLM inference speed on non-GPU hardware, especially IBM POWER8 systems, by managing model weights across the available NUMA memory banks.
Not ideal if you are an end user without programming experience, or if you run LLMs primarily on modern GPU clusters and don't need specialized memory-architecture optimizations.
Stars: 63
Forks: 16
Language: C
License: Apache-2.0
Category:
Last pushed: Mar 10, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/vector-db/Scottcjn/ram-coffers"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.