Scottcjn/ram-coffers

RAM Coffers: Conditional Memory via NUMA-Distributed Weight Banking - O(1) lookup routing for LLM inference (Dec 16, 2025 - predates DeepSeek Engram by 27 days)

Quality score: 47 / 100 (Emerging)

This project speeds up Large Language Model (LLM) inference on specific hardware. Given an LLM query, it retrieves the required model weights from NUMA-distributed memory banks via O(1) lookup routing, enabling fast token generation. It is designed for developers building or optimizing LLM serving infrastructure, particularly on older but still powerful IBM POWER8 systems, with the goal of high token generation rates.
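To make the routing idea concrete, here is a minimal C sketch of that lookup path, assuming a hash-partitioned layout with one weight bank pinned to each NUMA node via libnuma. The names here (coffer_t, coffers_init, coffer_route, BANK_BYTES) are illustrative inventions, not the repository's actual API, and the real project presumably does more than a modulo hash.

```c
/* Hypothetical sketch of O(1) NUMA weight-bank routing.
 * Assumes libnuma; all names are illustrative, not the repo's API. */
#include <numa.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define BANK_BYTES (256UL * 1024 * 1024) /* illustrative bank size */

typedef struct {
    int    node;    /* NUMA node holding this bank */
    float *weights; /* weight shard pinned to that node */
} coffer_t;

static coffer_t *coffers;
static int n_banks;

/* Allocate one weight bank per configured NUMA node. */
static int coffers_init(void) {
    if (numa_available() < 0) return -1;
    n_banks = numa_num_configured_nodes();
    if (n_banks < 1) return -1;
    coffers = calloc((size_t)n_banks, sizeof *coffers);
    if (!coffers) return -1;
    for (int i = 0; i < n_banks; i++) {
        coffers[i].node = i;
        coffers[i].weights = numa_alloc_onnode(BANK_BYTES, i);
        if (!coffers[i].weights) return -1;
    }
    return 0;
}

/* O(1) routing: a key (e.g. a layer or expert id) maps directly
 * to a bank with one arithmetic op; no search, no scan. */
static coffer_t *coffer_route(uint64_t key) {
    return &coffers[key % (uint64_t)n_banks];
}

int main(void) {
    if (coffers_init() != 0) {
        fprintf(stderr, "libnuma unavailable or allocation failed\n");
        return 1;
    }
    coffer_t *c = coffer_route(42); /* constant-time bank lookup */
    printf("key 42 -> bank on NUMA node %d\n", c->node);
    return 0;
}
```

Compile with gcc -O2 sketch.c -lnuma on a NUMA-capable Linux system. The point of the modulo step is that locating a bank costs a single arithmetic operation regardless of how many weight shards exist, which is the shape of the O(1) lookup the description advertises.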

Use this if you are a developer looking to maximize LLM inference speed and efficiency on non-GPU hardware, especially IBM POWER8 systems, by intelligently managing model weights across available memory.

Not ideal if you are an end-user without programming experience, or if you run LLMs primarily on modern GPU clusters and have no need for specialized memory-architecture optimizations.

Tags: LLM-inference-optimization, edge-AI-deployment, hardware-acceleration, distributed-memory, AI-infrastructure-development
No package · No dependents
Maintenance: 10 / 25
Adoption: 8 / 25
Maturity: 11 / 25
Community: 18 / 25


Stars: 63
Forks: 16
Language: C
License: Apache-2.0
Last pushed: Mar 10, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/vector-db/Scottcjn/ram-coffers"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.