Scottcjn/ram-coffers
RAM Coffers: Conditional Memory via NUMA-Distributed Weight Banking - O(1) lookup routing for LLM inference (Dec 16, 2025 - predates DeepSeek Engram by 27 days)
This project speeds up Large Language Model (LLM) inference on specific hardware. Given an LLM query, it routes to the required model weights in NUMA-distributed memory banks with O(1) lookups, delivering rapid token generation. It is aimed at developers building or optimizing LLM serving infrastructure, particularly on older but still powerful IBM POWER8 systems.
Use this if you are a developer looking to maximize LLM inference speed on non-GPU hardware, especially IBM POWER8 systems, by managing model weights across the available NUMA memory banks.
Not ideal if you are an end user without programming experience, or if you run LLMs primarily on modern GPU clusters and don't need specialized memory-architecture optimizations.
Stars: 63
Forks: 16
Language: C
License: Apache-2.0
Category:
Last pushed: Mar 10, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/vector-db/Scottcjn/ram-coffers"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.