nareshis21/Truelarge-RT
Android inference engine running 20B+ parameter LLMs on 4GB-8GB RAM devices. Features proprietary Layer-by-Layer (LBL) streaming, zero-copy mmap loading, and native C++/Kotlin architecture.
This project lets Android app developers run very large language models (LLMs) directly on consumer smartphones and tablets, including older devices with limited RAM. It takes a pre-trained LLM (such as Llama 3) as input and performs real-time text generation on the device, providing interactive AI capabilities without requiring a constant internet connection. It is aimed at mobile developers who want to integrate powerful AI features into their Android applications.
Use this if you are developing an Android application and need to run large language models locally on user devices, especially those with 4GB-8GB of RAM, without requiring the entire model to fit into memory.
Not ideal if your application runs on server-side infrastructure or if your target devices consistently have 12GB+ RAM where smaller models can run fully in memory for maximum speed.
Stars: 9
Forks: 1
Language: Kotlin
License: MIT
Category:
Last pushed: Feb 21, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/nareshis21/Truelarge-RT"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs
sgl-project/sglang: SGLang is a high-performance serving framework for large language models and multimodal models.
alibaba/MNN: MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
xorbitsai/inference: Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
tensorzero/tensorzero: TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...