mechramc/Orion
Local AI runtime for training & running small LLMs directly on Apple Neural Engine (ANE). No CoreML. No Metal. Offline, on-device fine-tuning & inference on M-series silicon.
This tool helps AI practitioners train and run small language models (LLMs) directly on their Apple M-series devices, using the dedicated Neural Engine. You can input text prompts for immediate responses or provide a dataset to fine-tune a model. It's designed for machine learning engineers and researchers who want to develop and test LLMs locally without relying on cloud services or external GPUs.
Use this if you need to fine-tune or run small LLMs efficiently and privately on your Apple Silicon device, leveraging its dedicated AI hardware.
Not ideal if you need to work with very large, general-purpose LLMs that require massive computational power, or if you prefer a cloud-based solution.
Stars: 31
Forks: 2
Language: Objective-C
License: MIT
Category: (none listed)
Last pushed: Mar 06, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/mechramc/Orion"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
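The curl call above can be wrapped in a small Python helper. Only the URL layout and rate limits are documented here, so the authentication header scheme and the JSON response shape below are assumptions, not confirmed API behavior.

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the per-repo quality endpoint URL (path layout taken from the curl example above)."""
    return f"{API_BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str, api_key: str = "") -> dict:
    """Fetch the quality record for one repo and parse it as JSON.

    The 'Authorization: Bearer' header is an assumed key scheme; check the
    service's docs before relying on it. Without a key you get 100 requests/day.
    """
    req = urllib.request.Request(quality_url(category, owner, repo))
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")  # assumed header name
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Reproduces the documented curl target for this repo.
    print(quality_url("transformers", "mechramc", "Orion"))
```

Calling `quality_url("transformers", "mechramc", "Orion")` yields the exact URL shown in the curl example, so the helper can be sanity-checked without hitting the network.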
Higher-rated alternatives
- OpenNMT/CTranslate2: Fast inference engine for Transformer models
- Pomilon/LEMA: LEMA (Layer-wise Efficient Memory Abstraction), a hardware-aware framework for fine-tuning LLMs...
- dilbersha/llm-inference-benchmarking-3080: A production-grade, telemetry-aware suite for benchmarking LLM inference performance on NVIDIA RTX 3080
- Yuan-ManX/infera: Infera, a high-performance inference engine for large language models
- gxcsoccer/alloy: Hybrid SSM-Attention language model on Apple Silicon with MLX, interleaving Mamba-2 and...