smpanaro/coreml-llm-cli
CLI to demonstrate running a large language model (LLM) on Apple Neural Engine.
This tool lets developers experiment with running large language models (LLMs) such as Llama 2 directly on the Neural Engine of Apple Silicon Macs. You provide a CoreML-compatible model; the tool loads it and generates text. It is designed for developers interested in optimizing local LLM performance on macOS.
124 stars. No commits in the last 6 months.
Use this if you are a developer researching or optimizing the performance of large language models on Apple Silicon hardware using CoreML.
Not ideal if you are an end-user looking for a ready-to-use application to interact with an LLM, or if you are not comfortable with command-line tools and developer concepts.
Stars: 124
Forks: 12
Language: Swift
License: —
Category: —
Last pushed: Dec 27, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/smpanaro/coreml-llm-cli"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
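The endpoint above follows an owner/repo URL pattern. A minimal sketch of building that URL programmatically, assuming the `transformers` path segment is fixed as in the example (it may actually vary by category):

```python
def quality_api_url(owner: str, repo: str, category: str = "transformers") -> str:
    """Build the quality-API URL for a repository.

    The base URL and path layout mirror the curl example on this page;
    the `category` default is an assumption taken from that example.
    """
    return f"https://pt-edge.onrender.com/api/v1/quality/{category}/{owner}/{repo}"

print(quality_api_url("smpanaro", "coreml-llm-cli"))
```

Fetching the result with `curl` (or any HTTP client) then works exactly as shown above, within the per-day rate limits.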
Higher-rated alternatives
ModelCloud/GPTQModel
LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD...
intel/auto-round
🎯An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality...
pytorch/ao
PyTorch native quantization and sparsity for training and inference
bodaay/HuggingFaceModelDownloader
Simple go utility to download HuggingFace Models and Datasets
NVIDIA/kvpress
LLM KV cache compression made easy