Picovoice/picollm
On-device LLM Inference Powered by X-Bit Quantization
picoLLM lets developers embed highly compressed, accuracy-preserving large language models (LLMs) directly into their applications, so AI-powered features run on user devices or local servers. It takes open-weight LLMs, compresses them with X-Bit quantization, and delivers efficient, private inference, enabling features such as local voice assistants or offline text generation. It is aimed at software engineers building applications that need AI capabilities without relying on the cloud.
305 stars. Available on PyPI.
Use this if you are a software developer creating applications that need to run large language models directly on user devices (like phones or embedded systems) or local machines, prioritizing privacy and offline functionality.
Not ideal if you need to run proprietary LLMs or prefer relying solely on cloud-based AI services without local execution.
Stars: 305
Forks: 17
Language: Python
License: Apache-2.0
Category: (none listed)
Last pushed: Mar 02, 2026
Commits (last 30 days): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Picovoice/picollm"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
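The curl command above can also be issued from Python. The sketch below is a minimal example, assuming only the endpoint path shown on this page; the structure of the JSON response is not documented here, so no field names are assumed.

```python
# Hedged sketch: fetch a repo's quality record from the endpoint shown above.
# The base path is copied verbatim from this page; response fields are unknown.
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a GitHub owner/repo pair."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """GET the quality record (anonymous access is limited to 100 requests/day)."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Prints the same URL the curl example queries.
    print(quality_url("Picovoice", "picollm"))
```

Authenticated access (the 1,000/day tier) would presumably pass a key in a header or query parameter, but that detail is not shown on this page, so it is omitted here.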
Related models
- ModelCloud/GPTQModel: LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD...
- intel/auto-round: 🎯An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality...
- pytorch/ao: PyTorch native quantization and sparsity for training and inference
- bodaay/HuggingFaceModelDownloader: Simple go utility to download HuggingFace Models and Datasets
- NVIDIA/kvpress: LLM KV cache compression made easy