Picovoice/picollm
On-device LLM Inference Powered by X-Bit Quantization
picoLLM lets developers embed highly compressed, accuracy-preserving large language models (LLMs) directly into their applications, so AI-powered features run on user devices or local servers. It takes open-weight LLMs, compresses them with X-Bit quantization, and delivers efficient, private inference, enabling features such as local voice assistants or offline text generation. It is aimed at software engineers building applications that need AI capabilities without relying on the cloud.
305 stars. Available on PyPI.
Use this if you are a software developer creating applications that need to run large language models directly on user devices (like phones or embedded systems) or local machines, prioritizing privacy and offline functionality.
Not ideal if you need to run proprietary LLMs or prefer relying solely on cloud-based AI services without local execution.
Stars: 305
Forks: 17
Language: Python
License: Apache-2.0
Category: (none listed)
Last pushed: Mar 02, 2026
Commits (last 30 days): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Picovoice/picollm"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
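The curl command above can also be issued from Python. The sketch below is a minimal example, assuming only the endpoint path shown on this page; the structure of the JSON response is not documented here, so no field names are assumed.

```python
# Hedged sketch: fetch a repo's quality record from the endpoint shown above.
# The base path is copied verbatim from this page; response fields are unknown.
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a GitHub owner/repo pair."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """GET the quality record (anonymous access is limited to 100 requests/day)."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Prints the same URL the curl example queries.
    print(quality_url("Picovoice", "picollm"))
```

Authenticated access (the 1,000/day tier) would presumably pass a key in a header or query parameter, but that detail is not shown on this page, so it is omitted here.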
Related models
- ModelCloud/GPTQModel: LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD...
- intel/auto-round: 🎯An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality...
- pytorch/ao: PyTorch native quantization and sparsity for training and inference
- bodaay/HuggingFaceModelDownloader: Simple go utility to download HuggingFace Models and Datasets
- NVIDIA/kvpress: LLM KV cache compression made easy