ddh0/easy-llama
Python package wrapping llama.cpp for on-device LLM inference
A Python toolkit for developers who want to integrate large language model (LLM) inference directly into their applications or services. It loads quantized model files (in llama.cpp's GGUF format) and runs them locally on your own hardware, turning text input into text output.
101 stars. No commits in the last 6 months. Available on PyPI.
Use this if you are a developer looking to embed local LLM inference capabilities directly into your Python-based software, without relying on external cloud services.
Not ideal if you are an end-user without programming experience, or if you need a high-level API for model management and deployment rather than direct library integration.
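easy-llama's own API is not documented on this page, so as an illustration of the same load-a-GGUF-and-generate workflow it wraps, here is a minimal sketch using llama-cpp-python, another widely used Python binding for llama.cpp. The model path and prompt template are assumptions; substitute any GGUF file you have locally.

```python
import os

MODEL_PATH = "models/model.gguf"  # hypothetical path -- point at a real GGUF file

def make_prompt(question: str) -> str:
    """Wrap a user question in a simple Q/A completion template."""
    return f"Q: {question}\nA:"

def generate(question: str, max_tokens: int = 64) -> str:
    # Deferred import so the sketch degrades gracefully when the
    # binding is not installed (pip install llama-cpp-python).
    from llama_cpp import Llama
    llm = Llama(model_path=MODEL_PATH, n_ctx=2048)
    out = llm(make_prompt(question), max_tokens=max_tokens)
    return out["choices"][0]["text"]

if __name__ == "__main__":
    if os.path.exists(MODEL_PATH):
        print(generate("What is the GGUF format?"))
    else:
        print("no model file at", MODEL_PATH, "- skipping generation")
```

The on-device tradeoff is visible here: no API key or network call, but you supply the model weights and the hardware they run on.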
Stars
101
Forks
6
Language
Python
License
MIT
Category
Last pushed
Oct 12, 2025
Commits (30d)
0
Dependencies
5
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ddh0/easy-llama"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
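The curl call above can also be made from Python using only the standard library. The endpoint URL is taken verbatim from the example; the shape of the JSON response is not documented here, so this sketch simply pretty-prints whatever comes back.

```python
import json
import urllib.request

# Endpoint copied verbatim from the curl example above.
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/ddh0/easy-llama"

def fetch_quality(url: str = URL) -> dict:
    """Fetch the quality data for this repo and decode it as JSON."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    try:
        print(json.dumps(fetch_quality(), indent=2))
    except OSError as exc:  # offline, rate-limited, or endpoint unavailable
        print(f"request failed: {exc}")
```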
Higher-rated alternatives
ModelCloud/GPTQModel
LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD...
intel/auto-round
🎯An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality...
pytorch/ao
PyTorch native quantization and sparsity for training and inference
bodaay/HuggingFaceModelDownloader
Simple go utility to download HuggingFace Models and Datasets
NVIDIA/kvpress
LLM KV cache compression made easy