ddh0/easy-llama
Python package wrapping llama.cpp for on-device LLM inference
A Python toolkit for developers who want to integrate large language model (LLM) inference directly into their applications or services. It loads quantized model files (in llama.cpp's GGUF format) and runs them locally on your own hardware, turning text input into text output.
101 stars. No commits in the last 6 months. Available on PyPI.
Use this if you are a developer looking to embed local LLM inference capabilities directly into your Python-based software, without relying on external cloud services.
Not ideal if you are an end-user without programming experience, or if you need a high-level API for model management and deployment rather than direct library integration.
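easy-llama's own API is not documented on this page, so as an illustration of the same load-a-GGUF-and-generate workflow it wraps, here is a minimal sketch using llama-cpp-python, another widely used Python binding for llama.cpp. The model path and prompt template are assumptions; substitute any GGUF file you have locally.

```python
import os

MODEL_PATH = "models/model.gguf"  # hypothetical path -- point at a real GGUF file

def make_prompt(question: str) -> str:
    """Wrap a user question in a simple Q/A completion template."""
    return f"Q: {question}\nA:"

def generate(question: str, max_tokens: int = 64) -> str:
    # Deferred import so the sketch degrades gracefully when the
    # binding is not installed (pip install llama-cpp-python).
    from llama_cpp import Llama
    llm = Llama(model_path=MODEL_PATH, n_ctx=2048)
    out = llm(make_prompt(question), max_tokens=max_tokens)
    return out["choices"][0]["text"]

if __name__ == "__main__":
    if os.path.exists(MODEL_PATH):
        print(generate("What is the GGUF format?"))
    else:
        print("no model file at", MODEL_PATH, "- skipping generation")
```

The on-device tradeoff is visible here: no API key or network call, but you supply the model weights and the hardware they run on.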
Stars
101
Forks
6
Language
Python
License
MIT
Category
Last pushed
Oct 12, 2025
Commits (30d)
0
Dependencies
5
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ddh0/easy-llama"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
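The curl call above can also be made from Python using only the standard library. The endpoint URL is taken verbatim from the example; the shape of the JSON response is not documented here, so this sketch simply pretty-prints whatever comes back.

```python
import json
import urllib.request

# Endpoint copied verbatim from the curl example above.
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/ddh0/easy-llama"

def fetch_quality(url: str = URL) -> dict:
    """Fetch the quality data for this repo and decode it as JSON."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    try:
        print(json.dumps(fetch_quality(), indent=2))
    except OSError as exc:  # offline, rate-limited, or endpoint unavailable
        print(f"request failed: {exc}")
```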
Higher-rated alternatives
ModelCloud/GPTQModel
LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD...
intel/auto-round
🎯An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality...
pytorch/ao
PyTorch native quantization and sparsity for training and inference
bodaay/HuggingFaceModelDownloader
Simple go utility to download HuggingFace Models and Datasets
NVIDIA/kvpress
LLM KV cache compression made easy