awinml/llama-cpp-python-bindings
Run fast LLM inference using llama.cpp in Python
This project helps Python developers run large language models (LLMs) directly on their computer's central processing unit (CPU). You provide a model file in GGUF format and a text prompt, and it quickly generates a text response. It is aimed at Python developers who need to integrate efficient, local LLM inference into their applications without relying on specialized GPU hardware or cloud services.
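The workflow the description outlines can be sketched with the `llama-cpp-python` package, which this repo's notebooks build on. This is a minimal sketch, not code from the repo: the model path is a placeholder for any GGUF file you have downloaded, and the thread count is an assumption to tune per machine.

```python
# Hypothetical sketch of CPU-only inference with llama-cpp-python
# (pip install llama-cpp-python). Model path below is a placeholder.

def build_prompt(question: str) -> str:
    """Format a question in the simple Q/A style many base models expect."""
    return f"Q: {question}\nA:"

def main() -> None:
    # Imported lazily so the sketch can be read/imported without the package.
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/llama-2-7b-chat.Q4_K_M.gguf",  # placeholder GGUF file
        n_ctx=2048,    # context window size
        n_threads=8,   # CPU threads; tune to your hardware
    )
    result = llm(
        build_prompt("What is the capital of France?"),
        max_tokens=32,
        stop=["Q:"],   # stop before the model invents a follow-up question
    )
    print(result["choices"][0]["text"].strip())

if __name__ == "__main__":
    main()
```

Quantized GGUF files (e.g. `Q4_K_M`) are what make this practical on a CPU: they trade a little accuracy for a much smaller memory footprint and faster token generation.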
No commits in the last 6 months.
Use this if you are a Python developer and need to run language models efficiently on standard CPUs for local text generation or analysis tasks.
Not ideal if you are not working in Python, or if you need the absolute fastest inference speeds, which are only achievable with high-end GPUs.
Stars: 19
Forks: 4
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Jan 03, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/awinml/llama-cpp-python-bindings"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
Higher-rated alternatives
beehive-lab/GPULlama3.java
GPU-accelerated Llama3.java inference in pure Java using TornadoVM.
gitkaz/mlx_gguf_server
This is a FastAPI based LLM server. Load multiple LLM models (MLX or llama.cpp) simultaneously...
srgtuszy/llama-cpp-swift
Swift bindings for llama-cpp library
JackZeng0208/llama.cpp-android-tutorial
llama.cpp tutorial on Android phone
RhinoDevel/mt_llm
Pure C wrapper library to use llama.cpp with Linux and Windows as simple as possible.