awinml/llama-cpp-python-bindings
Run fast LLM inference using llama.cpp in Python
This project helps Python developers run large language models (LLMs) directly on their computer's central processing unit (CPU). You provide a model file in GGUF format and a text prompt, and it quickly generates a text response. It is aimed at Python developers who need to integrate efficient, local LLM inference into their applications without relying on specialized GPU hardware or cloud services.
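The workflow the description outlines can be sketched with the `llama-cpp-python` package, which this repo's notebooks build on. This is a minimal sketch, not code from the repo: the model path is a placeholder for any GGUF file you have downloaded, and the thread count is an assumption to tune per machine.

```python
# Hypothetical sketch of CPU-only inference with llama-cpp-python
# (pip install llama-cpp-python). Model path below is a placeholder.

def build_prompt(question: str) -> str:
    """Format a question in the simple Q/A style many base models expect."""
    return f"Q: {question}\nA:"

def main() -> None:
    # Imported lazily so the sketch can be read/imported without the package.
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/llama-2-7b-chat.Q4_K_M.gguf",  # placeholder GGUF file
        n_ctx=2048,    # context window size
        n_threads=8,   # CPU threads; tune to your hardware
    )
    result = llm(
        build_prompt("What is the capital of France?"),
        max_tokens=32,
        stop=["Q:"],   # stop before the model invents a follow-up question
    )
    print(result["choices"][0]["text"].strip())

if __name__ == "__main__":
    main()
```

Quantized GGUF files (e.g. `Q4_K_M`) are what make this practical on a CPU: they trade a little accuracy for a much smaller memory footprint and faster token generation.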
No commits in the last 6 months.
Use this if you are a Python developer and need to run language models efficiently on standard CPUs for local text generation or analysis tasks.
Not ideal if you are not working in Python, or if you need the absolute fastest inference speeds, which are only achievable with high-end GPUs.
Stars: 19
Forks: 4
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Jan 03, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/awinml/llama-cpp-python-bindings"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
Higher-rated alternatives
beehive-lab/GPULlama3.java
GPU-accelerated Llama3.java inference in pure Java using TornadoVM.
gitkaz/mlx_gguf_server
This is a FastAPI based LLM server. Load multiple LLM models (MLX or llama.cpp) simultaneously...
srgtuszy/llama-cpp-swift
Swift bindings for llama-cpp library
JackZeng0208/llama.cpp-android-tutorial
llama.cpp tutorial on Android phone
RhinoDevel/mt_llm
Pure C wrapper library to use llama.cpp with Linux and Windows as simple as possible.