dougeeai/llama-cpp-python-wheels
Pre-built wheels for llama-cpp-python across platforms and CUDA versions
This project provides pre-built wheels for llama-cpp-python, a library for running large language models (LLMs) efficiently on your local machine. It spares you the complex step of compiling the library yourself: you get ready-to-use installation files matched to your specific NVIDIA GPU, CUDA version, and Python version, so you can quickly deploy and experiment with LLMs. It is aimed at AI developers, researchers, and data scientists who want to run powerful LLMs on their own hardware.
Use this if you are a developer using Python, want to run Llama-based large language models on your NVIDIA GPU, and prefer a straightforward installation without manual compilation.
Not ideal if you are not a Python developer, don't use NVIDIA GPUs, or prefer to compile software manually for custom configurations.
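As an illustration, installing one of these pre-built wheels typically looks like the following. The release tag, wheel filename, and version numbers below are hypothetical placeholders; check the repository's releases page for the actual wheel matching your Python version, platform, and CUDA version.

```shell
# Hypothetical example: install a pre-built llama-cpp-python wheel directly
# from a GitHub release. The URL and filename are placeholders -- pick the
# wheel matching your Python version (e.g. cp311), OS, and CUDA version.
pip install https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.0/llama_cpp_python-0.3.0-cp311-cp311-win_amd64.whl

# Sanity-check that the package imports cleanly after installation.
python -c "import llama_cpp; print(llama_cpp.__version__)"
```

Installing from a wheel this way skips the CMake/compiler toolchain entirely, which is the main point of the project.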
Stars
40
Forks
3
Language
—
License
MIT
Category
Last pushed
Nov 09, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/dougeeai/llama-cpp-python-wheels"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
beehive-lab/GPULlama3.java
GPU-accelerated Llama3.java inference in pure Java using TornadoVM.
gitkaz/mlx_gguf_server
This is a FastAPI based LLM server. Load multiple LLM models (MLX or llama.cpp) simultaneously...
srgtuszy/llama-cpp-swift
Swift bindings for llama-cpp library
JackZeng0208/llama.cpp-android-tutorial
llama.cpp tutorial on Android phone
awinml/llama-cpp-python-bindings
Run fast LLM Inference using Llama.cpp in Python