dougeeai/llama-cpp-python-wheels
Pre-built wheels for llama-cpp-python across platforms and CUDA versions
This project provides pre-built wheels for llama-cpp-python, a library for running large language models (LLMs) efficiently on your local machine. It spares you the complex step of compiling the library yourself: you get ready-to-use installation files matched to your specific NVIDIA GPU, CUDA version, and Python version, so you can quickly deploy and experiment with LLMs. It is aimed at AI developers, researchers, and data scientists who want to run powerful LLMs on their own hardware.
Use this if you are a developer using Python, want to run Llama-based large language models on your NVIDIA GPU, and prefer a straightforward installation without manual compilation.
Not ideal if you are not a Python developer, don't use NVIDIA GPUs, or prefer to compile software manually for custom configurations.
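As an illustration, installing one of these pre-built wheels typically looks like the following. The release tag, wheel filename, and version numbers below are hypothetical placeholders; check the repository's releases page for the actual wheel matching your Python version, platform, and CUDA version.

```shell
# Hypothetical example: install a pre-built llama-cpp-python wheel directly
# from a GitHub release. The URL and filename are placeholders -- pick the
# wheel matching your Python version (e.g. cp311), OS, and CUDA version.
pip install https://github.com/dougeeai/llama-cpp-python-wheels/releases/download/v0.3.0/llama_cpp_python-0.3.0-cp311-cp311-win_amd64.whl

# Sanity-check that the package imports cleanly after installation.
python -c "import llama_cpp; print(llama_cpp.__version__)"
```

Installing from a wheel this way skips the CMake/compiler toolchain entirely, which is the main point of the project.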
Stars
40
Forks
3
Language
—
License
MIT
Category
Last pushed
Nov 09, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/dougeeai/llama-cpp-python-wheels"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
beehive-lab/GPULlama3.java
GPU-accelerated Llama3.java inference in pure Java using TornadoVM.
gitkaz/mlx_gguf_server
This is a FastAPI based LLM server. Load multiple LLM models (MLX or llama.cpp) simultaneously...
srgtuszy/llama-cpp-swift
Swift bindings for llama-cpp library
JackZeng0208/llama.cpp-android-tutorial
llama.cpp tutorial on Android phone
awinml/llama-cpp-python-bindings
Run fast LLM Inference using Llama.cpp in Python