awinml/llama-cpp-python-bindings

Run fast LLM inference using llama.cpp in Python

Score: 37 / 100 (Emerging)

This project helps Python developers run large language models (LLMs) directly on a computer's central processing unit (CPU). You provide a model file in GGUF format and a text prompt, and it quickly generates a text response. It targets developers who need to integrate efficient, local LLM inference into their applications without relying on specialized GPU hardware or cloud services.
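As an illustration (not code shipped by this project), a minimal CPU-inference sketch with the llama-cpp-python package, which this style of workflow builds on, might look like the following. The model path and generation parameters are placeholder assumptions.

from llama_cpp import Llama

# Load a local GGUF model; "model.gguf" is a placeholder path, not a file
# provided by this project. n_threads controls how many CPU cores are used.
llm = Llama(model_path="model.gguf", n_ctx=2048, n_threads=8)

# Generate a short completion for a text prompt.
output = llm("Q: What is the capital of France? A:", max_tokens=32, stop=["Q:"])
print(output["choices"][0]["text"])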

No commits in the last 6 months.

Use this if you are a Python developer and need to run language models efficiently on standard CPUs for local text generation or analysis tasks.

Not ideal if you are not a Python developer, or if you require the fastest possible inference speeds, which only high-end GPUs can deliver.

Tags: Python development, local AI, text generation, natural language processing, CPU inference
Status: Stale (6 months), No Package, No Dependents

Maintenance: 0 / 25
Adoption: 6 / 25
Maturity: 16 / 25
Community: 15 / 25


Stars: 19
Forks: 4
Language: Jupyter Notebook
License: MIT
Last pushed: Jan 03, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/awinml/llama-cpp-python-bindings"

Open to everyone: 100 requests/day with no API key needed. Get a free key for 1,000 requests/day.
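The same endpoint can also be queried from Python; below is a minimal sketch using the requests library. The response schema is not documented in this listing, so the JSON is printed as-is.

import requests

# Public quality endpoint (100 requests/day without an API key).
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/awinml/llama-cpp-python-bindings"

resp = requests.get(url, timeout=30)
resp.raise_for_status()  # Raise on HTTP errors (e.g., rate limiting).
print(resp.json())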