daskol/llama.py
Python bindings to llama.cpp
This project helps developers integrate large language models (LLMs) like LLaMA directly into Python applications. It takes pre-trained LLaMA model weights and outputs a highly optimized, quantized version that can run on standard CPUs, including Apple Silicon. This is intended for Python developers who want to run powerful LLMs locally without relying on cloud services or high-end GPUs.
No commits in the last 6 months.
Use this if you are a Python developer looking to run LLaMA models efficiently on CPU hardware, even on laptops, with optimized performance and reduced memory footprint.
Not ideal if you need a plug-and-play LLM solution without any coding or if your primary goal is to train or fine-tune LLMs, as this focuses on inference.
Stars: 27
Forks: 3
Language: C
License: MIT
Category:
Last pushed: Mar 22, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/daskol/llama.py"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
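The curl command above can also be issued from Python. A minimal sketch using only the standard library, assuming the endpoint path shown on this page; the shape of the JSON response is not documented here, so the fetch simply returns the parsed payload as-is:

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(index: str, repo: str) -> str:
    """Build the API URL for a repository, e.g. quality_url('transformers', 'daskol/llama.py')."""
    return f"{API_BASE}/{index}/{repo}"


def fetch_quality(index: str, repo: str) -> dict:
    # No API key needed for up to 100 requests/day; a free key raises the limit to 1,000/day.
    with urllib.request.urlopen(quality_url(index, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    data = fetch_quality("transformers", "daskol/llama.py")
    print(json.dumps(data, indent=2))
```

The `quality_url` helper is illustrative; only the full URL string matches the documented endpoint.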
Higher-rated alternatives
ludwig-ai/ludwig
Low-code framework for building custom LLMs, neural networks, and other AI models
withcatai/node-llama-cpp
Run AI models locally on your machine with node.js bindings for llama.cpp. Enforce a JSON schema...
mudler/LocalAI
:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and...
zhudotexe/kani
kani (カニ) is a highly hackable microframework for tool-calling language models. (NLP-OSS @ EMNLP 2023)
SciSharp/LLamaSharp
A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.