AmpereComputingAI/llama.cpp

Ampere optimized llama.cpp

Score: 38 / 100 (Emerging)

This project helps you run large language models (LLMs) more efficiently on Ampere CPUs. You provide a GGUF-format LLM, and it outputs a model ready for faster inference, particularly via Ampere's custom quantization. It is designed for developers and AI practitioners who need to deploy and optimize LLMs on Ampere hardware.

Use this if you are a developer or AI practitioner working with Large Language Models and want to run them optimally on Ampere CPUs or Ampere-based cloud VMs.

Not ideal if you are not using Ampere hardware, or if you lack experience with Docker and command-line model conversion and quantization.

AI-inference Large-Language-Models model-optimization edge-AI cloud-deployment
No License No Package No Dependents
Maintenance 10 / 25
Adoption 7 / 25
Maturity 8 / 25
Community 13 / 25


Stars: 33
Forks: 5
Language: Python
License: none
Last pushed: Jan 30, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/AmpereComputingAI/llama.cpp"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
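The same endpoint can be called from Python instead of curl. A minimal sketch using only the standard library (the URL comes from the curl example above; the response's JSON field names are not documented here, so the fetch helper returns the decoded payload as-is rather than assuming a schema):

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(ecosystem: str, repo: str) -> str:
    """Build the quality-endpoint URL for a repository."""
    return f"{BASE}/{ecosystem}/{repo}"


def fetch_quality(ecosystem: str, repo: str) -> dict:
    """Fetch and decode the quality JSON.

    Each call counts against the anonymous 100 requests/day limit.
    """
    with urllib.request.urlopen(quality_url(ecosystem, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(quality_url("transformers", "AmpereComputingAI/llama.cpp"))
    # Uncomment to perform a live request:
    # data = fetch_quality("transformers", "AmpereComputingAI/llama.cpp")
    # print(data)  # field names depend on the API's actual schema
```

With an API key for the 1,000/day tier, you would presumably attach it as a header or query parameter; the exact mechanism is not shown on this page, so check the API's documentation.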