LarHope/ollama-benchmark

Ollama-based benchmark with detailed input/output tokens-per-second metrics. Written in Python, with a DeepSeek R1 example.

Score: 53 / 100 (Established)

This tool helps you understand how quickly different large language models (LLMs) perform on your computer. It takes a list of models and prompts as input, then measures how fast they process your input and generate responses, as well as how long they take to load. You'll get clear metrics like tokens per second, which is useful for developers building applications powered by local LLMs.

Use this if you are a developer deploying local LLMs and need to compare their performance to choose the most efficient model for your application.

Not ideal if you are a non-technical user just looking to chat with an LLM without needing to understand its underlying speed metrics.
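The metric the description refers to can be sketched as follows. This is a minimal illustration, not code from the repository: it assumes the timing fields that Ollama's `/api/generate` endpoint returns (`prompt_eval_count`, `prompt_eval_duration`, `eval_count`, `eval_duration`, all durations in nanoseconds), and the numeric values below are made up for the example.

```python
# Sketch: deriving input/output tokens-per-second from Ollama's timing
# fields. Field names follow Ollama's /api/generate response; the values
# here are illustrative, not real benchmark output.

def tokens_per_second(token_count: int, duration_ns: int) -> float:
    """Convert a token count and a nanosecond duration to tokens/sec."""
    if duration_ns <= 0:
        return 0.0
    return token_count / (duration_ns / 1e9)

# Hypothetical response fragment from a single benchmark run:
resp = {
    "prompt_eval_count": 26,              # input tokens processed
    "prompt_eval_duration": 130_000_000,  # ns spent reading the prompt
    "eval_count": 290,                    # output tokens generated
    "eval_duration": 4_700_000_000,       # ns spent generating
    "load_duration": 2_000_000_000,       # ns spent loading the model
}

input_tps = tokens_per_second(resp["prompt_eval_count"],
                              resp["prompt_eval_duration"])
output_tps = tokens_per_second(resp["eval_count"], resp["eval_duration"])
load_s = resp["load_duration"] / 1e9

print(f"input: {input_tps:.1f} tok/s, "
      f"output: {output_tps:.1f} tok/s, load: {load_s:.1f} s")
```

Reporting input and output throughput separately matters because prompt processing is typically far faster than generation, so a single combined number would hide the speed that users actually experience.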

Tags: LLM deployment, model evaluation, performance testing, local AI development, application optimization
No package · No dependents
Maintenance: 13 / 25
Adoption: 8 / 25
Maturity: 16 / 25
Community: 16 / 25


Stars: 45
Forks: 8
Language: Python
License: MIT
Last pushed: Mar 20, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/LarHope/ollama-benchmark"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.