LarHope/ollama-benchmark
Ollama-based benchmark with detailed input/output tokens-per-second metrics. Written in Python, with a DeepSeek R1 example.
This tool helps you understand how quickly different large language models (LLMs) perform on your computer. It takes a list of models and prompts as input, then measures how fast they process your input and generate responses, as well as how long they take to load. You'll get clear metrics like tokens per second, which is useful for developers building applications powered by local LLMs.
Use this if you are a developer deploying local LLMs and need to compare their performance to choose the most efficient model for your application.
Not ideal if you are a non-technical user just looking to chat with an LLM without needing to understand its underlying speed metrics.
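For a sense of the measurement involved, here is a minimal sketch of a single benchmark run, assuming the official ollama Python client (pip install ollama) and a model that has already been pulled; the timing fields are Ollama's documented response metadata, reported in nanoseconds. This is an illustrative sketch, not the repository's actual code.

import ollama

def benchmark(model: str, prompt: str) -> None:
    # One non-streaming generation; the response carries timing metadata.
    resp = ollama.generate(model=model, prompt=prompt)
    # Input (prompt evaluation) and output (generation) speeds in tokens/s.
    in_tps = resp["prompt_eval_count"] / resp["prompt_eval_duration"] * 1e9
    out_tps = resp["eval_count"] / resp["eval_duration"] * 1e9
    load_s = resp["load_duration"] / 1e9  # model load time, in seconds
    print(f"{model}: input {in_tps:.1f} tok/s, "
          f"output {out_tps:.1f} tok/s, load {load_s:.2f} s")

# Example with the DeepSeek R1 model named in the repo description
# (assumes `ollama pull deepseek-r1` has been run beforehand).
benchmark("deepseek-r1", "Explain tokens-per-second in one sentence.")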
Stars: 45
Forks: 8
Language: Python
License: MIT
Category:
Last pushed: Mar 20, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/LarHope/ollama-benchmark"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
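The same data can also be fetched from a script. A minimal sketch using the requests library (pip install requests), with the endpoint copied verbatim from the curl example above:

import requests

url = ("https://pt-edge.onrender.com/api/v1/quality/"
       "transformers/LarHope/ollama-benchmark")
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # fail loudly on rate limits or server errors
print(resp.json())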
Related repositories
stanfordnlp/axbench
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
aidatatools/ollama-benchmark
LLM Benchmark for Throughput via Ollama (Local LLMs)
qcri/LLMeBench
Benchmarking Large Language Models
THUDM/LongBench
LongBench v2 and LongBench (ACL '25 & '24)
microsoft/LLF-Bench
A benchmark for evaluating learning agents based on just language feedback