LarHope/ollama-benchmark
Ollama-based benchmark with detailed input/output tokens-per-second metrics. Written in Python, with a DeepSeek R1 example.
This tool helps you understand how quickly different large language models (LLMs) perform on your computer. It takes a list of models and prompts as input, then measures how fast they process your input and generate responses, as well as how long they take to load. You'll get clear metrics like tokens per second, which is useful for developers building applications powered by local LLMs.
Use this if you are a developer deploying local LLMs and need to compare their performance to choose the most efficient model for your application.
Not ideal if you are a non-technical user just looking to chat with an LLM without needing to understand its underlying speed metrics.
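For a sense of the measurement involved, here is a minimal sketch of a single benchmark run, assuming the official ollama Python client (pip install ollama) and a model that has already been pulled; the timing fields are Ollama's documented response metadata, reported in nanoseconds. This is an illustrative sketch, not the repository's actual code.

import ollama

def benchmark(model: str, prompt: str) -> None:
    # One non-streaming generation; the response carries timing metadata.
    resp = ollama.generate(model=model, prompt=prompt)
    # Input (prompt evaluation) and output (generation) speeds in tokens/s.
    in_tps = resp["prompt_eval_count"] / resp["prompt_eval_duration"] * 1e9
    out_tps = resp["eval_count"] / resp["eval_duration"] * 1e9
    load_s = resp["load_duration"] / 1e9  # model load time, in seconds
    print(f"{model}: input {in_tps:.1f} tok/s, "
          f"output {out_tps:.1f} tok/s, load {load_s:.2f} s")

# Example with the DeepSeek R1 model named in the repo description
# (assumes `ollama pull deepseek-r1` has been run beforehand).
benchmark("deepseek-r1", "Explain tokens-per-second in one sentence.")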
Stars: 45
Forks: 8
Language: Python
License: MIT
Category:
Last pushed: Mar 20, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/LarHope/ollama-benchmark"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
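The same data can also be fetched from a script. A minimal sketch using the requests library (pip install requests), with the endpoint copied verbatim from the curl example above:

import requests

url = ("https://pt-edge.onrender.com/api/v1/quality/"
       "transformers/LarHope/ollama-benchmark")
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # fail loudly on rate limits or server errors
print(resp.json())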
Related repositories
stanfordnlp/axbench
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
aidatatools/ollama-benchmark
LLM Benchmark for Throughput via Ollama (Local LLMs)
qcri/LLMeBench
Benchmarking Large Language Models
THUDM/LongBench
LongBench v2 and LongBench (ACL '25 & '24)
microsoft/LLF-Bench
A benchmark for evaluating learning agents based on just language feedback