grigio/llm-eval-simple

llm-eval-simple is a simple LLM evaluation framework with intermediate actions and prompt pattern selection

Score: 36 / 100 (Emerging)

This tool helps AI engineers and machine learning practitioners test different Large Language Models (LLMs) against a set of prompts and their expected answers. You provide text prompts and their correct responses, and the tool evaluates how accurately and quickly each model performs, giving you a detailed report and an interactive dashboard to compare results. It's ideal for anyone looking to benchmark and select the best LLM for specific tasks.

Use this if you need to systematically compare the performance (accuracy and speed) of multiple LLMs on your custom datasets and understand which models are best suited for your applications.

Not ideal if you need to evaluate the qualitative aspects of LLM outputs (like creativity or fluency) that can't be judged by exact matching or a simple AI evaluator model.
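
To make the prompt/expected-answer workflow concrete, here is a minimal Python sketch of the kind of benchmark loop such a tool automates. This is illustrative only, not the project's actual code: query_model is a hypothetical stand-in for a real LLM client call, and the dataset format shown is an assumption.

import time

# Hypothetical dataset of prompts with expected answers.
dataset = [
    {"prompt": "What is the capital of France?", "expected": "Paris"},
    {"prompt": "2 + 2 =", "expected": "4"},
]

def query_model(model_name: str, prompt: str) -> str:
    """Hypothetical placeholder for an actual LLM API call."""
    raise NotImplementedError

def evaluate(model_name: str) -> dict:
    correct, total_latency = 0, 0.0
    for case in dataset:
        start = time.perf_counter()
        answer = query_model(model_name, case["prompt"])
        total_latency += time.perf_counter() - start
        # Exact-match scoring; a judge model could replace this check.
        if answer.strip().lower() == case["expected"].strip().lower():
            correct += 1
    return {
        "model": model_name,
        "accuracy": correct / len(dataset),
        "avg_latency_s": total_latency / len(dataset),
    }

Running evaluate() once per candidate model yields comparable accuracy and latency figures, which is the comparison the framework's report and dashboard present.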

LLM-benchmarking AI-model-evaluation prompt-engineering machine-learning-operations performance-testing
No package · No dependents
Maintenance: 10 / 25
Adoption: 8 / 25
Maturity: 15 / 25
Community: 3 / 25


Stars: 59
Forks: 1
Language: Python
License: MIT
Last pushed: Feb 28, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/grigio/llm-eval-simple"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
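
The same data can also be fetched programmatically. A minimal Python sketch using requests, assuming the endpoint returns JSON (the response schema is not documented here, so the body is simply pretty-printed):

import json
import requests

# Unauthenticated request, within the free 100 requests/day limit.
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/grigio/llm-eval-simple"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2))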