grigio/llm-eval-simple
llm-eval-simple is a simple LLM evaluation framework with intermediate actions and prompt pattern selection
This tool helps AI engineers and machine learning practitioners test different Large Language Models (LLMs) against a set of prompts with known expected answers. You supply the prompts and their correct responses, and the tool measures how accurately and how quickly each model answers, producing a detailed report and an interactive dashboard for comparing results. It's aimed at anyone who wants to benchmark LLMs and pick the best one for a specific task.
Use this if you need to systematically compare the performance (accuracy and speed) of multiple LLMs on your custom datasets and understand which models are best suited for your applications.
Not ideal if you need to evaluate the qualitative aspects of LLM outputs (like creativity or fluency) that can't be judged by exact matching or a simple AI evaluator model.
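As a rough illustration of the prompt/expected-answer idea and of exact-match scoring: the actual file format llm-eval-simple expects is not shown on this page, so the field names and helper below are assumptions, not the tool's real API.

# Hypothetical prompt/expected-answer pairs; llm-eval-simple's real format may differ.
dataset = [
    {"prompt": "What is the capital of France?", "expected": "Paris"},
    {"prompt": "2 + 2 = ?", "expected": "4"},
]

def exact_match_score(model_answers, dataset):
    # Fraction of answers that match the expected text exactly (case-insensitive).
    hits = sum(
        ans.strip().lower() == item["expected"].strip().lower()
        for ans, item in zip(model_answers, dataset)
    )
    return hits / len(dataset)

# Example: a model that gets only the first question right scores 0.5.
print(exact_match_score(["Paris", "5"], dataset))

Scoring like this only works when answers can be compared literally, which is why the tool is a poor fit for judging creativity or fluency.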
Stars: 59
Forks: 1
Language: Python
License: MIT
Category:
Last pushed: Feb 28, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/grigio/llm-eval-simple"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
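If you prefer scripting the lookup instead of using curl, the same public endpoint can be fetched with a few lines of Python. This is a minimal sketch using only the unauthenticated URL shown above; the shape of the returned JSON is not documented here, so the script simply pretty-prints whatever comes back.

import json
import urllib.request

# Public quality-data endpoint shown above (100 requests/day without a key).
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/grigio/llm-eval-simple"

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)

# The response schema isn't documented here, so just pretty-print it.
print(json.dumps(data, indent=2))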
Higher-rated alternatives
eth-sri/matharena
Evaluation of LLMs on latest math competitions
tatsu-lab/alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality,...
HPAI-BSC/TuRTLe
TuRTLe: A Unified Evaluation of LLMs for RTL Generation 🐢 (MLCAD 2025)
nlp-uoregon/mlmm-evaluation
Multilingual Large Language Models Evaluation Benchmark
haesleinhuepf/human-eval-bia
Benchmarking Large Language Models for Bio-Image Analysis Code Generation