Joinn99/RocketEval-ICLR
🚀 [ICLR '25] RocketEval: Efficient Automated LLM Evaluation via Grading Checklist
Quickly and automatically assess how well different large language models (LLMs) respond to your prompts. You supply a list of questions or prompts plus each model's responses; the tool generates a grading checklist per prompt, scores every response against it, and returns a ranking of the models. It is aimed at AI researchers and developers who need to systematically compare LLMs and select the best performer for their application.
No commits in the last 6 months.
Use this if you need an efficient, automated way to evaluate the quality of multiple LLM responses against a set of criteria without extensive manual review.
Not ideal if you only need to evaluate a single LLM or if your evaluation criteria are too nuanced for checklist-based grading.
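To make the checklist idea concrete, here is a minimal, self-contained Python sketch of checklist-based grading. It is an illustration only: the names (ChecklistItem, score) and the sample verdicts are hypothetical, not RocketEval's actual API; in the real pipeline the yes/no verdicts would come from a judge LLM applying the generated checklist to each response.

from dataclasses import dataclass
from statistics import mean

@dataclass
class ChecklistItem:
    criterion: str   # one yes/no grading criterion for a prompt
    passed: bool     # whether the response satisfies it

def score(items):
    # A response's score is the fraction of checklist items it passes.
    return mean(item.passed for item in items)

# Hypothetical verdicts for two models on a single prompt.
verdicts = {
    "model-a": [ChecklistItem("Answers the question directly", True),
                ChecklistItem("Gives a worked example", False)],
    "model-b": [ChecklistItem("Answers the question directly", True),
                ChecklistItem("Gives a worked example", True)],
}

# Rank models by checklist score (higher is better).
for model in sorted(verdicts, key=lambda m: score(verdicts[m]), reverse=True):
    print(f"{model}: {score(verdicts[model]):.2f}")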
Stars: 15
Forks: 8
Language: Python
License: MIT
Category: NLP
Last pushed: Aug 21, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/Joinn99/RocketEval-ICLR"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
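For scripted access, the same data can be fetched from Python. A minimal sketch, assuming only that the endpoint returns JSON (its exact schema is not documented here):

import requests  # third-party HTTP client: pip install requests

URL = "https://pt-edge.onrender.com/api/v1/quality/nlp/Joinn99/RocketEval-ICLR"

resp = requests.get(URL, timeout=10)  # no API key needed up to 100 requests/day
resp.raise_for_status()               # raise on HTTP errors
data = resp.json()                    # assumed JSON; schema not documented here
print(data)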
Higher-rated alternatives
google/langfun: OO for LLMs
tanaos/artifex: Small Language Model Inference, Fine-Tuning and Observability. No GPU, no labeled data needed.
preligens-lab/textnoisr: Add random noise to a text dataset while precisely controlling the quality of the result.
vulnerability-lookup/VulnTrain: A tool to generate datasets and models based on vulnerability descriptions from @Vulnerability-Lookup.
masakhane-io/masakhane-mt: Machine Translation for Africa