mims-harvard/Qworld
Qworld: Question-Specific Evaluation Criteria for LLMs
When evaluating large language models (LLMs) on complex, open-ended questions, Qworld helps you create detailed, context-specific evaluation criteria. Instead of using generic rubrics, it takes a question and generates a comprehensive set of binary criteria, scenarios, and perspectives to judge the quality of an LLM's response. This tool is for anyone who needs to rigorously assess LLMs, such as AI researchers, product managers developing LLM applications, or educators creating LLM-based learning tools.
Use this if you need highly detailed, question-specific criteria to evaluate how well a large language model answers complex or open-ended questions, moving beyond a single pass/fail judgment.
Not ideal if you only need a quick, high-level assessment of LLM performance, or if your questions have a single, unambiguous correct answer.
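To make the idea concrete, the sketch below shows what a small set of question-specific binary criteria might look like and how a response could be scored against them. The data structures, names, and example criteria here are illustrative assumptions, not Qworld's actual API.

```python
# Illustrative sketch only -- these structures and names are assumptions,
# not Qworld's actual interface.
from dataclasses import dataclass

@dataclass
class Criterion:
    """A single binary (yes/no) check derived from a specific question."""
    text: str          # what the response must do to pass
    perspective: str   # viewpoint the criterion was generated from

# Hypothetical criteria generated for the question:
# "How should a hospital roll out an LLM-based triage assistant?"
criteria = [
    Criterion("Addresses patient-safety risks explicitly", "clinician"),
    Criterion("Describes a staged or pilot deployment plan", "operations"),
    Criterion("Mentions auditing or monitoring of model outputs", "regulator"),
]

def score_response(passed: list[bool]) -> float:
    """Aggregate binary judgments (one per criterion) into a single score."""
    assert len(passed) == len(criteria)
    return sum(passed) / len(criteria)

# A judge (human or LLM) marks which criteria the response satisfies;
# two of three passing yields a score of about 0.67.
print(score_response([True, True, False]))
```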
Stars
20
Forks
1
Language
Python
License
MIT
Category
LLM tools
Last pushed
Mar 26, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/mims-harvard/Qworld"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
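If you prefer Python over curl, a minimal sketch of the same request is below. The endpoint URL is the one given above; passing the key via an `X-API-Key` header is an assumption, since this listing does not specify how the key is supplied.

```python
import requests

# Endpoint from the curl command above.
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/mims-harvard/Qworld"

def fetch_repo_quality(api_key: str | None = None) -> dict:
    """Fetch the quality record for mims-harvard/Qworld.

    Anonymous access allows 100 requests/day; a free key raises that to 1,000/day.
    NOTE: the 'X-API-Key' header is an assumption -- check the API docs for the
    actual mechanism for supplying a key.
    """
    headers = {"X-API-Key": api_key} if api_key else {}
    resp = requests.get(URL, headers=headers, timeout=10)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    print(fetch_repo_quality())
```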
Higher-rated alternatives
EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas
Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit
Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks
EuroEval/EuroEval
The robust European language model benchmark.
Giskard-AI/giskard-oss
🐢 Open-Source Evaluation & Testing library for LLM Agents