mims-harvard/Qworld

Qworld: Question-Specific Evaluation Criteria for LLMs

Overall score: 39/100 (Emerging)

When evaluating large language models (LLMs) on complex, open-ended questions, Qworld helps you create detailed, context-specific evaluation criteria. Instead of using generic rubrics, it takes a question and generates a comprehensive set of binary criteria, scenarios, and perspectives to judge the quality of an LLM's response. This tool is for anyone who needs to rigorously assess LLMs, such as AI researchers, product managers developing LLM applications, or educators creating LLM-based learning tools.

Use this if you need highly detailed, question-specific criteria to evaluate how well a large language model answers complex or open-ended questions, moving beyond simple binary scores.

Not ideal if you only need a quick, high-level assessment of LLM performance or if your questions have straightforward, single-correct-answer responses.
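To make the idea of binary, question-specific criteria concrete, here is a minimal Python sketch. It is illustrative only: the Criterion structure, the example criteria, and the scoring scheme are assumptions, not Qworld's actual API, and in a real pipeline the criteria would be generated and judged by an LLM rather than by the keyword checks used here.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Criterion:
    """One yes/no judgment about a response (hypothetical structure)."""
    description: str
    check: Callable[[str], bool]

# Question-specific criteria for one open-ended question (assumed examples).
criteria = [
    Criterion("Mentions data loss on restart",
              lambda r: "restart" in r.lower() or "persist" in r.lower()),
    Criterion("Considers scaling across servers",
              lambda r: "scal" in r.lower() or "server" in r.lower()),
    Criterion("Addresses memory pressure",
              lambda r: "memory" in r.lower()),
]

response = ("In-memory session caches are fast but vanish on restart and "
            "complicate scaling across servers under memory pressure.")

# Each criterion is a binary pass/fail; the score is the fraction satisfied.
results = [(c.description, c.check(response)) for c in criteria]
for description, ok in results:
    print(f"[{'PASS' if ok else 'FAIL'}] {description}")
print(f"Score: {sum(ok for _, ok in results)}/{len(results)}")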

Tags: LLM-evaluation, AI-testing, model-assessment, NLP-benchmarking, conversational-AI
No package · No dependents
Maintenance: 13/25
Adoption: 6/25
Maturity: 15/25
Community: 5/25

Stars: 20
Forks: 1
Language: Python
License: MIT
Last pushed: Mar 26, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/mims-harvard/Qworld"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
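If you prefer Python over curl, a minimal sketch using the third-party requests library is below. The response is assumed to be JSON; its exact schema is not documented on this page, so the sketch simply pretty-prints whatever comes back.

import json
import requests  # pip install requests

URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "llm-tools/mims-harvard/Qworld")

resp = requests.get(URL, timeout=10)
resp.raise_for_status()  # surfaces HTTP errors, e.g. rate limiting
print(json.dumps(resp.json(), indent=2))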