pillowsofwind/DebateQA
[EACL 2026] The official GitHub repo for the paper "DebateQA: Evaluating Question Answering on Debatable Knowledge"
This tool measures how well a Large Language Model (LLM) answers complex, debatable questions. You input an LLM's generated answers to a set of debatable questions, and it outputs scores for how comprehensive and balanced those answers are. It is aimed at researchers and developers working on LLM evaluation and responsible AI.
Use this if you need to objectively quantify how well an LLM handles questions with multiple valid perspectives or acknowledges the contentious nature of a topic.
Not ideal if you are looking for a tool that generates debates or synthesizes different viewpoints; it is purely for evaluation.
Stars: 11
Forks: —
Language: Python
License: —
Category: —
Last pushed: Jan 16, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/pillowsofwind/DebateQA"
Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000/day.
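The same endpoint can be called from Python. A minimal sketch, assuming the URL pattern shown in the curl command above (`/api/v1/quality/<category>/<owner>/<repo>`); the response schema is not documented here, so the live fetch is left commented out:

```python
import urllib.request

# Base URL taken from the curl example on this page.
BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the metrics URL for a repo (no API key needed up to 100 requests/day)."""
    return f"{BASE}/{category}/{owner}/{repo}"

url = quality_url("nlp", "pillowsofwind", "DebateQA")
print(url)
# To fetch the live data (requires network access):
# data = urllib.request.urlopen(url).read()
```

Swap in your own `category`, `owner`, and `repo` values to look up other repositories.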
Higher-rated alternatives
asahi417/lm-question-generation — Multilingual/multidomain question generation datasets, models, and python library for question...
SparkJiao/SLQA — An Unofficial Pytorch Implementation of Multi-Granularity Hierarchical Attention Fusion Networks...
MurtyShikhar/Question-Answering — TensorFlow implementation of Match-LSTM and Answer pointer for the popular SQuAD dataset.
hsinyuan-huang/FlowQA — Implementation of conversational QA model: FlowQA (with slight improvement)
allenai/aokvqa — Official repository for the A-OKVQA dataset