zhudotexe/fanoutqa
Companion code for FanOutQA: Multi-Hop, Multi-Document Question Answering for Large Language Models (ACL 2024)
This project provides the dataset and evaluation tools for assessing how well large language models (LLMs) answer complex questions that require gathering information from multiple Wikipedia articles. You load the benchmark questions, generate answers with your own model, and score them against human-written reference answers (see the usage sketch below). It's designed for researchers and practitioners who are developing and testing advanced question-answering systems.
No commits in the last 6 months. Available on PyPI.
Use this if you are developing or evaluating large language models and need a robust benchmark for multi-hop, multi-document question answering.
Not ideal if you're looking for an off-the-shelf solution to answer general questions using LLMs without needing to develop or evaluate models yourself.
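A minimal usage sketch in Python, assuming the fanoutqa PyPI package exposes a load_dev() loader and an eval.evaluate() scorer as its README describes; my_model() is a hypothetical stand-in for your own LLM, and exact names and signatures should be verified against the repo.

# Sketch: score a model's answers on the FanOutQA dev set.
# Assumes `pip install "fanoutqa[eval]"` and the loader/scorer names
# described in the repo README; verify against the repo before relying on this.
import fanoutqa
from fanoutqa.eval import evaluate

questions = fanoutqa.load_dev()  # list of dev-set question records

answers = []
for q in questions:
    model_answer = my_model(q.question)  # hypothetical: your LLM call here
    answers.append({"id": q.id, "answer": model_answer})

scores = evaluate(questions, answers)  # aggregate accuracy/ROUGE-style metrics
print(scores)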
Stars: 59
Forks: 5
Language: Python
License: MIT
Category:
Last pushed: Sep 22, 2025
Commits (30d): 0
Dependencies: 4
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/zhudotexe/fanoutqa"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
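For programmatic access, the same endpoint can be fetched from Python; a minimal standard-library sketch, assuming the endpoint returns JSON (the response schema is not documented on this page).

# Sketch: fetch this repo's quality data from the API shown above.
# The response schema is not documented here, so the JSON is printed as-is;
# add your API key per the service's docs if you register for higher limits.
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/zhudotexe/fanoutqa"

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

print(json.dumps(data, indent=2))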
Higher-rated alternatives
ExtensityAI/symbolicai: A neurosymbolic perspective on LLMs
TIGER-AI-Lab/MMLU-Pro: The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding...
deep-symbolic-mathematics/LLM-SR: [ICLR 2025 Oral] This is the official repo for the paper "LLM-SR" on Scientific Equation...
microsoft/interwhen: A framework for verifiable reasoning with language models.
xlang-ai/Binder: [ICLR 2023] Code for the paper "Binding Language Models in Symbolic Languages"