night-chen/ToolQA

ToolQA is a new dataset for evaluating the ability of LLMs to answer challenging questions with external tools. It offers two difficulty levels (easy/hard) across eight real-life scenarios.

Score: 36 / 100 (Emerging)

This project provides a specialized dataset called ToolQA for evaluating how well large language models (LLMs) can answer complex questions that require using external tools. It includes diverse questions from various domains like flight data, Yelp reviews, and scientific texts, along with the corresponding external knowledge sources and potential tools. AI researchers and developers working on improving LLMs' ability to interact with real-world data and execute multi-step reasoning would use this to benchmark their models.

286 stars. No commits in the last 6 months.

Use this if you are a developer or researcher testing or building large language models (LLMs) and need a robust, diverse dataset to evaluate their ability to answer complex questions by using external data and tools.

Not ideal if you are an end-user looking for a direct application to solve a problem with an LLM, as this is a dataset and toolkit for LLM development and evaluation, not a ready-to-use product.

Tags: LLM evaluation, tool-augmented AI, natural language processing, AI research, machine learning datasets
Badges: Stale (6 months), No Package, No Dependents
Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 10 / 25
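The overall score appears to be the simple sum of the four sub-scores. A minimal sketch of that arithmetic (the additive formula is an assumption inferred from the displayed values, not documented by the site):

```python
# Sub-scores as displayed on the page, each out of 25.
subscores = {
    "Maintenance": 0,
    "Adoption": 10,
    "Maturity": 16,
    "Community": 10,
}

# Assumption: the overall score is the plain sum of the four parts.
total = sum(subscores.values())
print(f"{total} / 100")  # 36 / 100, matching the displayed score
```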


Stars: 286
Forks: 14
Language: Jupyter Notebook
License: Apache-2.0
Last pushed: Aug 19, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/night-chen/ToolQA"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
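For programmatic access, a minimal Python sketch wrapping the endpoint above using only the standard library. The JSON field names in the response are not documented here, so the example just decodes and prints the payload; `build_url` and `fetch_quality` are illustrative helper names:

```python
import json
import urllib.request

# Base endpoint taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def build_url(owner: str, repo: str) -> str:
    """Build the quality-API URL for a given GitHub repository."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality record as JSON. The response schema is
    undocumented here, so we only assume it is valid JSON."""
    with urllib.request.urlopen(build_url(owner, repo)) as resp:
        return json.load(resp)

# Example usage (performs a live request, subject to the 100/day limit):
# data = fetch_quality("night-chen", "ToolQA")
# print(json.dumps(data, indent=2))
```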