mangopy/tool-retrieval-benchmark
Official code for the ACL 2025 paper "🔍 Retrieval Models Aren’t Tool-Savvy: Benchmarking Tool Retrieval for Large Language Models"
This project provides the first comprehensive benchmark for evaluating how well existing information retrieval models find the right digital tools for Large Language Models (LLMs) to use. Given a description of a task an LLM needs to perform and a collection of available tools, it measures how accurately a retrieval model surfaces the most suitable tool. It is aimed at researchers and developers building and improving LLMs that autonomously select and use digital tools.
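A minimal sketch of the kind of metric such a benchmark typically reports: Recall@k over a ranked list of tool IDs returned by a retriever. The function and variable names (`recall_at_k`, `ranked_tools`, `gold_tools`) are illustrative assumptions, not this repository's actual API.

```python
def recall_at_k(ranked_tools, gold_tools, k):
    """Fraction of gold tools that appear in the top-k retrieved tools."""
    if not gold_tools:
        return 0.0
    top_k = set(ranked_tools[:k])
    return len(top_k & set(gold_tools)) / len(gold_tools)


# Example: a retriever ranks five candidate tools for one task query.
ranked = ["weather_api", "calculator", "web_search", "translator", "calendar"]
gold = {"calculator", "calendar"}
print(recall_at_k(ranked, gold, k=3))  # 0.5 -> only "calculator" is in the top 3
```

Averaging this value over all task queries in the benchmark gives a single retrieval-quality score for a model.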
Use this if you are developing or evaluating retrieval models that help LLMs select appropriate tools from a large set to accomplish specific tasks.
Not ideal if you are looking for an off-the-shelf LLM that already excels at tool-use without needing to evaluate underlying retrieval mechanisms.
Stars
211
Forks
7
Language
JavaScript
License
Apache-2.0
Category
Last pushed
Dec 22, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/mangopy/tool-retrieval-benchmark"
Open to everyone: 100 requests/day with no API key. A free key raises the limit to 1,000 requests/day.
Higher-rated alternatives
SemBench/SemBench
Benchmarking Semantic Query Processing Engines
zjukg/SKA-Bench
[Paper][EMNLP 2025] SKA-Bench: A Fine-Grained Benchmark for Evaluating Structured Knowledge...
DIA-Bench/DIA-Bench
The DIA Benchmark Dataset is a benchmarking tool consisting of 150 dynamic question generators...