mangopy/tool-retrieval-benchmark
Official code for the ACL 2025 paper "🔍 Retrieval Models Aren’t Tool-Savvy: Benchmarking Tool Retrieval for Large Language Models"
This project provides the first comprehensive benchmark for evaluating how well existing information retrieval models find the right digital tools for Large Language Models (LLMs) to use. Given a description of a task an LLM needs to perform and a collection of available tools, it measures how accurately a retrieval model surfaces the most suitable tool. It is aimed at researchers and developers building and improving LLMs that autonomously select and use digital tools.
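A minimal sketch of the kind of metric such a benchmark typically reports: Recall@k over a ranked list of tool IDs returned by a retriever. The function and variable names (`recall_at_k`, `ranked_tools`, `gold_tools`) are illustrative assumptions, not this repository's actual API.

```python
def recall_at_k(ranked_tools, gold_tools, k):
    """Fraction of gold tools that appear in the top-k retrieved tools."""
    if not gold_tools:
        return 0.0
    top_k = set(ranked_tools[:k])
    return len(top_k & set(gold_tools)) / len(gold_tools)


# Example: a retriever ranks five candidate tools for one task query.
ranked = ["weather_api", "calculator", "web_search", "translator", "calendar"]
gold = {"calculator", "calendar"}
print(recall_at_k(ranked, gold, k=3))  # 0.5 -> only "calculator" is in the top 3
```

Averaging this value over all task queries in the benchmark gives a single retrieval-quality score for a model.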
Use this if you are developing or evaluating retrieval models that help LLMs select appropriate tools from a large set to accomplish specific tasks.
Not ideal if you are looking for an off-the-shelf LLM that already excels at tool-use without needing to evaluate underlying retrieval mechanisms.
Stars
211
Forks
7
Language
JavaScript
License
Apache-2.0
Category
Last pushed
Dec 22, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/mangopy/tool-retrieval-benchmark"
Open to everyone: 100 requests/day with no API key. A free key raises the limit to 1,000 requests/day.
Higher-rated alternatives
SemBench/SemBench
Benchmarking Semantic Query Processing Engines
zjukg/SKA-Bench
[Paper][EMNLP 2025] SKA-Bench: A Fine-Grained Benchmark for Evaluating Structured Knowledge...
DIA-Bench/DIA-Bench
The DIA Benchmark Dataset is a benchmarking tool consisting of 150 dynamic question generators...