mangopy/tool-retrieval-benchmark

Official code for the ACL 2025 paper "🔍 Retrieval Models Aren’t Tool-Savvy: Benchmarking Tool Retrieval for Large Language Models"

Quality score: 40 / 100 (Emerging)

This project provides the first comprehensive benchmark for evaluating how well existing information retrieval models find the right digital tools for Large Language Models (LLMs) to use. Given a description of a task an LLM needs to perform and a collection of available tools, it measures how accurately retrieval models surface the most suitable tool. It is aimed at researchers and developers building and improving LLMs that autonomously select and use digital tools.


Use this if you are developing or evaluating retrieval models that help LLMs select appropriate tools from a large set to accomplish specific tasks.

Not ideal if you are looking for an off-the-shelf LLM that already excels at tool-use without needing to evaluate underlying retrieval mechanisms.
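The core evaluation the benchmark performs — ranking a tool collection against a task description and scoring the ranking — can be sketched as follows. This is a minimal illustration, not the project's actual code: the toy tool corpus, the word-overlap retriever, and the recall@k metric here are stand-ins for the real retrieval models and metrics the benchmark evaluates.

```python
def recall_at_k(ranked_tool_ids, gold_tool_ids, k):
    """Fraction of the gold (correct) tools that appear in the top-k retrieved tools."""
    top_k = set(ranked_tool_ids[:k])
    hits = sum(1 for tool_id in gold_tool_ids if tool_id in top_k)
    return hits / len(gold_tool_ids)

# Toy tool corpus: id -> natural-language description (hypothetical, for illustration)
tools = {
    "weather_api": "Get current weather conditions for a city",
    "calendar": "Create and list calendar events",
    "calculator": "Evaluate arithmetic expressions",
}

def bag_of_words_retriever(query, corpus):
    """Rank tools by word overlap with the query — a crude stand-in for a real retriever."""
    query_words = set(query.lower().split())
    scored = sorted(
        corpus.items(),
        key=lambda item: len(query_words & set(item[1].lower().split())),
        reverse=True,
    )
    return [tool_id for tool_id, _ in scored]

ranking = bag_of_words_retriever("What is the weather in Paris?", tools)
print(recall_at_k(ranking, ["weather_api"], k=1))  # 1.0 — the right tool is ranked first
```

A real evaluation would swap in a dense or sparse retrieval model and average the metric over many task/tool pairs.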

Tags: LLM development · tool learning · information retrieval · AI model evaluation · natural language processing
No package · No dependents
Maintenance 6 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 8 / 25


Stars: 211
Forks: 7
Language: JavaScript
License: Apache-2.0
Last pushed: Dec 22, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/mangopy/tool-retrieval-benchmark"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
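The API response can be consumed programmatically. The JSON field names below are assumptions based on the figures shown on this card — the actual schema may differ — but they illustrate the relationship between the four 25-point axes and the overall score.

```python
import json

# Hypothetical response shape — field names are assumed, not taken from the API docs
payload = json.loads("""
{
  "score": 40,
  "tier": "Emerging",
  "breakdown": {"maintenance": 6, "adoption": 10, "maturity": 16, "community": 8}
}
""")

# The four 25-point axes sum to the overall 100-point score
total = sum(payload["breakdown"].values())
assert total == payload["score"]

print(f'{payload["score"]}/100 ({payload["tier"]})')  # 40/100 (Emerging)
```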