hhan1018/NesTools

[COLING 2025] NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models

Score: 34 / 100 (Emerging)

This project helps AI researchers and developers evaluate how well large language models (LLMs) can learn and use multiple tools in complex, nested sequences. You input the LLM's responses and the evaluation settings, and it outputs performance metrics on nested tool learning. This is for those working on improving LLM capabilities in advanced reasoning and automation.

No commits in the last 6 months.

Use this if you are developing or benchmarking large language models and need to rigorously test their ability to handle complex, multi-step tasks requiring the sequential application of various tools.

Not ideal if you are an end-user looking to apply an LLM to a specific business problem, rather than evaluating the LLM's core capabilities.

Tags: AI research, LLM evaluation, tool learning, model benchmarking, natural language processing
Badges: Stale (6m), No Package, No Dependents
Maintenance: 0 / 25
Adoption: 6 / 25
Maturity: 16 / 25
Community: 12 / 25


Stars: 18
Forks: 3
Language: Python
License: Apache-2.0
Last pushed: Jan 18, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/hhan1018/NesTools"

Open to everyone: 100 requests/day with no key. Get a free key for 1,000 requests/day.
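The same endpoint can be called from code. A minimal Python sketch of building the URL and fetching the payload is below; the URL pattern comes from the curl command above, but the JSON field names in any response are not documented here, so the sketch only decodes the body generically.

```python
# Sketch: query the pt-edge quality API for a repository's score.
# Only the URL pattern is taken from the curl example above; the
# response schema is an assumption and may differ in practice.
import json
from urllib.request import urlopen

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the API URL for one repository, e.g. category 'nlp'."""
    return f"{API_BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """Fetch and JSON-decode the score payload (no API key needed
    for up to 100 requests/day, per the note above)."""
    with urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)

# Reconstructs the exact URL used in the curl example:
print(quality_url("nlp", "hhan1018", "NesTools"))
# https://pt-edge.onrender.com/api/v1/quality/nlp/hhan1018/NesTools
```

For keyed access (1,000 requests/day), the key would presumably be passed as a header or query parameter; check the API's own documentation, since that detail is not given here.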