xlang-ai/Spider2-V

[NeurIPS 2024] Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

Score: 39 / 100 (Emerging)

This project offers a benchmark environment for multimodal AI agents that aim to automate complex data science and engineering tasks inside a desktop operating system. It takes task instructions, data configurations, and baseline agents as input, runs the agent's actions in a virtual machine, and outputs performance metrics and detailed execution logs. Data scientists, operations engineers, and AI researchers can use it to evaluate how well agents handle real-world desktop workflows.

150 stars. No commits in the last 6 months.

Use this if you are developing or evaluating AI agents that need to interact with a full desktop environment to perform data-related or engineering tasks.

Not ideal if you are looking for a ready-to-use application to automate your own daily desktop tasks without any development.

AI agent evaluation, data science automation, engineering workflow automation, desktop environment simulation, multimodal agent testing

Stale (6m) · No package · No dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 13 / 25
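
The four sub-scores above add up exactly to the overall 39 / 100. This suggests, though the page does not state the formula, that the headline score is a plain sum of four categories worth 25 points each. The minimal shell sketch below reproduces that arithmetic from the values listed on this page. The variable names are illustrative only.

```shell
#!/bin/sh
# Sub-scores as listed on this page (each out of 25).
# Assumption: the overall score is the unweighted sum of these four categories.
maintenance=0
adoption=10
maturity=16
community=13

# POSIX shell arithmetic expansion sums the four categories.
total=$((maintenance + adoption + maturity + community))
echo "Total: ${total} / 100"
```

Running it prints `Total: 39 / 100`, which matches the score shown at the top of the page.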


Stars: 150
Forks: 14
Language: Jupyter Notebook
License: Apache-2.0
Last pushed: Aug 26, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/xlang-ai/Spider2-V"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.