xlang-ai/Spider2-V

[NeurIPS 2024] Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

/ 100

Emerging

This project provides a test environment for multimodal AI agents that automate complex data science and engineering tasks in a desktop operating system. It takes task instructions, data configurations, and agent baselines as input, runs the agent's actions in a virtual machine, and outputs performance metrics and detailed execution logs. Data scientists, operations engineers, and AI researchers can use it to evaluate how well agents handle real-world desktop workflows.
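The input-to-metrics flow described above can be pictured with a toy evaluation loop. This is only a sketch under stated assumptions: `Task`, `EpisodeLog`, `run_episode`, `evaluate`, and `scripted_agent` are hypothetical names invented for illustration, not part of Spider2-V, and actions are recorded as plain strings rather than executed in a virtual machine.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Task:
    """Hypothetical task record: an instruction plus a success check."""
    task_id: str
    instruction: str
    check: Callable[[List[str]], bool]  # inspects the actions that were taken


@dataclass
class EpisodeLog:
    """Hypothetical execution log for one task."""
    task_id: str
    actions: List[str] = field(default_factory=list)
    success: bool = False


def run_episode(task: Task, agent: Callable[[str, List[str]], str],
                max_steps: int = 5) -> EpisodeLog:
    """Let the agent act until it emits DONE or runs out of steps."""
    log = EpisodeLog(task_id=task.task_id)
    for _ in range(max_steps):
        action = agent(task.instruction, log.actions)
        if action == "DONE":
            break
        # A real harness would send the action to the desktop VM here.
        log.actions.append(action)
    log.success = task.check(log.actions)
    return log


def evaluate(tasks: List[Task],
             agent: Callable[[str, List[str]], str]) -> Tuple[float, List[EpisodeLog]]:
    """Return the success rate and the per-task logs."""
    logs = [run_episode(t, agent) for t in tasks]
    rate = sum(log.success for log in logs) / len(logs) if logs else 0.0
    return rate, logs


def scripted_agent(instruction: str, history: List[str]) -> str:
    """Toy stand-in for an agent baseline: replays a fixed plan."""
    plan = ["open_terminal", "run_query"]
    return plan[len(history)] if len(history) < len(plan) else "DONE"


tasks = [
    Task("t1", "Run the query", lambda actions: "run_query" in actions),
    Task("t2", "Export a chart", lambda actions: "export_chart" in actions),
]
success_rate, logs = evaluate(tasks, scripted_agent)
```

With the two toy tasks above, the scripted agent completes the first and fails the second, so the reported metric is a success rate over tasks and the logs hold the action trace for each one.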

150 stars. No commits in the last 6 months.

Use this if you are developing or evaluating AI agents that need to interact with a full desktop environment to perform data-related or engineering tasks.

Not ideal if you are looking for a ready-to-use application to automate your own daily desktop tasks without any development.

AI agent evaluation, data science automation, engineering workflow automation, desktop environment simulation, multimodal agent testing

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 13 / 25


Stars: 150

Language: Jupyter Notebook

License: Apache-2.0

Higher-rated alternatives

inclusionAI/AReaL

Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.

melih-unsal/DemoGPT

🤖 Everything you need to create an LLM Agent—tools, prompts, frameworks, and models—all in one place.

AOSSIE-Org/Perspective

Perspective analyzes your news or social feed and presents credible counter-narratives from...

expectedparrot/edsl

Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social...

kaushikb11/awesome-llm-agents

A curated list of awesome LLM agents frameworks.
