xlang-ai/Spider2-V
[NeurIPS 2024] Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
This project provides a testing environment for multimodal AI agents that automate complex data science and engineering tasks within a desktop operating system. Given task instructions, data configurations, and agent baselines as input, it simulates the agent performing actions in a virtual machine and outputs performance metrics and detailed execution logs. Data scientists, operations engineers, and AI researchers can use it to evaluate how well agents handle real-world desktop workflows.
150 stars. No commits in the last 6 months.
Use this if you are developing or evaluating AI agents that need to interact with a full desktop environment to perform data-related or engineering tasks.
Not ideal if you are looking for a ready-to-use application to automate your own daily desktop tasks without any development.
Stars: 150
Forks: 14
Language: Jupyter Notebook
License: Apache-2.0
Category: (not specified)
Last pushed: Aug 26, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/xlang-ai/Spider2-V"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
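The same request can be made from a script. The following is a minimal Python sketch that mirrors the curl call above using only the standard library. The helper names `build_quality_url` and `fetch_quality` are hypothetical, and the sketch assumes the endpoint returns a JSON body, which this page does not confirm. How an API key is supplied is not documented here, so the sketch covers only the keyless tier.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    # Percent-encode each path segment so unusual characters stay valid.
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository (hypothetical helper).

    Assumption: the response body is JSON. Adjust the parsing if the
    service returns another format.
    """
    url = build_quality_url(owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("xlang-ai", "Spider2-V")` issues one request against the keyless limit of 100 requests/day, so caching the result locally is a reasonable choice when polling many repositories.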
Higher-rated alternatives
inclusionAI/AReaL
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
melih-unsal/DemoGPT
🤖 Everything you need to create an LLM Agent—tools, prompts, frameworks, and models—all in one place.
AOSSIE-Org/Perspective
Perspective analyzes your news or social feed and presents credible counter-narratives from...
expectedparrot/edsl
Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social...
kaushikb11/awesome-llm-agents
A curated list of awesome LLM agents frameworks.