niuzaisheng/ScreenAgent
ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24)
This project offers a way to automate complex computer tasks using an AI agent that 'sees' your screen and 'acts' like a human user. It takes your high-level instructions and translates them into mouse clicks and keyboard inputs, allowing the AI to interact with any desktop application or operating system. It's designed for anyone who needs to automate repetitive, multi-step digital workflows that typically require manual interaction with a graphical user interface.
579 stars. No commits in the last 6 months.
Use this if you need to automate tasks on a computer's graphical interface that current scripting or RPA tools can't handle, such as those requiring visual understanding or adaptive decision-making.
Not ideal if your tasks are primarily text-based, involve simple data entry, or can be accomplished through existing APIs or backend automations.
Stars
579
Forks
62
Language
Python
License
—
Category
Last pushed
Nov 25, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/niuzaisheng/ScreenAgent"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
openai/openai-agents-python
A lightweight, powerful framework for multi-agent workflows
openagents-org/openagents
OpenAgents - AI Agent Networks for Open Collaboration
vamplabAI/sgr-agent-core
Schema-Guided Reasoning (SGR) has agentic system design created by neuraldeep community
BrainBlend-AI/atomic-agents
Building AI agents, atomically
camel-ai/camel
🐫 CAMEL: The first and the best multi-agent framework. Finding the Scaling Law of Agents....