SRE Incident Automation AI Agents

AI agents for autonomous incident detection, root cause analysis, and remediation in production environments. Focuses on SRE-specific tools that integrate with observability platforms and cloud infrastructure. Does NOT include general monitoring dashboards, anomaly detection platforms without remediation, or incident classification frameworks.

There are 45 sre incident automation agents tracked. 2 score above 50 (established tier). The highest-rated is Arvo-AI/aurora at 52/100 with 76 stars.

Get all 45 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=agents&subcategory=sre-incident-automation&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.

# Agent Score Tier
1 Arvo-AI/aurora

Aurora — Open source AI-powered agentic incident management & root cause...

52
Established
2 a2wio/lucas

A2W's SRE agent for Kubernetes

50
Established
3 datolabs-io/opsy

Opsy - Your AI-Powered SRE Colleague

48
Emerging
4 scitix/siclaw

AI-powered SRE platform — read-only infrastructure diagnostics with deep...

46
Emerging
5 pavangudiwada/awesome-ai-sre

AI SRE tools for RCA, Incident Response, Cost-Saving, Infra management,...

45
Emerging
6 avivl/cloud-sre-agent

An autonomous SRE agent that monitors cloud logs across multiple platforms,...

43
Emerging
7 whitepaper27/Sentri

AI-powered autonomous DBA agent — detects, diagnoses, and fixes Oracle...

42
Emerging
8 chatwoot/faultline

An open-source AI agent for infrastructure debugging.

41
Emerging
9 codeready-toolchain/tarsy

Intelligent Site Reliability Engineering agent for automatic alert processing

36
Emerging
10 ismailperim/oncallmate

🚨 Autonomous AI SRE agent that investigates Docker incidents while you...

36
Emerging
11 qingwave/kubewizard

✨Kubewizard is An AI-Agent for automated Kubernetes troubleshooting, and...

32
Emerging
12 qicesun/SRE-Agent-App

An Autonomous AI SRE Agent for Kubernetes, built with Java Spring Boot &...

31
Emerging
13 hanu-tayal/ai-oncall-agent

AI agents that replace human on-call engineers — automated error analysis,...

29
Experimental
14 codenamev/ruby_llm-ups

ups.dev status page integration for RubyLLM — automatic agent heartbeats,...

25
Experimental
15 Joeen-AI-Labs/Netiarius

CLI agent for Linux server network troubleshooting and repair, with built-in...

25
Experimental
16 vitas/evidra

Flight recorder for Infrastructure Automation. Behavioral Reliability for...

24
Experimental
17 AxonLabsDev/nervmap

Infrastructure cartography CLI — discover services, map dependencies, trace...

23
Experimental
18 kyisaiah47/cloudwatch-genius

AI-powered DevOps agent using Amazon Bedrock & Claude 3 Sonnet for...

22
Experimental
19 bblackheart013/semantic-devops-bot

AI-powered DevOps Assistant that reads error logs, suggests fixes, and...

22
Experimental
20 koustubh-v/AutoDevOps-AI

Autonomous SRE agent that recursively audits, traces, and self-heals...

22
Experimental
21 charles-adedotun/kubepulse

Intelligent Kubernetes health monitoring with AI-powered diagnostics,...

22
Experimental
22 javakishore-veleti/Claims-Processor-With-SRE

A multi-tenant healthcare claims processing platform with AI-powered...

22
Experimental
23 haoranc/agent-estimate

The first open-source effort estimation tool built for AI coding agents....

21
Experimental
24 agentincident/agentincident

The open incident format for autonomous AI agents. Record, classify, and...

21
Experimental
25 jayta1314/awesome-ai-sre

Curate and explore a comprehensive list of AI-driven tools and resources...

21
Experimental
26 csa7mdm/AutoMender

Autonomous AI Agent that detects, analyzes, and self-heals .NET runtime...

21
Experimental
27 dbwls99706/deadends.dev

Structured failure knowledge infrastructure for AI agents — dead ends,...

21
Experimental
28 sydasif/network-automation-agent

Run commands on network device with LLM using netmiko

21
Experimental
29 imIbAd404/sre-agent

🚀 Automate self-healing and root cause analysis for financial services with...

21
Experimental
30 anonymousgirl123/ai-incident-analyzer

Build a production-style AI system that ingests logs and metrics, detects...

21
Experimental
31 obtFusi/network-agent

CLI Agent für Netzwerk-Analyse via natürliche Sprache (Venice.ai)

21
Experimental
32 sinzin91/awesome-sre-skills

A curated list of AI agent skills for Site Reliability Engineering —...

21
Experimental
33 agamm/awesome-ai-sre

A curated list of 100+ AI-powered tools, platforms, and resources for Site...

21
Experimental
34 iemafzalhassan/OutagePilot

OutagePilot uses a multi-agent system to autonomously detect, diagnose, and...

21
Experimental
35 Suraj-kumar00/DataIncidentManager

AI-Powered Autonomous Incident Management for Data Teams

20
Experimental
36 kiloloop/agent-estimate

The first open-source effort estimation tool built for AI coding agents....

19
Experimental
37 ghantakiran/ShieldOps

AI-Powered Autonomous SRE Platform — Autonomous agents for investigation,...

17
Experimental
38 kaiojoceli51/ShieldOps

Automate incident investigation, remediation, and security enforcement...

14
Experimental
39 brngg/herald

AI agent that detects, diagnoses, and remediates Kubernetes incidents with...

14
Experimental
40 DilshanPGN/IncidentIQ

AI-driven observability & incident-analysis agent that plugs into Java...

13
Experimental
41 tareksyria/SREAgents

🤖 Build and manage AI-driven SRE agents to automate operations tasks with...

13
Experimental
42 AdityaIndoori/Sentry

Autonomous AI service monitor multi-agent pipeline (Triage, Detective,...

13
Experimental
43 rubsj/ai-devops-assistant

Multi-agent DevOps AI assistant for pipeline monitoring, log analysis, root...

13
Experimental
44 kamaleshanantha/-metr-time-horizon-feb-2026

Interactive visualization of METR AI agent time horizon benchmark with...

13
Experimental
45 ines312692/DevOps-AI-Agent

This project demonstrates how to build an AI-powered monitoring agent for...

12
Experimental