HolmesGPT/holmesgpt

SRE Agent - CNCF Sandbox Project

Score: 70 / 100 (Verified)

This tool helps Site Reliability Engineers (SREs) and DevOps professionals automatically detect and diagnose issues in complex production systems. It ingests live observability data from platforms such as Kubernetes, AWS, Datadog, and Prometheus, then delivers actionable insights, root cause analyses, and automated fixes, either by posting to communication channels like Slack or by opening pull requests. It runs continuously in the background to spot problems before they impact customers.

1,967 stars. Actively maintained with 77 commits in the last 30 days.

Use this if you need an AI agent to proactively monitor your production environment, identify incident root causes across a diverse tech stack, and even propose or apply automated remediations.

Not ideal if you prefer manual investigations or if your observability data is not integrated with standard platforms this agent can connect to.

Site Reliability Engineering, DevOps, Incident Management, Production Monitoring, Root Cause Analysis
No Package, No Dependents
Maintenance 22 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 22 / 25
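
The four category scores listed above add up to the overall 70 / 100 shown at the top. The page does not state how the total is derived, so treating it as a plain sum is an assumption that happens to match the listed numbers. A minimal check:

```python
# Category scores as listed on this page (each out of 25).
# Assumption: the overall score is the plain sum of the four categories.
scores = {"Maintenance": 22, "Adoption": 10, "Maturity": 16, "Community": 22}

total = sum(scores.values())
maximum = 25 * len(scores)

print(f"{total} / {maximum}")  # prints "70 / 100"
```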


Stars: 1,967
Forks: 258
Language: Python
License: Apache-2.0
Last pushed: Mar 12, 2026
Commits (30d): 77

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/HolmesGPT/holmesgpt"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
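
The same request can be made from Python using only the standard library. This is a rough sketch, not the service's official client: the helper names `quality_url` and `fetch_quality` are made up for illustration, and the assumption that the response body is JSON is not confirmed by this page.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record for a repository.

    Assumption: the endpoint returns a JSON body. The page does not
    document the response schema, so the parsed value is returned as-is.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Keyless access is limited to 100 requests/day, so avoid polling in a loop.
    print(fetch_quality("HolmesGPT", "holmesgpt"))
```

Because the unauthenticated limit is 100 requests/day, caching the result locally is a sensible default for anything that runs on a schedule.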