HolmesGPT/holmesgpt
SRE Agent - CNCF Sandbox Project
This tool helps Site Reliability Engineers (SREs) and DevOps professionals automatically detect and diagnose issues in complex production systems. It ingests live observability data from platforms such as Kubernetes, AWS, Datadog, and Prometheus, then delivers actionable insights, root cause analyses, and automated fixes, either as messages in channels such as Slack or as pull requests. It runs continuously in the background, spotting problems before they impact customers.
1,967 stars. Actively maintained with 77 commits in the last 30 days.
Use this if you need an AI agent that proactively monitors your production environment, identifies incident root causes across a diverse tech stack, and proposes or applies automated remediations.
Not ideal if you prefer manual investigations or if your observability data does not live in one of the platforms this agent can connect to.
Stars: 1,967
Forks: 258
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 12, 2026
Commits (30d): 77
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/HolmesGPT/holmesgpt"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
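The same request can be made from a script. The following is a minimal Python sketch using only the standard library. It assumes the endpoint returns a JSON body, which this page does not state, and the helper names `build_url` and `fetch_quality` are hypothetical, introduced here for illustration. Only the URL shape is taken from the curl command above.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    # Percent-encode each path segment so unusual characters stay inside it.
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository (hypothetical helper).

    Assumption: the response body is JSON. If the service returns another
    format, json.loads raises a ValueError.
    """
    request = urllib.request.Request(
        build_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses,
    # which is how an exhausted daily quota would likely surface.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("HolmesGPT", "holmesgpt")` issues the same request as the curl command. With the keyless limit of 100 requests/day, it is worth caching the result locally instead of refetching on every run.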
Related tools
kubewall/kubewall
kubewall - Single-Binary Kubernetes Dashboard with Multi-Cluster Management & AI Integration....
gofireflyio/aiac
Artificial Intelligence Infrastructure-as-Code Generator.
radareorg/r2ai
LLM-based reversing for radare2
mr-tbot/mesh-api
MESH-API (previously MESH-AI) — Off-Grid AI & API Router with over 30 API extensions for...
WeOps-Lab/OpsPilot
OpsPilot is an open source intelligent operation and maintenance assistant based on deep...