martinimarcello00/SRE-agent

Autonomous agent for Kubernetes incident detection, diagnosis, and mitigation using LLMs and modular workflows. Integrates LangChain, LangGraph, and MCP servers to enable automated SRE tasks in cloud-native environments.

Score: 35 / 100 (Emerging)

This system helps Site Reliability Engineers (SREs) and operations teams detect, diagnose, and mitigate issues in Kubernetes environments. It ingests operational data from observability tools such as Prometheus and Jaeger, then identifies problems and their root causes within microservice architectures. The output is a detailed diagnosis and a proposed mitigation strategy for complex incidents.

Use this if you need to significantly reduce the time it takes to understand and fix complex incidents in your Kubernetes clusters, especially in dynamic microservice environments.

Not ideal if your systems are not deployed on Kubernetes or if you prefer a fully manual, human-driven incident response process.

Site Reliability Engineering, Kubernetes, Operations, Incident Response, Microservices, Monitoring, Cloud-Native, Troubleshooting
No License, No Package, No Dependents
Maintenance 10 / 25
Adoption 5 / 25
Maturity 7 / 25
Community 13 / 25
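
The four category scores above appear to add up to the overall score. The listing does not state the formula, so the plain summation in this minimal sketch is an assumption, and the variable names are illustrative only:

```python
# Category scores copied from the listing; each is reported out of 25.
category_scores = {
    "Maintenance": 10,
    "Adoption": 5,
    "Maturity": 7,
    "Community": 13,
}
MAX_PER_CATEGORY = 25

# Assumption: the overall score is the unweighted sum of the categories.
total = sum(category_scores.values())
maximum = MAX_PER_CATEGORY * len(category_scores)

print(f"{total} / {maximum}")  # prints "35 / 100"
```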


Stars: 9
Forks: 2
Language: Jupyter Notebook
License: None
Last pushed: Jan 30, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/martinimarcello00/SRE-agent"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
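
The same request can be made from Python. This is a minimal sketch using only the standard library. The helper names `build_quality_url` and `fetch_quality` are hypothetical, the URL path is copied as-is from the curl command above, and the response format is not documented here, so the body is returned as raw text rather than parsed:

```python
import urllib.parse
import urllib.request

# Base path copied from the curl example; its segments are used unchanged.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/rag"


def build_quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository, percent-encoding each segment."""
    safe_owner = urllib.parse.quote(owner, safe="")
    safe_repo = urllib.parse.quote(repo, safe="")
    return f"{BASE_URL}/{safe_owner}/{safe_repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> str:
    """Return the raw response body as text (no assumption about its format)."""
    url = build_quality_url(owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

Calling `fetch_quality("martinimarcello00", "SRE-agent")` issues the same GET request as the curl command. Keyless use is limited to 100 requests/day, so caching the result is advisable for repeated lookups.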