Agent Reliability Engineering AI Agents

Standards, frameworks, and operational tooling for measuring, testing, and improving the reliability of AI agents before and after production deployment. Includes failure-mode evaluation, SRE principles applied to agents, quality metrics, and deterministic safety guarantees. Does NOT include general agent monitoring dashboards, agent security hardening, or agent infrastructure resilience (those focus on different aspects of operations).

This category tracks 33 agent reliability engineering projects. Two score above 50 (Established tier). The highest-rated is petterjuan/agentic-reliability-framework at 53/100 with 19 stars.

Get all 33 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=agents&subcategory=agent-reliability-engineering&limit=20"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
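The same request can be issued from a script. The following is a minimal Python sketch using only the standard library, not an official client. The endpoint and the `domain`, `subcategory`, and `limit` query parameters are copied from the curl command above; the helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the returned JSON is not documented here, so the sketch returns the parsed payload without interpreting it. Note that the example query uses `limit=20` while the category lists 33 projects; whether the API accepts a larger `limit` is an assumption to verify.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example; everything else here is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="agents",
              subcategory="agent-reliability-engineering",
              limit=20):
    """Build the query URL with the same parameters as the curl example."""
    query = urlencode({
        "domain": domain,
        "subcategory": subcategory,
        "limit": limit,
    })
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch the URL and return the decoded JSON payload as-is.

    The response schema is not specified in this document, so no
    particular keys are assumed.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_projects(build_url())` performs the unauthenticated request, which counts against the 100 requests/day limit mentioned above.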

| # | Agent | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | petterjuan/agentic-reliability-framework | ARF is an agentic reliability intelligence platform that separates decision... | 53 | Established |
| 2 | sarkar-ai-taken/riva | Local-first observability and control plane for AI agents. | 52 | Established |
| 3 | Nubaeon/empirica | Make AI agents and AI workflows measurably reliable. Epistemic... | 48 | Emerging |
| 4 | relai-ai/relai-sdk | A platform for building reliable AI agents | 41 | Emerging |
| 5 | itbench-hub/ITBench-CISO-CAA-Agent | Code repository for CISO agent as part of ITBench | 40 | Emerging |
| 6 | exospherehost/ai-reliability-standards | Architectural standards and best practices for building reliable AI Agents... | 38 | Emerging |
| 7 | imtt-dev/steer | The Active Reliability Layer for AI Agents. Catch failures, teach fixes, and... | 38 | Emerging |
| 8 | soumendrak/ragwatch | An SDK for Python AI Agents. Under heavy development. | 37 | Emerging |
| 9 | kalibr-ai/kalibr-sdk-python | Your agents silently degrade in production. Kalibr keeps them on the optimal... | 36 | Emerging |
| 10 | eth-sri/ToolFuzz | ToolFuzz is a fuzzing framework designed to test your LLM Agent tools. | 33 | Emerging |
| 11 | khan5v/kalibra | Statistical regression detection and CI quality gates for AI agents | 32 | Emerging |
| 12 | ai-2070/l0-python | L0: The Missing Reliability Substrate for AI. Streaming-first. Reliable.... | 26 | Experimental |
| 13 | ai-2070/l0 | L0: The Missing Reliability Substrate for AI. Streaming-first. Reliable.... | 25 | Experimental |
| 14 | choutos/agent-reliability-engineering | Agent Reliability Engineering: applying SRE principles to AI agent systems.... | 25 | Experimental |
| 15 | alyssadata/Driftmap-Public-Harness_llm-eval-harness-lite | Public Driftmap harness: public-safe CSV suites + rubrics + run logs for... | 24 | Experimental |
| 16 | thinkbigcd/agent-monitor | monitoring dashboard and observability tools for ai agents | 24 | Experimental |
| 17 | enkronos/agentevalops | Failure-mode evaluation harness for agent systems. | 23 | Experimental |
| 18 | johnnylugm-tech/agent-dashboard | A light-weight skill with quick start to monitor the latest status of each... | 22 | Experimental |
| 19 | StanislavBG/stepproof | Regression testing CLI for AI agents — define expected behaviors in YAML,... | 22 | Experimental |
| 20 | zahere/reliability-polynomials | Generalized reliability polynomials for quality-weighted network analysis.... | 21 | Experimental |
| 21 | kadubon/Oversight-Centered-Metrology-PoC | Lightweight proof-of-concept for oversight-centered metrology in coding... | 21 | Experimental |
| 22 | arabindanarayandas/invari | The repair layer for AI agents. Validates and fixes malformed API calls in... | 21 | Experimental |
| 23 | SyntheticSynaptic/agentura | CI for AI agents, no SDK. Define eval suites in agentura.yaml, run them on... | 21 | Experimental |
| 24 | LuisGG72/reliability-pack-api | Operational reliability API for AI agents: normalize inputs, contract-test... | 21 | Experimental |
| 25 | MyK-Exee/ai-assert | Verify AI-generated outputs against constraints with retries to ensure... | 21 | Experimental |
| 26 | nobutakayamauchi/RTS | ai-agents llm-ai gpt-workflows ai-audit execution-logging ai-research... | 18 | Experimental |
| 27 | Sutr-dev999/agent-monitoring-system | Agent Monitoring System | 15 | Experimental |
| 28 | NithiN-1808/agentchaos | Chaos testing for agentic AI — fault injection hooks for openai-agents-python | 14 | Experimental |
| 29 | feralghost/model-watchdog | Auto-rollback for AI agent config changes. Zero dependencies, single Python file. | 14 | Experimental |
| 30 | conde-fc/agentic-ai-accountability | Post-deployment behavioral measurement framework for AI agents — traces... | 14 | Experimental |
| 31 | mohamedchouat/ai-verifier | AI Verifier is an Android app that lets you ask questions to multiple AI... | 13 | Experimental |
| 32 | tylerdh12/agent-reliability-toolkit | Open-source testing framework for AI agents. Test for the 7 failure modes... | 13 | Experimental |
| 33 | AlphaV2/p2p-agent-system-monitor | A real-time, decentralized system monitoring tool built with Python,... | 11 | Experimental |

Comparisons in this category