Agent Reliability Engineering AI Agents

Standards, frameworks, and operational tooling for measuring, testing, and improving the reliability of AI agents before and after production deployment. Includes failure-mode evaluation, SRE principles applied to agents, quality metrics, and deterministic safety guarantees. Does NOT include general agent monitoring dashboards, agent security hardening, or agent infrastructure resilience (those focus on different aspects of operations).

This category tracks 33 agent reliability engineering projects. Two score above 50 (Established tier). The highest-rated is petterjuan/agentic-reliability-framework at 53/100 with 19 stars.

Get all 33 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=agents&subcategory=agent-reliability-engineering&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
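The same request can be made from Python. The sketch below uses only the standard library and only the query parameters that appear in the curl example above (`domain`, `subcategory`, `limit`). The function names `build_url` and `fetch_quality_dataset` are illustrative, not part of any published client. The response schema is not described on this page, so the parsed JSON is returned as-is with no assumptions about field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the curl example; everything else here is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="agents", subcategory="agent-reliability-engineering", limit=20):
    """Build the query URL using only the parameters shown in the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_quality_dataset(url, timeout=10):
    """Fetch the URL and parse the body as JSON.

    The response schema is not documented on this page, so the parsed
    value is returned unchanged rather than assuming field names.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    url = build_url()
    print(url)
    # Uncomment to make a real request (it counts against the daily quota):
    # data = fetch_quality_dataset(url)
    # print(json.dumps(data, indent=2)[:500])
```

Note that the curl example passes `limit=20` while the category lists 33 projects. Whether the API accepts a larger `limit` or requires paging is not stated here, so `limit` is left as a parameter to adjust once that behavior is confirmed.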

#	Agent	Description	Score	Tier	Stars	Language
1	petterjuan/agentic-reliability-framework	ARF is an agentic reliability intelligence platform that separates decision...	53	Established	19	Python
2	sarkar-ai-taken/riva	Local-first observability and control plane for AI agents.	52	Established	3	Python
3	Nubaeon/empirica	Make AI agents and AI workflows measurably reliable. Epistemic...	48	Emerging	187	Python
4	relai-ai/relai-sdk	A platform for building reliable AI agents	41	Emerging	93	Python
5	itbench-hub/ITBench-CISO-CAA-Agent	Code repository for CISO agent as part of ITBench	40	Emerging	21	Python
6	exospherehost/ai-reliability-standards	Architectural standards and best practices for building reliable AI Agents...	38	Emerging	4	Dockerfile
7	imtt-dev/steer	The Active Reliability Layer for AI Agents. Catch failures, teach fixes, and...	38	Emerging	130	Python
8	soumendrak/ragwatch	An SDK for Python AI Agents. Under heavy development.	37	Emerging	5	Python
9	kalibr-ai/kalibr-sdk-python	Your agents silently degrade in production. Kalibr keeps them on the optimal...	36	Emerging	24	Python
10	eth-sri/ToolFuzz	ToolFuzz is a fuzzing framework designed to test your LLM Agent tools.	33	Emerging	37	Python
11	khan5v/kalibra	Statistical regression detection and CI quality gates for AI agents	32	Emerging	1	Python
12	ai-2070/l0-python	L0: The Missing Reliability Substrate for AI. Streaming-first. Reliable....	26	Experimental	3	Python
13	ai-2070/l0	L0: The Missing Reliability Substrate for AI. Streaming-first. Reliable....	25	Experimental	2	TypeScript
14	choutos/agent-reliability-engineering	Agent Reliability Engineering: applying SRE principles to AI agent systems....	25	Experimental	3	Shell
15	alyssadata/Driftmap-Public-Harness_llm-eval-harness-lite	Public Driftmap harness: public-safe CSV suites + rubrics + run logs for...	24	Experimental	1	Python
16	thinkbigcd/agent-monitor	monitoring dashboard and observability tools for ai agents	24	Experimental	4	Python
17	enkronos/agentevalops	Failure-mode evaluation harness for agent systems.	23	Experimental	1	TypeScript
18	johnnylugm-tech/agent-dashboard	A light-weight skill with quick start to monitor the latest status of each...	22	Experimental	—	Python
19	StanislavBG/stepproof	Regression testing CLI for AI agents — define expected behaviors in YAML,...	22	Experimental	—	TypeScript
20	zahere/reliability-polynomials	Generalized reliability polynomials for quality-weighted network analysis....	21	Experimental	—	Python
21	kadubon/Oversight-Centered-Metrology-PoC	Lightweight proof-of-concept for oversight-centered metrology in coding...	21	Experimental	—	Python
22	arabindanarayandas/invari	The repair layer for AI agents. Validates and fixes malformed API calls in...	21	Experimental	—	TypeScript
23	SyntheticSynaptic/agentura	CI for AI agents, no SDK. Define eval suites in agentura.yaml, run them on...	21	Experimental	—	TypeScript
24	LuisGG72/reliability-pack-api	Operational reliability API for AI agents: normalize inputs, contract-test...	21	Experimental	—	—
25	MyK-Exee/ai-assert	Verify AI-generated outputs against constraints with retries to ensure...	21	Experimental	—	Python
26	nobutakayamauchi/RTS	ai-agents llm-ai gpt-workflows ai-audit execution-logging ai-research...	18	Experimental	2	Python
27	Sutr-dev999/agent-monitoring-system	Agent Monitoring System	15	Experimental	1	—
28	NithiN-1808/agentchaos	Chaos testing for agentic AI — fault injection hooks for openai-agents-python	14	Experimental	—	Python
29	feralghost/model-watchdog	Auto-rollback for AI agent config changes. Zero dependencies, single Python file.	14	Experimental	—	Python
30	conde-fc/agentic-ai-accountability	Post-deployment behavioral measurement framework for AI agents — traces...	14	Experimental	—	Python
31	mohamedchouat/ai-verifier	AI Verifier is an Android app that lets you ask questions to multiple AI...	13	Experimental	—	Kotlin
32	tylerdh12/agent-reliability-toolkit	Open-source testing framework for AI agents. Test for the 7 failure modes...	13	Experimental	—	Python
33	AlphaV2/p2p-agent-system-monitor	A real-time, decentralized system monitoring tool built with Python,...	11	Experimental	—	Python

Comparisons in this category

l0-python and l0 (26 vs 25)