The AI Evals Directory
A quality-scored directory of 216 AI evaluation tools, updated daily. Every tool is scored on maintenance, adoption, maturity, and community signals.
Tools for evaluating, benchmarking, and observing AI systems — from LLM eval harnesses to production observability platforms like Langfuse and LangSmith.
| Score range | Tools |
|---|---|
| 70–100 | 27 |
| 50–69 | 89 |
| 30–49 | 74 |
| 10–29 | 26 |
Top tools by quality score
| # | Tool | Score |
|---|---|---|
| 1 | DataDog/dd-trace-js: Datadog APM client for Node.js | |
| 2 | lmnr-ai/lmnr: Laminar - open-source observability platform purpose-built for AI agents. YC S24. | |
| 3 | mnfst/manifest: Smart LLM Routing for OpenClaw. Cut Costs up to 70% 🦞🦚 | |
| 4 | open-telemetry/opentelemetry-rust: The Rust OpenTelemetry implementation | |
| 5 | tokio-rs/tracing: Application level tracing for Rust. | |
| 6 | DataDog/dd-trace-go: Datadog Go Library including APM tracing, profiling, and security monitoring. | |
| 7 | pinpoint-apm/pinpoint: APM, (Application Performance Management) tool for large-scale distributed systems. | |
| 8 | DataDog/dd-trace-py: Datadog Python APM Client | |
| 9 | open-telemetry/opentelemetry-go: OpenTelemetry Go API and SDK | |
| 10 | jaegertracing/jaeger-ui: Web UI for Jaeger | |
| 11 | DataDog/datadog-agent: Main repository for Datadog Agent | |
| 12 | open-telemetry/opentelemetry-go-instrumentation: OpenTelemetry Auto Instrumentation using eBPF | |
| 13 | opentracing-contrib/nginx-opentracing: NGINX plugin for OpenTracing | |
| 14 | openzipkin/zipkin: Zipkin is a distributed tracing system | |
| 15 | NVIDIA/garak: the LLM vulnerability scanner | |
| 16 | winsiderss/systeminformer: A free, powerful, multi-purpose tool that helps you monitor system... | |
| 17 | namhyung/uftrace: Function graph tracer for C/C++/Rust/Python | |
| 18 | jaegertracing/jaeger: CNCF Jaeger, a Distributed Tracing Platform | |
| 19 | autogluon/fev: Forecast evaluation library | |
| 20 | confident-ai/deepeval: The LLM Evaluation Framework | |