The AI Evals Directory

A quality-scored directory of 216 AI evaluation tools, updated daily. Every tool is scored on maintenance, adoption, maturity, and community signals.

Tools for evaluating, benchmarking, and observing AI systems — from LLM eval harnesses to production observability platforms like Langfuse and LangSmith.

Verified: 27 tools (score 70–100)

Established: 89 tools (score 50–69)

Emerging: 74 tools (score 30–49)

Experimental: 26 tools (score 10–29)
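The four score bands listed here can be read as a simple lookup from a tool's quality score to its tier. The sketch below is only an illustration of that mapping: the tier names and ranges come from this page, while the function name `tier_for_score` and the `TIER_BANDS` list are hypothetical, and it assumes scores are whole numbers. The page lists no tier for scores below 10, so the sketch returns `None` there rather than guessing.

```python
from typing import Optional

# Tier bands as listed on this page: (name, lowest score, highest score).
TIER_BANDS = [
    ("Verified", 70, 100),
    ("Established", 50, 69),
    ("Emerging", 30, 49),
    ("Experimental", 10, 29),
]


def tier_for_score(score: int) -> Optional[str]:
    """Return the tier whose band contains the score, or None if no band does."""
    for name, low, high in TIER_BANDS:
        if low <= score <= high:
            return name
    # Assumption: scores outside 10-100 have no tier, since the page lists none.
    return None


if __name__ == "__main__":
    # Scores taken from the ranking on this page.
    for tool, score in [("DataDog/dd-trace-js", 95), ("confident-ai/deepeval", 71)]:
        print(tool, score, tier_for_score(score))
```

Under this mapping, every tool in the top-20 ranking (scores 71 to 95) falls in the Verified tier.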

Top tools by quality score

| # | Tool | Description | Score |
|---|------|-------------|-------|
| 1 | DataDog/dd-trace-js | Datadog APM client for Node.js | 95 |
| 2 | lmnr-ai/lmnr | Laminar - open-source observability platform purpose-built for AI agents. YC S24. | 88 |
| 3 | mnfst/manifest | Smart LLM Routing for OpenClaw. Cut Costs up to 70% 🦞🦚 | 87 |
| 4 | open-telemetry/opentelemetry-rust | The Rust OpenTelemetry implementation | 81 |
| 5 | tokio-rs/tracing | Application level tracing for Rust. | 78 |
| 6 | DataDog/dd-trace-go | Datadog Go Library including APM tracing, profiling, and security monitoring. | 76 |
| 7 | pinpoint-apm/pinpoint | APM, (Application Performance Management) tool for large-scale distributed systems. | 76 |
| 8 | DataDog/dd-trace-py | Datadog Python APM Client | 76 |
| 9 | open-telemetry/opentelemetry-go | OpenTelemetry Go API and SDK | 76 |
| 10 | jaegertracing/jaeger-ui | Web UI for Jaeger | 76 |
| 11 | DataDog/datadog-agent | Main repository for Datadog Agent | 76 |
| 12 | open-telemetry/opentelemetry-go-instrumentation | OpenTelemetry Auto Instrumentation using eBPF | 73 |
| 13 | opentracing-contrib/nginx-opentracing | NGINX plugin for OpenTracing | 73 |
| 14 | openzipkin/zipkin | Zipkin is a distributed tracing system | 73 |
| 15 | NVIDIA/garak | the LLM vulnerability scanner | 72 |
| 16 | winsiderss/systeminformer | A free, powerful, multi-purpose tool that helps you monitor system... | 72 |
| 17 | namhyung/uftrace | Function graph tracer for C/C++/Rust/Python | 72 |
| 18 | jaegertracing/jaeger | CNCF Jaeger, a Distributed Tracing Platform | 72 |
| 19 | autogluon/fev | Forecast evaluation library | 72 |
| 20 | confident-ai/deepeval | The LLM Evaluation Framework | 71 |

Browse by category