All AI Evaluation Tools
216 tools ranked by quality score · Page 2 of 3
| # | Tool | Score | Tier |
|---|---|---|---|
| 101 | beling/bsuccinct-rs: Rust libraries and programs focused on succinct data structures | | Established |
| 102 | DataDog/orchestrion: Automatic compile-time instrumentation of Go code | | Established |
| 103 | FriendsOfOpenTelemetry/opentelemetry-bundle: Traces, metrics, and logs instrumentation within your Symfony application | | Established |
| 104 | qwerty541/dns-bench: Find the fastest DNS in your location to improve internet browsing experience. | | Established |
| 105 | ldcsaa/hp-soa: A fully functional, easy-to-use, and highly scalable microservice framework | | Established |
| 106 | tlog-dev/tlog: Observability events system. | | Established |
| 107 | ecoAPM/BenchmarkMockNet: Using BenchmarkDotNet to compare .NET mocking library performance | | Established |
| 108 | smarr/ReBenchDB: ReBenchDB records benchmark results and provides customizable reporting to... | | Established |
| 109 | vincentfree/opentelemetry: OpenTelemetry extensions | | Established |
| 110 | Point72/raydar: A perspective powered, user editable ray dashboard via ray serve | | Established |
| 111 | quochuydev/dokploy-grafana-compose: Docker Compose stack for Grafana observability: Tempo traces, Loki logs,... | | Established |
| 112 | ROCm/madengine: madengine is a streamlined CLI tool for running and benchmarking AI models... | | Established |
| 113 | nfrankel/opentelemetry-tracing: Demo for end-to-end tracing via OpenTelemetry | | Established |
| 114 | CodSpeedHQ/action: GitHub Actions for running CodSpeed in your CI | | Established |
| 115 | kieker-monitoring/moobench: Micro-benchmarks for quantification of the performance overhead caused by... | | Established |
| 116 | ipyflow/ipyflow: A reactive Python kernel for Jupyter notebooks. | | Established |
| 117 | KaykCaputo/oracletrace: Lightweight Python tool to detect performance regressions and compare... | | Emerging |
| 118 | RRZE-HPC/MachineState: This CLI tool and Python3 module collects the current system state for documentation | | Emerging |
| 119 | dinesh-git17/claudehome: An architectural persistence experiment for large language models. Claude’s... | | Emerging |
| 120 | ivanfioravanti/llm_context_benchmarks: 📊 LLM Context Benchmarks - A comprehensive benchmarking tool for testing... | | Emerging |
| 121 | facebookresearch/CUTracer: A dynamic binary instrumentation tool for tracing and analyzing CUDA kernel... | | Emerging |
| 122 | nyrkio/nyrkio: Nyrkiö is an open source platform for detecting performance changes in a... | | Emerging |
| 123 | oteldb/oteldb: OpenTelemetry signal storage | | Emerging |
| 124 | tw4452852/zbpf: Writing eBPF in Zig | | Emerging |
| 125 | JDiskMark/jdm-java: Cross-platform Java Disk Benchmark Utility for measuring drive IO performance. | | Emerging |
| 126 | lucsorel/pydoctrace: Generate architecture diagrams by tracing Python code execution | | Emerging |
| 127 | komoju/komoju-datadog: Rust Datadog instrumentation | | Emerging |
| 128 | mesaglio/otel-front: Lightweight OpenTelemetry viewer for local development. Single binary, no... | | Emerging |
| 129 | Helmholtz-AI-Energy/perun: Perun is a Python package that measures the energy consumption of your applications. | | Emerging |
| 130 | containerscrew/nflux: Simple network monitoring agent tool. Powered by eBPF & Rust 🐝 | | Emerging |
| 131 | blooop/bencher: A package for benchmarking the characteristics of arbitrary functions | | Emerging |
| 132 | GabrielTecuceanu/httpress: A fast HTTP benchmarking tool built in Rust | | Emerging |
| 133 | DataDog/httpd-datadog: Enhance Apache HTTPD Observability with Datadog's Module | | Emerging |
| 134 | proactive-agent/langgraphics: Visualize live LangGraph execution and see how your agent thinks as it runs. | | Emerging |
| 135 | CodSpeedHQ/instrument-hooks: Internal core for the codspeed instruments | | Emerging |
| 136 | kjldev/purview-telemetry-sourcegenerator: .NET Source Generator for interface-based telemetry. Supporting activities,... | | Emerging |
| 137 | agurinov/gopl: Golang platform library | | Emerging |
| 138 | grafana/otel-profiling-go: OpenTelemetry integration for Grafana Pyroscope and tracing solutions such... | | Emerging |
| 139 | Spectral-Knight-Ops/local-llm-evaluator: Quickly test local LLMs with custom prompts to determine which model is best for you. | | Emerging |
| 140 | feelpp/benchmarking: Feel++ Benchmarking | | Emerging |
| 141 | gstinoco/mGFD: Meshless Generalized Finite Differences (mGFD) solver and reference... | | Emerging |
| 142 | shnarazk/SAT-bench: A benchmark suite for SAT solvers | | Emerging |
| 143 | uptrace/uptrace-ruby: OpenTelemetry Ruby distribution for Uptrace | | Emerging |
| 144 | coralogix/coralogix-management-sdk: API clients for configuring the Coralogix platform. | | Emerging |
| 145 | omniviser/omniray: Stop guessing! You and your AI can now see live what's happening inside your... | | Emerging |
| 146 | HPE/torch-hammer: Torch Hammer: Strike while the GPU is hot | | Emerging |
| 147 | typelevel/otel4s-sdk: Implementation of the otel4s SDK modules in Scala from scratch | | Emerging |
| 148 | falcondev-oss/workflow: Simple type-safe queue worker with durable execution based on BullMQ. | | Emerging |
| 149 | beorn/loggily: TypeScript logger with debug-style namespaces, structured JSON, and... | | Emerging |
| 150 | givecareapp/givecare-bench: AI safety benchmark for long-term caregiving relationships. Tests crisis... | | Emerging |
| 151 | NyanKiyoshi/pytest-django-queries: Generate performance reports from your django database performance tests. | | Emerging |
| 152 | pgx-contrib/pgxotel: OpenTelemetry tracing instrumentation for pgx v5 — spans for queries,... | | Emerging |
| 153 | skerkour/go-benchmarks: Comprehensive and reproducible benchmarks for Go developers and architects. | | Emerging |
| 154 | rsasaki0109/CloudAnalyzer: CLI-first QA toolkit for point clouds, trajectories, and 3D perception... | | Emerging |
| 155 | MrAlias/flow: An OpenTelemetry SpanProcessor reporting tracing flow metrics | | Emerging |
| 156 | udhos/opentelemetry-trace-sqs: opentelemetry-trace-sqs propagates OpenTelemetry tracing with SQS messages... | | Emerging |
| 157 | jamesgober/metrics-lib: The fastest metrics library for Rust. Lock-free 0.6ns gauges, 18ns counters,... | | Emerging |
| 158 | smyrgeorge/log4k: A Comprehensive Logging and Tracing Solution for Kotlin Multiplatform. | | Emerging |
| 159 | KempnerInstitute/nvidia-hpc-benchmarks: NVIDIA HPC Benchmarks | | Emerging |
| 160 | meshkovQA/Eval-ai-library: Comprehensive AI Model Evaluation Framework with advanced techniques... | | Emerging |
| 161 | getaxonflow/axonflow: AxonFlow: Runtime control layer for production AI | | Emerging |
| 162 | IBM/OpenDsStar: OpenDsStar is an open-source implementation of the DS-Star agent that... | | Emerging |
| 163 | kobsio/kobs: Kubernetes Observability Platform | | Emerging |
| 164 | hdmsantander/microservices-ops-demo: Spring Boot demo for observability, traceability and error analysis in a... | | Emerging |
| 165 | mbzuai-oryx/Agent-X: ICLR 2026: Agent-X Evaluating Deep Multimodal Reasoning in Vision-Centric... | | Emerging |
| 166 | evaluation-context-protocol/ecp: ECP is a standardized interface for orchestrating, auditing, and enforcing... | | Emerging |
| 167 | verifywise-ai/plugin-marketplace: VerifyWise AI Governance Plugin Marketplace | | Emerging |
| 168 | braintrustdata/braintrust-pi-extension: Braintrust tracing plugin for pi | | Emerging |
| 169 | nixel2007/opentelemetry: OpenTelemetry SDK for OneScript | | Emerging |
| 170 | everythings-gonna-be-alright/phpScope: PHP profiler that sends CPU sampling data to Pyroscope server. | | Emerging |
| 171 | opsrobot-ai/opsrobot: Observability platform for OpenClaw agents, providing real-time tracing,... | | Emerging |
| 172 | kolloch/reqray: Log call tree summaries after each request for rust programs instrumented... | | Emerging |
| 173 | tracewayapp/opentelemetry-symfony-bundle: Pure-PHP OpenTelemetry instrumentation for Symfony - automatic HTTP,... | | Emerging |
| 174 | PacificBiosciences/aardvark: A tool for sniffing out the differences in vari-Ants | | Emerging |
| 175 | yonatan-h/express-k6-profiler: Finds bottlenecks in an Express app during load testing | | Emerging |
| 176 | cuihairu/croupier: Croupier is a universal GM (Game Master) backend system designed for game... | | Emerging |
| 177 | aykhans/sarin: A high-performance HTTP load testing tool. Features dynamic request... | | Emerging |
| 178 | dolmen-go/flagx: Extensions for the Go 'flag' package: flagx, flagfile, flagnet, flagtrace | | Emerging |
| 179 | MrAlias/collex: Use OpenTelemetry Collector Factories to Export with OpenTelemetry Go | | Emerging |
| 180 | rodneylab/axum-graphql: Rust GraphQL demo/test API written in Rust, using Axum for routing,... | | Emerging |
| 181 | AmalChandru/termtrace: A terminal workflow recorder that turns debugging sessions into replayable,... | | Emerging |
| 182 | last9/opentelemetry-examples: Production-ready OpenTelemetry instrumentation examples for Go, Python,... | | Emerging |
| 183 | PAIR-Systems-Inc/little-dorrit-editor: Multimodal benchmark for evaluating handwritten editorial correction in printed text. | | Emerging |
| 184 | filipsPL/optuml: Optuna-optimized ML methods, with scikit-learn like API | | Emerging |
| 185 | BudEcosystem/bud-runtime: Bud AI Foundry - A comprehensive inference stack for compound AI deployment,... | | Emerging |
| 186 | russfellows/sai3-bench: A multi-protocol storage performance testing tool, inspired by vdbench, fio... | | Emerging |
| 187 | hboublal/dopGuard: Modular observability platform for .NET applications, integrating with tools... | | Emerging |
| 188 | imadAttar/spring-boot-unified-observability-starter: All-in-one Spring Boot Starter for Observability: Metrics, Traces, Logs, and... | | Emerging |
| 189 | nshkrdotcom/AITrace: The unified observability layer for the AI Control Plane | | Emerging |
| 190 | qcmet/qcmet: Quantum Computing Metrics and Benchmarks | | Emerging |
| 191 | tolitius/cupel: Discover LLMs punching above their weight | | Experimental |
| 192 | wangyz1999/sync-video-label: A web-based annotation tool for synchronized multi-video timeline labeling... | | Experimental |
| 193 | iRevive/fs2-grpc-otel4s: otel4s instrumentation for fs2-grpc | | Experimental |
| 194 | mnemom/mnemom-platform: Safe House for AI agents — transparent gateway with inbound + outbound... | | Experimental |
| 195 | rvnhq/raven: A lightweight, self-hostable cloud infrastructure monitoring and telemetry platform. | | Experimental |
| 196 | DaSH-Lab-CSIS/blossom: BLOSSOM: Block-wise Federated Learning Over Shared and Sparse Observed... | | Experimental |
| 197 | kyahikaru/llm-guardrail-red-teaming: Protocol-constrained red teaming of frontier LLM guardrails in high risk... | | Experimental |
| 198 | last9/rails-otel-context: Tells you which code fired that query. Zero config. | | Experimental |
| 199 | thanhdaon/clean-arch-go: Clean Architecture, DDD, CQRS with testing in Go | | Experimental |
| 200 | LLMSystems/BehaviorRL-Hallucination: Learning When to Answer: Behavior-Oriented Reinforcement Learning for... | | Experimental |