Uncategorized AI Evaluation Tools

There are 216 uncategorized tools tracked. 27 score 70 or above (Verified tier). The highest-rated is DataDog/dd-trace-js at 95/100, with 790 stars and 26,477,155 monthly downloads. All of the top 10 are actively maintained.
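The page does not say how tiers are assigned, but every row in the listing is consistent with fixed score bands: Verified at 70 and above, Established from 50 to 69, Emerging from 30 to 49, and Experimental below 30. The sketch below encodes those bands. They are inferred from the listing, not taken from a published specification, and `tier_for` and `OBSERVED` are hypothetical names used only for illustration.

```python
# Tier bands inferred from the scores and labels in this listing; the page
# itself does not publish these cut-offs, so treat them as an assumption.
TIER_BANDS = (
    (70, "Verified"),
    (50, "Established"),
    (30, "Emerging"),
    (0, "Experimental"),
)


def tier_for(score: int) -> str:
    """Map a 0-100 quality score to the tier label observed on this page."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    # Bands are ordered from highest to lowest threshold, so the first
    # threshold the score reaches determines the tier.
    for threshold, label in TIER_BANDS:
        if score >= threshold:
            return label
    raise AssertionError("unreachable: the lowest threshold is 0")


# Boundary rows taken from the listing: (tool, score, tier shown on the page).
OBSERVED = [
    ("gofr-dev/gofr", 70, "Verified"),
    ("pydantic/logfire", 69, "Established"),
    ("ipyflow/ipyflow", 50, "Established"),
    ("KaykCaputo/oracletrace", 49, "Emerging"),
    ("qcmet/qcmet", 30, "Emerging"),
    ("tolitius/cupel", 29, "Experimental"),
]

if __name__ == "__main__":
    for tool, score, shown in OBSERVED:
        print(f"{tool}: {score} -> {tier_for(score)} (page: {shown})")
```

The boundary rows were chosen because they sit on either side of each inferred cut-off. Applied to every score in the listing, the 70-and-above band yields exactly the 27 Verified tools counted in the summary.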

Get the projects as JSON. The request below sets limit=20, so it returns at most 20 of the 216 projects:

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ai-evals&subcategory=uncategorized&limit=20"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free API key.
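For scripted access, here is a minimal standard-library Python sketch of the same anonymous request. It reproduces the endpoint and query parameters from the curl example. The helper names `build_quality_url` and `fetch_quality` are hypothetical, and the JSON is returned as-is because the response schema is not documented on this page. Sending an API key is deliberately not shown, since the page does not say how a key is passed.

```python
import json
import urllib.parse
import urllib.request

# Endpoint and query parameters copied from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain: str = "ai-evals",
                      subcategory: str = "uncategorized",
                      limit: int = 20) -> str:
    """Return the request URL; the defaults match the curl example."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_quality(limit: int = 20, timeout: float = 30.0):
    """Perform an anonymous GET (no API key) and decode the JSON body.

    The response schema is not documented here, so the parsed value is
    returned unchanged rather than unpacked into specific fields.
    """
    url = build_quality_url(limit=limit)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Each call to `fetch_quality()` counts against the anonymous quota of 100 requests/day, so caching the result is preferable to polling.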

# | Tool | Score | Tier | Description
1 | DataDog/dd-trace-js | 95 | Verified | Datadog APM client for Node.js
2 | lmnr-ai/lmnr | 88 | Verified | Laminar - open-source observability platform purpose-built for AI agents. YC S24.
3 | mnfst/manifest | 87 | Verified | Smart LLM Routing for OpenClaw. Cut Costs up to 70% 🦞🦚
4 | open-telemetry/opentelemetry-rust | 81 | Verified | The Rust OpenTelemetry implementation
5 | tokio-rs/tracing | 78 | Verified | Application-level tracing for Rust.
6 | DataDog/dd-trace-go | 76 | Verified | Datadog Go Library including APM tracing, profiling, and security monitoring.
7 | pinpoint-apm/pinpoint | 76 | Verified | APM (Application Performance Management) tool for large-scale distributed systems.
8 | DataDog/dd-trace-py | 76 | Verified | Datadog Python APM Client
9 | open-telemetry/opentelemetry-go | 76 | Verified | OpenTelemetry Go API and SDK
10 | jaegertracing/jaeger-ui | 76 | Verified | Web UI for Jaeger
11 | DataDog/datadog-agent | 76 | Verified | Main repository for Datadog Agent
12 | open-telemetry/opentelemetry-go-instrumentation | 73 | Verified | OpenTelemetry Auto Instrumentation using eBPF
13 | opentracing-contrib/nginx-opentracing | 73 | Verified | NGINX plugin for OpenTracing
14 | openzipkin/zipkin | 73 | Verified | Zipkin is a distributed tracing system
15 | NVIDIA/garak | 72 | Verified | The LLM vulnerability scanner
16 | winsiderss/systeminformer | 72 | Verified | A free, powerful, multi-purpose tool that helps you monitor system...
17 | namhyung/uftrace | 72 | Verified | Function graph tracer for C/C++/Rust/Python
18 | jaegertracing/jaeger | 72 | Verified | CNCF Jaeger, a Distributed Tracing Platform
19 | autogluon/fev | 72 | Verified | Forecast evaluation library
20 | confident-ai/deepeval | 71 | Verified | The LLM Evaluation Framework
21 | inikep/lzbench | 71 | Verified | lzbench is an in-memory benchmark of open-source compressors
22 | bpftrace/bpftrace | 71 | Verified | High-level tracing language for Linux
23 | gofr-dev/gofr | 70 | Verified | An opinionated GoLang framework for accelerated microservice development....
24 | SigNoz/signoz | 70 | Verified | SigNoz is an open-source observability platform native to OpenTelemetry with...
25 | GreptimeTeam/greptimedb | 70 | Verified | The open-source Observability 2.0 database. One engine for metrics, logs,...
26 | libbpf/libbpf | 70 | Verified | Automated upstream mirror for libbpf stand-alone build.
27 | iipeace/guider | 70 | Verified | The All-in-One System Profiling and Fault Detection Tool for Linux & Android
28 | pydantic/logfire | 69 | Established | AI observability platform for production LLM and agent systems.
29 | CodSpeedHQ/pytest-codspeed | 69 | Established | A pytest plugin to create benchmarks
30 | dotnet/BenchmarkDotNet | 69 | Established | Powerful .NET library for benchmarking
31 | CodSpeedHQ/codspeed-rust | 67 | Established | Crates to benchmark your Rust code
32 | alibaba/loongsuite-go-agent | 66 | Established | OpenTelemetry Compile-Time Instrumentation for Golang
33 | coroot/coroot | 66 | Established | Coroot is an open-source observability and APM tool with AI-powered Root...
34 | flightlessmango/MangoHud | 66 | Established | A Vulkan and OpenGL overlay for monitoring FPS, temperatures, CPU/GPU load and more.
35 | metrico/gigapipe | 65 | Established | ⭐️ The Open-Source Polyglot Observability Warehouse: Light, Fast, Cloud...
36 | TPC-Council/HammerDB | 64 | Established | HammerDB: The industry-standard open-source database benchmark
37 | DataDog/dd-trace-java | 64 | Established | Datadog APM client for Java
38 | DataDog/dd-trace-php | 64 | Established | Datadog PHP Clients
39 | DataDog/dd-trace-rb | 64 | Established | Datadog's client library for Ruby
40 | jaegertracing/helm-charts | 64 | Established | Helm Charts for Jaeger backend
41 | DataDog/dd-sdk-ios | 64 | Established | Datadog SDK for iOS - Swift and Objective-C.
42 | open-telemetry/opentelemetry-ruby-contrib | 64 | Established | Contrib Packages for the OpenTelemetry Ruby API and SDK implementation.
43 | gogf/gf | 64 | Established | A powerful framework for faster, easier, and more efficient project development.
44 | RafaelGSS/bench-node | 64 | Established | A powerful Node.js benchmark library
45 | DataDog/dd-trace-dotnet | 64 | Established | .NET Client Library for Datadog APM
46 | open-telemetry/opentelemetry-php | 64 | Established | The OpenTelemetry PHP Library
47 | reframe-hpc/reframe | 63 | Established | A powerful Python framework for writing and running portable regression...
48 | verifywise-ai/verifywise | 63 | Established | Complete AI governance and LLM Evals platform with support for EU AI Act,...
49 | rabbitmq/rabbitmq-perf-test | 63 | Established | A load testing tool
50 | oushujun/EDTA | 63 | Established | Extensive de-novo TE Annotator
51 | nowsecure/fsmon | 62 | Established | Filesystem monitor tool for Linux/Android iOS/macOS
52 | typelevel/natchez | 62 | Established | Functional tracing for cats
53 | cloudflare/ebpf_exporter | 62 | Established | Prometheus exporter for custom eBPF metrics
54 | zio/zio-logging | 62 | Established | Powerful logging for ZIO 2.0 applications, with compatibility with many...
55 | lttng/lttng-tools | 62 | Established | The lttng-tools project provides a session daemon (lttng-sessiond) that acts...
56 | efficios/babeltrace | 62 | Established | Babeltrace /ˈbæbəltreɪs/ is an open-source trace manipulation toolkit.
57 | huggingface/aisheets | 61 | Established | Build, enrich, and transform datasets using AI models with no code
58 | typelevel/otel4s | 61 | Established | An OpenTelemetry library for Scala based on Cats-Effect
59 | fastify/fastify-zipkin | 61 | Established | Fastify plugin for Zipkin distributed tracing system.
60 | dash0hq/otelbin | 61 | Established | Web-based tool to facilitate OpenTelemetry collector configuration editing...
61 | iand675/hs-opentelemetry | 60 | Established | OpenTelemetry support for the Haskell programming language
62 | swift-otel/swift-otel | 60 | Established | An OpenTelemetry Protocol (OTLP) backend for Swift Log, Swift Metrics, and...
63 | godotengine/godot-benchmarks | 60 | Established | Collection of benchmarks to test performance of different areas of Godot
64 | cilium/pwru | 60 | Established | Packet, where are you? -- eBPF-based Linux kernel networking debugger
65 | instana/go-sensor | 60 | Established | 🚀 Go Distributed Tracing & Metrics Sensor for Instana
66 | signalfx/tracing-examples | 59 | Established | Examples of using third-party tracers with SignalFx
67 | signalfx/splunk-otel-java | 59 | Established | Splunk Distribution of OpenTelemetry Java
68 | instana/nodejs | 59 | Established | Node.js in-process collectors for Instana
69 | team-decent/decent-bench | 59 | Established | A benchmarking framework for decentralized optimization
70 | kieker-monitoring/kieker | 59 | Established | Kieker is an observability framework that consists of a monitoring and...
71 | jonahsnider/benchmark | 59 | Established | A Node.js benchmarking library with support for multithreading and TurboFan...
72 | dynatrace-oss/unguard | 59 | Established | Unguard is an insecure cloud-native microservices demo application.
73 | instana/python-sensor | 58 | Established | 🐍 Python Distributed Tracing & Metrics Sensor for Instana
74 | munich-quantum-toolkit/bench | 58 | Established | MQT Bench - An MQT Tool for Benchmarking Quantum Software Tools
75 | ertgl/tapable-tracer | 58 | Established | Trace the connections and flows between tapable hooks.
76 | uio-bmi/immuneML | 58 | Established | immuneML is a platform for machine learning analysis of adaptive immune...
77 | ant-research/EasyTemporalPointProcess | 57 | Established | EasyTPP: Towards Open Benchmarking Temporal Point Processes
78 | nhsengland/evalsense | 57 | Established | Tools for systematic large language model evaluations
79 | instana/ruby-sensor | 56 | Established | 💎 Ruby Distributed Tracing & Metrics Sensor for Instana
80 | atesgoral/hrm-solutions | 56 | Established | Human Resource Machine solutions and size/speed hacks
81 | bamlab/flashlight | 55 | Established | 📱⚡️ Lighthouse for Mobile - audits your app and gives a performance score to...
82 | ldbc/ldbc_snb_docs | 55 | Established | Specification of the LDBC Social Network Benchmark suite
83 | aliesbelik/load-testing-toolkit | 54 | Established | Collection of open-source tools for debugging, benchmarking, load and stress...
84 | unitaryfoundation/metriq-gym | 54 | Established | metriq-gym is a framework for implementing and running standard quantum...
85 | ryncsn/memstrack | 54 | Established | A memory allocation tracer combined with stack trace.
86 | GDATASoftwareAG/motornet | 54 | Established | Motor.NET is a microservice framework based on Microsoft.Extensions.Hosting
87 | argonne-lcf/THAPI | 54 | Established | A tracing infrastructure for heterogeneous computing applications.
88 | DataDog/nginx-datadog | 54 | Established | Enhance NGINX Observability and Security with Datadog's Module
89 | bencheeorg/benchee | 54 | Established | Easy and extensible benchmarking in Elixir providing you with lots of statistics!
90 | chirpz-ai/pandaprobe | 54 | Established | 🐼 Open source agent engineering platform: traces, evals, and metrics to...
91 | jnidzwetzki/pg-lock-tracer | 54 | Established | An eBPF-based lock tracer for PostgreSQL
92 | cau-se/theodolite | 53 | Established | Theodolite is a framework for benchmarking the horizontal and vertical...
93 | bencherdev/bencher | 53 | Established | 🐰 Bencher - Continuous Benchmarking
94 | hendriknielaender/zBench | 53 | Established | 📊 Zig benchmark
95 | DataDog/dd-trace-cpp | 53 | Established | Datadog APM client for C++
96 | cmackenzie1/tracing-ndjson | 53 | Established | A customizable NDJSON format for tracing in Rust
97 | prestodb/pbench | 53 | Established | Presto/Prestissimo Benchmark Toolset
98 | elastic/elastic-otel-dotnet | 53 | Established | Elastic OpenTelemetry .NET Distribution
99 | signalfx/splunk-otel-dotnet | 52 | Established | Splunk Distribution of OpenTelemetry .NET
100 | FrankChen021/bithon | 52 | Established | A full-stack observability platform
101 | beling/bsuccinct-rs | 52 | Established | Rust libraries and programs focused on succinct data structures
102 | DataDog/orchestrion | 52 | Established | Automatic compile-time instrumentation of Go code
103 | FriendsOfOpenTelemetry/opentelemetry-bundle | 52 | Established | Traces, metrics, and logs instrumentation within your Symfony application
104 | qwerty541/dns-bench | 52 | Established | Find the fastest DNS in your location to improve internet browsing experience.
105 | ldcsaa/hp-soa | 51 | Established | A fully functional, easy-to-use, and highly scalable microservice framework
106 | tlog-dev/tlog | 51 | Established | Observability events system.
107 | ecoAPM/BenchmarkMockNet | 51 | Established | Using BenchmarkDotNet to compare .NET mocking library performance
108 | smarr/ReBenchDB | 51 | Established | ReBenchDB records benchmark results and provides customizable reporting to...
109 | vincentfree/opentelemetry | 51 | Established | OpenTelemetry extensions
110 | Point72/raydar | 51 | Established | A Perspective-powered, user-editable Ray dashboard via Ray Serve
111 | quochuydev/dokploy-grafana-compose | 50 | Established | Docker Compose stack for Grafana observability: Tempo traces, Loki logs,...
112 | ROCm/madengine | 50 | Established | madengine is a streamlined CLI tool for running and benchmarking AI models...
113 | nfrankel/opentelemetry-tracing | 50 | Established | Demo for end-to-end tracing via OpenTelemetry
114 | CodSpeedHQ/action | 50 | Established | GitHub Actions for running CodSpeed in your CI
115 | kieker-monitoring/moobench | 50 | Established | Micro-benchmarks for quantification of the performance overhead caused by...
116 | ipyflow/ipyflow | 50 | Established | A reactive Python kernel for Jupyter notebooks.
117 | KaykCaputo/oracletrace | 49 | Emerging | Lightweight Python tool to detect performance regressions and compare...
118 | RRZE-HPC/MachineState | 48 | Emerging | This CLI tool and Python3 module collects the current system state for documentation
119 | dinesh-git17/claudehome | 48 | Emerging | An architectural persistence experiment for large language models. Claude’s...
120 | ivanfioravanti/llm_context_benchmarks | 48 | Emerging | 📊 LLM Context Benchmarks - A comprehensive benchmarking tool for testing...
121 | facebookresearch/CUTracer | 48 | Emerging | A dynamic binary instrumentation tool for tracing and analyzing CUDA kernel...
122 | nyrkio/nyrkio | 48 | Emerging | Nyrkiö is an open source platform for detecting performance changes in a...
123 | oteldb/oteldb | 48 | Emerging | OpenTelemetry signal storage
124 | tw4452852/zbpf | 48 | Emerging | Writing eBPF in Zig
125 | JDiskMark/jdm-java | 48 | Emerging | Cross-platform Java Disk Benchmark Utility for measuring drive IO performance.
126 | lucsorel/pydoctrace | 48 | Emerging | Generate architecture diagrams by tracing Python code execution
127 | komoju/komoju-datadog | 48 | Emerging | Rust Datadog instrumentation
128 | mesaglio/otel-front | 47 | Emerging | Lightweight OpenTelemetry viewer for local development. Single binary, no...
129 | Helmholtz-AI-Energy/perun | 47 | Emerging | Perun is a Python package that measures the energy consumption of your applications.
130 | containerscrew/nflux | 47 | Emerging | Simple network monitoring agent tool. Powered by eBPF & Rust 🐝
131 | blooop/bencher | 46 | Emerging | A package for benchmarking the characteristics of arbitrary functions
132 | GabrielTecuceanu/httpress | 46 | Emerging | A fast HTTP benchmarking tool built in Rust
133 | DataDog/httpd-datadog | 46 | Emerging | Enhance Apache HTTPD Observability with Datadog's Module
134 | proactive-agent/langgraphics | 45 | Emerging | Visualize live LangGraph execution and see how your agent thinks as it runs.
135 | CodSpeedHQ/instrument-hooks | 45 | Emerging | Internal core for the CodSpeed instruments
136 | kjldev/purview-telemetry-sourcegenerator | 45 | Emerging | .NET Source Generator for interface-based telemetry. Supporting activities,...
137 | agurinov/gopl | 45 | Emerging | Golang platform library
138 | grafana/otel-profiling-go | 45 | Emerging | OpenTelemetry integration for Grafana Pyroscope and tracing solutions such...
139 | Spectral-Knight-Ops/local-llm-evaluator | 45 | Emerging | Quickly test local LLMs with custom prompts to determine which model is best for you.
140 | feelpp/benchmarking | 45 | Emerging | Feel++ Benchmarking
141 | gstinoco/mGFD | 44 | Emerging | Meshless Generalized Finite Differences (mGFD) solver and reference...
142 | shnarazk/SAT-bench | 44 | Emerging | A benchmark suite for SAT solvers
143 | uptrace/uptrace-ruby | 44 | Emerging | OpenTelemetry Ruby distribution for Uptrace
144 | coralogix/coralogix-management-sdk | 44 | Emerging | API clients for configuring the Coralogix platform.
145 | omniviser/omniray | 43 | Emerging | Stop guessing! You and your AI can now see live what's happening inside your...
146 | HPE/torch-hammer | 43 | Emerging | Torch Hammer: Strike while the GPU is hot
147 | typelevel/otel4s-sdk | 43 | Emerging | Implementation of the otel4s SDK modules in Scala from scratch
148 | falcondev-oss/workflow | 42 | Emerging | Simple type-safe queue worker with durable execution based on BullMQ.
149 | beorn/loggily | 42 | Emerging | TypeScript logger with debug-style namespaces, structured JSON, and...
150 | givecareapp/givecare-bench | 42 | Emerging | AI safety benchmark for long-term caregiving relationships. Tests crisis...
151 | NyanKiyoshi/pytest-django-queries | 42 | Emerging | Generate performance reports from your Django database performance tests.
152 | pgx-contrib/pgxotel | 41 | Emerging | OpenTelemetry tracing instrumentation for pgx v5: spans for queries,...
153 | skerkour/go-benchmarks | 41 | Emerging | Comprehensive and reproducible benchmarks for Go developers and architects.
154 | rsasaki0109/CloudAnalyzer | 41 | Emerging | CLI-first QA toolkit for point clouds, trajectories, and 3D perception...
155 | MrAlias/flow | 41 | Emerging | An OpenTelemetry SpanProcessor reporting tracing flow metrics
156 | udhos/opentelemetry-trace-sqs | 41 | Emerging | opentelemetry-trace-sqs propagates OpenTelemetry tracing with SQS messages...
157 | jamesgober/metrics-lib | 41 | Emerging | The fastest metrics library for Rust. Lock-free 0.6ns gauges, 18ns counters,...
158 | smyrgeorge/log4k | 40 | Emerging | A Comprehensive Logging and Tracing Solution for Kotlin Multiplatform.
159 | KempnerInstitute/nvidia-hpc-benchmarks | 40 | Emerging | NVIDIA HPC Benchmarks
160 | meshkovQA/Eval-ai-library | 39 | Emerging | Comprehensive AI Model Evaluation Framework with advanced techniques...
161 | getaxonflow/axonflow | 39 | Emerging | AxonFlow: Runtime control layer for production AI
162 | IBM/OpenDsStar | 38 | Emerging | OpenDsStar is an open-source implementation of the DS-Star agent that...
163 | kobsio/kobs | 37 | Emerging | Kubernetes Observability Platform
164 | hdmsantander/microservices-ops-demo | 37 | Emerging | Spring Boot demo for observability, traceability and error analysis in a...
165 | mbzuai-oryx/Agent-X | 37 | Emerging | ICLR 2026: Agent-X Evaluating Deep Multimodal Reasoning in Vision-Centric...
166 | evaluation-context-protocol/ecp | 37 | Emerging | ECP is a standardized interface for orchestrating, auditing, and enforcing...
167 | verifywise-ai/plugin-marketplace | 36 | Emerging | VerifyWise AI Governance Plugin Marketplace
168 | braintrustdata/braintrust-pi-extension | 36 | Emerging | Braintrust tracing plugin for pi
169 | nixel2007/opentelemetry | 36 | Emerging | OpenTelemetry SDK for OneScript
170 | everythings-gonna-be-alright/phpScope | 36 | Emerging | PHP profiler that sends CPU sampling data to Pyroscope server.
171 | opsrobot-ai/opsrobot | 35 | Emerging | Observability platform for OpenClaw agents, providing real-time tracing,...
172 | kolloch/reqray | 35 | Emerging | Log call tree summaries after each request for Rust programs instrumented...
173 | tracewayapp/opentelemetry-symfony-bundle | 35 | Emerging | Pure-PHP OpenTelemetry instrumentation for Symfony - automatic HTTP,...
174 | PacificBiosciences/aardvark | 35 | Emerging | A tool for sniffing out the differences in vari-Ants
175 | yonatan-h/express-k6-profiler | 34 | Emerging | Finds bottlenecks in an Express app during load testing
176 | cuihairu/croupier | 34 | Emerging | Croupier is a universal GM (Game Master) backend system designed for game...
177 | aykhans/sarin | 33 | Emerging | A high-performance HTTP load testing tool. Features dynamic request...
178 | dolmen-go/flagx | 32 | Emerging | Extensions for the Go 'flag' package: flagx, flagfile, flagnet, flagtrace
179 | MrAlias/collex | 32 | Emerging | Use OpenTelemetry Collector Factories to Export with OpenTelemetry Go
180 | rodneylab/axum-graphql | 31 | Emerging | Rust GraphQL demo/test API written in Rust, using Axum for routing,...
181 | AmalChandru/termtrace | 31 | Emerging | A terminal workflow recorder that turns debugging sessions into replayable,...
182 | last9/opentelemetry-examples | 31 | Emerging | Production-ready OpenTelemetry instrumentation examples for Go, Python,...
183 | PAIR-Systems-Inc/little-dorrit-editor | 31 | Emerging | Multimodal benchmark for evaluating handwritten editorial correction in printed text.
184 | filipsPL/optuml | 31 | Emerging | Optuna-optimized ML methods, with a scikit-learn-like API
185 | BudEcosystem/bud-runtime | 31 | Emerging | Bud AI Foundry - A comprehensive inference stack for compound AI deployment,...
186 | russfellows/sai3-bench | 30 | Emerging | A multi-protocol storage performance testing tool, inspired by vdbench, fio...
187 | hboublal/dopGuard | 30 | Emerging | Modular observability platform for .NET applications, integrating with tools...
188 | imadAttar/spring-boot-unified-observability-starter | 30 | Emerging | All-in-one Spring Boot Starter for Observability: Metrics, Traces, Logs, and...
189 | nshkrdotcom/AITrace | 30 | Emerging | The unified observability layer for the AI Control Plane
190 | qcmet/qcmet | 30 | Emerging | Quantum Computing Metrics and Benchmarks
191 | tolitius/cupel | 29 | Experimental | Discover LLMs punching above their weight
192 | wangyz1999/sync-video-label | 29 | Experimental | A web-based annotation tool for synchronized multi-video timeline labeling...
193 | iRevive/fs2-grpc-otel4s | 28 | Experimental | otel4s instrumentation for fs2-grpc
194 | mnemom/mnemom-platform | 28 | Experimental | Safe House for AI agents: transparent gateway with inbound + outbound...
195 | rvnhq/raven | 28 | Experimental | A lightweight, self-hostable cloud infrastructure monitoring and telemetry platform.
196 | DaSH-Lab-CSIS/blossom | 27 | Experimental | BLOSSOM: Block-wise Federated Learning Over Shared and Sparse Observed...
197 | kyahikaru/llm-guardrail-red-teaming | 27 | Experimental | Protocol-constrained red teaming of frontier LLM guardrails in high-risk...
198 | last9/rails-otel-context | 27 | Experimental | Tells you which code fired that query. Zero config.
199 | thanhdaon/clean-arch-go | 27 | Experimental | Clean Architecture, DDD, CQRS with testing in Go
200 | LLMSystems/BehaviorRL-Hallucination | 26 | Experimental | Learning When to Answer: Behavior-Oriented Reinforcement Learning for...
201 | maxi4youuu/RePRo | 26 | Experimental | 🧠 Enhance raw prompts into optimized, powerful versions for AI tools like...
202 | Anarv2104/Inflion | 26 | Experimental | Observability and influence tracing infrastructure for multi-agent AI systems.
203 | HiThink-Research/FinMTM | 25 | Experimental | [ACL 2026] FinMTM: A Multi-Turn Multimodal Benchmark for Financial Reasoning...
204 | fourdollars/cella | 25 | Experimental | A terminal UI and CLI for managing and monitoring LXD + Docker containers,...
205 | FelixBroesamle/s2mflow | 24 | Experimental | Meta-generator: generating multicommodity flow instances from...
206 | iazaran/trace-replay | 24 | Experimental | High-fidelity process tracking, deterministic replay, and AI-powered...
207 | Basaltlabs-app/Gauntlet | 24 | Experimental | Community-driven behavioral reliability benchmark for LLMs. 88 probes across...
208 | SagarMaheshwary/reqlog | 24 | Experimental | Fast CLI to search and trace logs across services or single files using...
209 | TomasVenkrbec/lazyline | 24 | Experimental | Zero-config line-level Python profiler. No decorators, no code changes....
210 | 0xMilord/better-logger | 24 | Experimental | Execution flow debugger for modern apps. Turn scattered `console.log` calls...
211 | vikpant/strategic-coopetition | 23 | Experimental | Coopetition-Gym: A research-grade mixed-motive multi-agent reinforcement...
212 | bajajku/VAC | 22 | Experimental | Develop and evaluate a trauma-informed LLM-based chatbot that is...
213 | parsamivehchi/tps.sh | 18 | Experimental | tps.sh: Tokens Per Second LLM Benchmark. 7 models, 147 tests, 21 prompts...
214 | Zxela/claude-monitor | 16 | Experimental | Real-time dashboard for monitoring Claude Code sessions: live token usage,...
215 | pilhuhn/otel-oql | 16 | Experimental | An experiment in creating an OpenTelemetry backend
216 | MarkIvor/officeiq | 15 | Experimental | Research question: can the "office intelligence" of an LLM be measured? An attempt...