Uncategorized AI Evaluation Tools
This page tracks 216 uncategorized tools, 27 of which score above 70 (Verified tier). The highest-rated is DataDog/dd-trace-js at 95/100, with 790 stars and 26,477,155 monthly downloads. All of the top 10 are actively maintained.
Get the projects as JSON (the example request below sets `limit=20`):
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ai-evals&subcategory=uncategorized&limit=20"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
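For scripted access, the same request can be sketched in Python using only the standard library. The endpoint and the three query parameters (`domain`, `subcategory`, `limit`) are copied from the curl command above; the function names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not described on this page, the sketch returns the parsed JSON as-is without assuming any field names. Whether `limit` accepts values other than 20 is also an assumption.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base endpoint copied from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="ai-evals", subcategory="uncategorized", limit=20):
    """Return the request URL with the same query parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """GET the URL and return the decoded JSON body.

    The response schema is not documented on this page, so the parsed
    value is returned unchanged rather than indexed by assumed keys.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_projects(build_url())` issues the same request as the curl command. Each call counts against the 100 requests/day keyless limit, so it is worth caching the result locally rather than re-fetching in a loop.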
| # | Tool | Description | Score | Tier |
|---|---|---|---|---|
| 1 | DataDog/dd-trace-js | Datadog APM client for Node.js | | Verified |
| 2 | lmnr-ai/lmnr | Laminar - open-source observability platform purpose-built for AI agents. YC S24. | | Verified |
| 3 | mnfst/manifest | Smart LLM Routing for OpenClaw. Cut Costs up to 70% 🦞🦚 | | Verified |
| 4 | open-telemetry/opentelemetry-rust | The Rust OpenTelemetry implementation | | Verified |
| 5 | tokio-rs/tracing | Application level tracing for Rust. | | Verified |
| 6 | DataDog/dd-trace-go | Datadog Go Library including APM tracing, profiling, and security monitoring. | | Verified |
| 7 | pinpoint-apm/pinpoint | APM (Application Performance Management) tool for large-scale distributed systems. | | Verified |
| 8 | DataDog/dd-trace-py | Datadog Python APM Client | | Verified |
| 9 | open-telemetry/opentelemetry-go | OpenTelemetry Go API and SDK | | Verified |
| 10 | jaegertracing/jaeger-ui | Web UI for Jaeger | | Verified |
| 11 | DataDog/datadog-agent | Main repository for Datadog Agent | | Verified |
| 12 | open-telemetry/opentelemetry-go-instrumentation | OpenTelemetry Auto Instrumentation using eBPF | | Verified |
| 13 | opentracing-contrib/nginx-opentracing | NGINX plugin for OpenTracing | | Verified |
| 14 | openzipkin/zipkin | Zipkin is a distributed tracing system | | Verified |
| 15 | NVIDIA/garak | the LLM vulnerability scanner | | Verified |
| 16 | winsiderss/systeminformer | A free, powerful, multi-purpose tool that helps you monitor system... | | Verified |
| 17 | namhyung/uftrace | Function graph tracer for C/C++/Rust/Python | | Verified |
| 18 | jaegertracing/jaeger | CNCF Jaeger, a Distributed Tracing Platform | | Verified |
| 19 | autogluon/fev | Forecast evaluation library | | Verified |
| 20 | confident-ai/deepeval | The LLM Evaluation Framework | | Verified |
| 21 | inikep/lzbench | lzbench is an in-memory benchmark of open-source compressors | | Verified |
| 22 | bpftrace/bpftrace | High-level tracing language for Linux | | Verified |
| 23 | gofr-dev/gofr | An opinionated GoLang framework for accelerated microservice development.... | | Verified |
| 24 | SigNoz/signoz | SigNoz is an open-source observability platform native to OpenTelemetry with... | | Verified |
| 25 | GreptimeTeam/greptimedb | The open-source Observability 2.0 database. One engine for metrics, logs,... | | Verified |
| 26 | libbpf/libbpf | Automated upstream mirror for libbpf stand-alone build. | | Verified |
| 27 | iipeace/guider | The All-in-One System Profiling and Fault Detection Tool for Linux & Android | | Verified |
| 28 | pydantic/logfire | AI observability platform for production LLM and agent systems. | | Established |
| 29 | CodSpeedHQ/pytest-codspeed | A pytest plugin to create benchmarks | | Established |
| 30 | dotnet/BenchmarkDotNet | Powerful .NET library for benchmarking | | Established |
| 31 | CodSpeedHQ/codspeed-rust | Crates to benchmark your Rust code | | Established |
| 32 | alibaba/loongsuite-go-agent | OpenTelemetry Compile-Time Instrumentation for Golang | | Established |
| 33 | coroot/coroot | Coroot is an open-source observability and APM tool with AI-powered Root... | | Established |
| 34 | flightlessmango/MangoHud | A Vulkan and OpenGL overlay for monitoring FPS, temperatures, CPU/GPU load and more. | | Established |
| 35 | metrico/gigapipe | ⭐️ The Open-Source Polyglot Observability Warehouse: Light, Fast, Cloud... | | Established |
| 36 | TPC-Council/HammerDB | HammerDB: The industry standard open-source database benchmark | | Established |
| 37 | DataDog/dd-trace-java | Datadog APM client for Java | | Established |
| 38 | DataDog/dd-trace-php | Datadog PHP Clients | | Established |
| 39 | DataDog/dd-trace-rb | Datadog's client library for Ruby | | Established |
| 40 | jaegertracing/helm-charts | Helm Charts for Jaeger backend | | Established |
| 41 | DataDog/dd-sdk-ios | Datadog SDK for iOS - Swift and Objective-C. | | Established |
| 42 | open-telemetry/opentelemetry-ruby-contrib | Contrib Packages for the OpenTelemetry Ruby API and SDK implementation. | | Established |
| 43 | gogf/gf | A powerful framework for faster, easier, and more efficient project development. | | Established |
| 44 | RafaelGSS/bench-node | A powerful Node.js benchmark library | | Established |
| 45 | DataDog/dd-trace-dotnet | .NET Client Library for Datadog APM | | Established |
| 46 | open-telemetry/opentelemetry-php | The OpenTelemetry PHP Library | | Established |
| 47 | reframe-hpc/reframe | A powerful Python framework for writing and running portable regression... | | Established |
| 48 | verifywise-ai/verifywise | Complete AI governance and LLM Evals platform with support for EU AI Act,... | | Established |
| 49 | rabbitmq/rabbitmq-perf-test | A load testing tool | | Established |
| 50 | oushujun/EDTA | Extensive de-novo TE Annotator | | Established |
| 51 | nowsecure/fsmon | Filesystem monitor tool for Linux/Android iOS/macOS | | Established |
| 52 | typelevel/natchez | functional tracing for cats | | Established |
| 53 | cloudflare/ebpf_exporter | Prometheus exporter for custom eBPF metrics | | Established |
| 54 | zio/zio-logging | Powerful logging for ZIO 2.0 applications, with compatibility with many... | | Established |
| 55 | lttng/lttng-tools | The lttng-tools project provides a session daemon (lttng-sessiond) that acts... | | Established |
| 56 | efficios/babeltrace | Babeltrace /ˈbæbəltreɪs/ is an open-source trace manipulation toolkit. | | Established |
| 57 | huggingface/aisheets | Build, enrich, and transform datasets using AI models with no code | | Established |
| 58 | typelevel/otel4s | An OpenTelemetry library for Scala based on Cats-Effect | | Established |
| 59 | fastify/fastify-zipkin | Fastify plugin for Zipkin distributed tracing system. | | Established |
| 60 | dash0hq/otelbin | Web-based tool to facilitate OpenTelemetry collector configuration editing... | | Established |
| 61 | iand675/hs-opentelemetry | OpenTelemetry support for the Haskell programming language | | Established |
| 62 | swift-otel/swift-otel | An OpenTelemetry Protocol (OTLP) backend for Swift Log, Swift Metrics, and... | | Established |
| 63 | godotengine/godot-benchmarks | Collection of benchmarks to test performance of different areas of Godot | | Established |
| 64 | cilium/pwru | Packet, where are you? -- eBPF-based Linux kernel networking debugger | | Established |
| 65 | instana/go-sensor | 🚀 Go Distributed Tracing & Metrics Sensor for Instana | | Established |
| 66 | signalfx/tracing-examples | Examples of using third-party tracers with SignalFx | | Established |
| 67 | signalfx/splunk-otel-java | Splunk Distribution of OpenTelemetry Java | | Established |
| 68 | instana/nodejs | Node.js in-process collectors for Instana | | Established |
| 69 | team-decent/decent-bench | A benchmarking framework for decentralized optimization | | Established |
| 70 | kieker-monitoring/kieker | Kieker is an observability framework that consists of a monitoring and... | | Established |
| 71 | jonahsnider/benchmark | A Node.js benchmarking library with support for multithreading and TurboFan... | | Established |
| 72 | dynatrace-oss/unguard | Unguard is an insecure cloud-native microservices demo application. | | Established |
| 73 | instana/python-sensor | 🐍 Python Distributed Tracing & Metrics Sensor for Instana | | Established |
| 74 | munich-quantum-toolkit/bench | MQT Bench - An MQT Tool for Benchmarking Quantum Software Tools | | Established |
| 75 | ertgl/tapable-tracer | Trace the connections and flows between tapable hooks. | | Established |
| 76 | uio-bmi/immuneML | immuneML is a platform for machine learning analysis of adaptive immune... | | Established |
| 77 | ant-research/EasyTemporalPointProcess | EasyTPP: Towards Open Benchmarking Temporal Point Processes | | Established |
| 78 | nhsengland/evalsense | Tools for systematic large language model evaluations | | Established |
| 79 | instana/ruby-sensor | 💎 Ruby Distributed Tracing & Metrics Sensor for Instana | | Established |
| 80 | atesgoral/hrm-solutions | Human Resource Machine solutions and size/speed hacks | | Established |
| 81 | bamlab/flashlight | 📱⚡️ Lighthouse for Mobile - audits your app and gives a performance score to... | | Established |
| 82 | ldbc/ldbc_snb_docs | Specification of the LDBC Social Network Benchmark suite | | Established |
| 83 | aliesbelik/load-testing-toolkit | Collection of open-source tools for debugging, benchmarking, load and stress... | | Established |
| 84 | unitaryfoundation/metriq-gym | metriq-gym is a framework for implementing and running standard quantum... | | Established |
| 85 | ryncsn/memstrack | A memory allocation tracer combined with stack trace. | | Established |
| 86 | GDATASoftwareAG/motornet | Motor.NET is a microservice framework based on Microsoft.Extensions.Hosting | | Established |
| 87 | argonne-lcf/THAPI | A tracing infrastructure for heterogeneous computing applications. | | Established |
| 88 | DataDog/nginx-datadog | Enhance NGINX Observability and Security with Datadog's Module | | Established |
| 89 | bencheeorg/benchee | Easy and extensible benchmarking in Elixir providing you with lots of statistics! | | Established |
| 90 | chirpz-ai/pandaprobe | 🐼 Open source agent engineering platform: traces, evals, and metrics to... | | Established |
| 91 | jnidzwetzki/pg-lock-tracer | An eBPF based lock tracer for PostgreSQL | | Established |
| 92 | cau-se/theodolite | Theodolite is a framework for benchmarking the horizontal and vertical... | | Established |
| 93 | bencherdev/bencher | 🐰 Bencher - Continuous Benchmarking | | Established |
| 94 | hendriknielaender/zBench | 📊 zig benchmark | | Established |
| 95 | DataDog/dd-trace-cpp | Datadog APM client for C++ | | Established |
| 96 | cmackenzie1/tracing-ndjson | A customizable NDJSON format for tracing in Rust | | Established |
| 97 | prestodb/pbench | Presto/Prestissimo Benchmark Toolset | | Established |
| 98 | elastic/elastic-otel-dotnet | Elastic OpenTelemetry .NET Distribution | | Established |
| 99 | signalfx/splunk-otel-dotnet | Splunk Distribution of OpenTelemetry .NET | | Established |
| 100 | FrankChen021/bithon | A full stack observability platform | | Established |
| 101 | beling/bsuccinct-rs | Rust libraries and programs focused on succinct data structures | | Established |
| 102 | DataDog/orchestrion | Automatic compile-time instrumentation of Go code | | Established |
| 103 | FriendsOfOpenTelemetry/opentelemetry-bundle | Traces, metrics, and logs instrumentation within your Symfony application | | Established |
| 104 | qwerty541/dns-bench | Find the fastest DNS in your location to improve internet browsing experience. | | Established |
| 105 | ldcsaa/hp-soa | A fully functional, easy-to-use, and highly scalable microservice framework | | Established |
| 106 | tlog-dev/tlog | Observability events system. | | Established |
| 107 | ecoAPM/BenchmarkMockNet | Using BenchmarkDotNet to compare .NET mocking library performance | | Established |
| 108 | smarr/ReBenchDB | ReBenchDB records benchmark results and provides customizable reporting to... | | Established |
| 109 | vincentfree/opentelemetry | Open Telemetry extensions | | Established |
| 110 | Point72/raydar | A perspective powered, user editable ray dashboard via ray serve | | Established |
| 111 | quochuydev/dokploy-grafana-compose | Docker Compose stack for Grafana observability: Tempo traces, Loki logs,... | | Established |
| 112 | ROCm/madengine | madengine is a streamlined CLI tool for running and benchmarking AI models... | | Established |
| 113 | nfrankel/opentelemetry-tracing | Demo for end-to-end tracing via OpenTelemetry | | Established |
| 114 | CodSpeedHQ/action | GitHub Actions for running CodSpeed in your CI | | Established |
| 115 | kieker-monitoring/moobench | Micro-benchmarks for quantification of the performance overhead caused by... | | Established |
| 116 | ipyflow/ipyflow | A reactive Python kernel for Jupyter notebooks. | | Established |
| 117 | KaykCaputo/oracletrace | Lightweight Python tool to detect performance regressions and compare... | | Emerging |
| 118 | RRZE-HPC/MachineState | This CLI tool and Python3 module collects the current system state for documentation | | Emerging |
| 119 | dinesh-git17/claudehome | An architectural persistence experiment for large language models. Claude’s... | | Emerging |
| 120 | ivanfioravanti/llm_context_benchmarks | 📊 LLM Context Benchmarks - A comprehensive benchmarking tool for testing... | | Emerging |
| 121 | facebookresearch/CUTracer | A dynamic binary instrumentation tool for tracing and analyzing CUDA kernel... | | Emerging |
| 122 | nyrkio/nyrkio | Nyrkiö is an open source platform for detecting performance changes in a... | | Emerging |
| 123 | oteldb/oteldb | OpenTelemetry signal storage | | Emerging |
| 124 | tw4452852/zbpf | Writing eBPF in Zig | | Emerging |
| 125 | JDiskMark/jdm-java | Cross-platform Java Disk Benchmark Utility for measuring drive IO performance. | | Emerging |
| 126 | lucsorel/pydoctrace | Generate architecture diagrams by tracing Python code execution | | Emerging |
| 127 | komoju/komoju-datadog | Rust Datadog instrumentation | | Emerging |
| 128 | mesaglio/otel-front | Lightweight OpenTelemetry viewer for local development. Single binary, no... | | Emerging |
| 129 | Helmholtz-AI-Energy/perun | Perun is a Python package that measures the energy consumption of your applications. | | Emerging |
| 130 | containerscrew/nflux | Simple network monitoring agent tool. Powered by eBPF & Rust 🐝 | | Emerging |
| 131 | blooop/bencher | A package for benchmarking the characteristics of arbitrary functions | | Emerging |
| 132 | GabrielTecuceanu/httpress | a fast HTTP benchmarking tool built in Rust | | Emerging |
| 133 | DataDog/httpd-datadog | Enhance Apache HTTPD Observability with Datadog's Module | | Emerging |
| 134 | proactive-agent/langgraphics | Visualize live LangGraph execution and see how your agent thinks as it runs. | | Emerging |
| 135 | CodSpeedHQ/instrument-hooks | Internal core for the codspeed instruments | | Emerging |
| 136 | kjldev/purview-telemetry-sourcegenerator | .NET Source Generator for interface-based telemetry. Supporting activities,... | | Emerging |
| 137 | agurinov/gopl | Golang platform library | | Emerging |
| 138 | grafana/otel-profiling-go | Open Telemetry integration for Grafana Pyroscope and tracing solutions such... | | Emerging |
| 139 | Spectral-Knight-Ops/local-llm-evaluator | Quickly test local LLMs with custom prompts to determine which model is best for you. | | Emerging |
| 140 | feelpp/benchmarking | Feel++ Benchmarking | | Emerging |
| 141 | gstinoco/mGFD | Meshless Generalized Finite Differences (mGFD) solver and reference... | | Emerging |
| 142 | shnarazk/SAT-bench | A benchmark suite for SAT solvers | | Emerging |
| 143 | uptrace/uptrace-ruby | OpenTelemetry Ruby distribution for Uptrace | | Emerging |
| 144 | coralogix/coralogix-management-sdk | API clients for configuring the Coralogix platform. | | Emerging |
| 145 | omniviser/omniray | Stop guessing! You and your AI can now see live what's happening inside your... | | Emerging |
| 146 | HPE/torch-hammer | Torch Hammer: Strike while the GPU is hot | | Emerging |
| 147 | typelevel/otel4s-sdk | Implementation of the otel4s SDK modules in Scala from scratch | | Emerging |
| 148 | falcondev-oss/workflow | Simple type-safe queue worker with durable execution based on BullMQ. | | Emerging |
| 149 | beorn/loggily | TypeScript logger with debug-style namespaces, structured JSON, and... | | Emerging |
| 150 | givecareapp/givecare-bench | AI safety benchmark for long-term caregiving relationships. Tests crisis... | | Emerging |
| 151 | NyanKiyoshi/pytest-django-queries | Generate performance reports from your django database performance tests. | | Emerging |
| 152 | pgx-contrib/pgxotel | OpenTelemetry tracing instrumentation for pgx v5 - spans for queries,... | | Emerging |
| 153 | skerkour/go-benchmarks | Comprehensive and reproducible benchmarks for Go developers and architects. | | Emerging |
| 154 | rsasaki0109/CloudAnalyzer | CLI-first QA toolkit for point clouds, trajectories, and 3D perception... | | Emerging |
| 155 | MrAlias/flow | An OpenTelemetry SpanProcessor reporting tracing flow metrics | | Emerging |
| 156 | udhos/opentelemetry-trace-sqs | opentelemetry-trace-sqs propagates Open Telemetry tracing with SQS messages... | | Emerging |
| 157 | jamesgober/metrics-lib | The fastest metrics library for Rust. Lock-free 0.6ns gauges, 18ns counters,... | | Emerging |
| 158 | smyrgeorge/log4k | A Comprehensive Logging and Tracing Solution for Kotlin Multiplatform. | | Emerging |
| 159 | KempnerInstitute/nvidia-hpc-benchmarks | NVIDIA HPC Benchmarks | | Emerging |
| 160 | meshkovQA/Eval-ai-library | Comprehensive AI Model Evaluation Framework with advanced techniques... | | Emerging |
| 161 | getaxonflow/axonflow | AxonFlow: Runtime control layer for production AI | | Emerging |
| 162 | IBM/OpenDsStar | OpenDsStar is an open-source implementation of the DS-Star agent that... | | Emerging |
| 163 | kobsio/kobs | Kubernetes Observability Platform | | Emerging |
| 164 | hdmsantander/microservices-ops-demo | Spring Boot demo for observability, traceability and error analysis in a... | | Emerging |
| 165 | mbzuai-oryx/Agent-X | ICLR 2026: Agent-X Evaluating Deep Multimodal Reasoning in Vision-Centric... | | Emerging |
| 166 | evaluation-context-protocol/ecp | ECP is a standardized interface for orchestrating, auditing, and enforcing... | | Emerging |
| 167 | verifywise-ai/plugin-marketplace | VerifyWise AI Governance Plugin Marketplace | | Emerging |
| 168 | braintrustdata/braintrust-pi-extension | Braintrust tracing plugin for pi | | Emerging |
| 169 | nixel2007/opentelemetry | OpenTelemetry SDK for OneScript | | Emerging |
| 170 | everythings-gonna-be-alright/phpScope | PHP profiler that sends CPU sampling data to Pyroscope server. | | Emerging |
| 171 | opsrobot-ai/opsrobot | Observability platform for OpenClaw agents, providing real-time tracing,... | | Emerging |
| 172 | kolloch/reqray | Log call tree summaries after each request for rust programs instrumented... | | Emerging |
| 173 | tracewayapp/opentelemetry-symfony-bundle | Pure-PHP OpenTelemetry instrumentation for Symfony - automatic HTTP,... | | Emerging |
| 174 | PacificBiosciences/aardvark | A tool for sniffing out the differences in vari-Ants | | Emerging |
| 175 | yonatan-h/express-k6-profiler | Finds bottlenecks in an Express app during load testing | | Emerging |
| 176 | cuihairu/croupier | Croupier is a universal GM (Game Master) backend system designed for game... | | Emerging |
| 177 | aykhans/sarin | A high-performance HTTP load testing tool. Features dynamic request... | | Emerging |
| 178 | dolmen-go/flagx | Extensions for the Go 'flag' package: flagx, flagfile, flagnet, flagtrace | | Emerging |
| 179 | MrAlias/collex | Use OpenTelemetry Collector Factories to Export with OpenTelemetry Go | | Emerging |
| 180 | rodneylab/axum-graphql | Rust GraphQL demo/test API written in Rust, using Axum for routing,... | | Emerging |
| 181 | AmalChandru/termtrace | A terminal workflow recorder that turns debugging sessions into replayable,... | | Emerging |
| 182 | last9/opentelemetry-examples | Production-ready OpenTelemetry instrumentation examples for Go, Python,... | | Emerging |
| 183 | PAIR-Systems-Inc/little-dorrit-editor | Multimodal benchmark for evaluating handwritten editorial correction in printed text. | | Emerging |
| 184 | filipsPL/optuml | Optuna-optimized ML methods, with scikit-learn like API | | Emerging |
| 185 | BudEcosystem/bud-runtime | Bud AI Foundry - A comprehensive inference stack for compound AI deployment,... | | Emerging |
| 186 | russfellows/sai3-bench | A multi-protocol storage performance testing tool, inspired by vdbench, fio... | | Emerging |
| 187 | hboublal/dopGuard | Modular observability platform for .NET applications, integrating with tools... | | Emerging |
| 188 | imadAttar/spring-boot-unified-observability-starter | All-in-one Spring Boot Starter for Observability: Metrics, Traces, Logs, and... | | Emerging |
| 189 | nshkrdotcom/AITrace | The unified observability layer for the AI Control Plane | | Emerging |
| 190 | qcmet/qcmet | Quantum Computing Metrics and Benchmarks | | Emerging |
| 191 | tolitius/cupel | discover LLMs punching above their weight | | Experimental |
| 192 | wangyz1999/sync-video-label | A web-based annotation tool for synchronized multi-video timeline labeling... | | Experimental |
| 193 | iRevive/fs2-grpc-otel4s | otel4s instrumentation for fs2-grpc | | Experimental |
| 194 | mnemom/mnemom-platform | Safe House for AI agents - transparent gateway with inbound + outbound... | | Experimental |
| 195 | rvnhq/raven | A lightweight, self-hostable cloud infrastructure monitoring and telemetry platform. | | Experimental |
| 196 | DaSH-Lab-CSIS/blossom | BLOSSOM: Block-wise Federated Learning Over Shared and Sparse Observed... | | Experimental |
| 197 | kyahikaru/llm-guardrail-red-teaming | Protocol constrained red teaming of frontier LLM guardrails in high risk... | | Experimental |
| 198 | last9/rails-otel-context | Tells you which code fired that query. Zero config. | | Experimental |
| 199 | thanhdaon/clean-arch-go | Clean Architecture, DDD, CQRS with testings in Go | | Experimental |
| 200 | LLMSystems/BehaviorRL-Hallucination | Learning When to Answer: Behavior-Oriented Reinforcement Learning for... | | Experimental |
| 201 | maxi4youuu/RePRo | 🧠 Enhance raw prompts into optimized, powerful versions for AI tools like... | | Experimental |
| 202 | Anarv2104/Inflion | Observability and influence tracing infrastructure for multi-agent AI systems. | | Experimental |
| 203 | HiThink-Research/FinMTM | [ACL 2026] FinMTM: A Multi-Turn Multimodal Benchmark for Financial Reasoning... | | Experimental |
| 204 | fourdollars/cella | A terminal UI and CLI for managing and monitoring LXD + Docker containers -... | | Experimental |
| 205 | FelixBroesamle/s2mflow | Meta-generator: generating multicommodity flow instances from... | | Experimental |
| 206 | iazaran/trace-replay | High-fidelity process tracking, deterministic replay, and AI-powered... | | Experimental |
| 207 | Basaltlabs-app/Gauntlet | Community-driven behavioral reliability benchmark for LLMs. 88 probes across... | | Experimental |
| 208 | SagarMaheshwary/reqlog | Fast CLI to search and trace logs across services or single files using... | | Experimental |
| 209 | TomasVenkrbec/lazyline | Zero-config line-level Python profiler. No decorators, no code changes.... | | Experimental |
| 210 | 0xMilord/better-logger | Execution flow debugger for modern apps. Turn scattered `console.log` calls... | | Experimental |
| 211 | vikpant/strategic-coopetition | Coopetition-Gym: A research-grade mixed-motive multi-agent reinforcement... | | Experimental |
| 212 | bajajku/VAC | Develop and evaluate a trauma-informed LLM-based chatbot that is... | | Experimental |
| 213 | parsamivehchi/tps.sh | tps.sh - Tokens Per Second LLM Benchmark. 7 models, 147 tests, 21 prompts... | | Experimental |
| 214 | Zxela/claude-monitor | Real-time dashboard for monitoring Claude Code sessions - live token usage,... | | Experimental |
| 215 | pilhuhn/otel-oql | An experiment in creating an OpenTelemetry backend | | Experimental |
| 216 | MarkIvor/officeiq | Research question: can an LLM's "office intelligence" be measured? An attempt... | | Experimental |