LLM Observability Platforms: Prompt Engineering Tools

Tools for monitoring, tracing, evaluating, and debugging LLM applications in production. Includes end-to-end observability, real-time metrics, automated evals, and prompt management dashboards. Does NOT include general application monitoring, synthetic data generation, or agent training frameworks.

There are 27 LLM observability platform tools tracked. Three score above 70 (the Verified tier). The highest-rated is langfuse/langfuse at 82/100 with 23,106 stars. Six of the top 10 are actively maintained.

Get all 27 projects as JSON:

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=prompt-engineering&subcategory=llm-observability-platforms&limit=27"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
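A minimal sketch of calling the endpoint from Python and filtering the results by score. The endpoint URL and query parameters come from the curl example above; the response field names (`projects`, `name`, `score`) are assumptions about the payload shape, not documented API fields, so adjust them to whatever the service actually returns.

```python
import json
import urllib.request

# Endpoint from the curl example above; no API key needed up to 100 requests/day.
API_URL = (
    "https://pt-edge.onrender.com/api/v1/datasets/quality"
    "?domain=prompt-engineering&subcategory=llm-observability-platforms&limit=27"
)


def fetch_dataset(url: str = API_URL) -> dict:
    """Fetch the dataset as JSON."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def verified_tools(payload: dict, threshold: int = 70) -> list[str]:
    """Return names of tools scoring above the given threshold.

    Assumes a payload shaped like {"projects": [{"name": ..., "score": ...}]}
    -- this shape is a guess; inspect the real response and adapt the keys.
    """
    return [
        p["name"]
        for p in payload.get("projects", [])
        if p.get("score", 0) > threshold
    ]


# Example with a stubbed payload (the real response may differ):
sample = {
    "projects": [
        {"name": "langfuse/langfuse", "score": 82},
        {"name": "Helicone/helicone", "score": 68},
    ]
}
print(verified_tools(sample))  # ['langfuse/langfuse']
```

Using the stdlib `urllib` keeps the sketch dependency-free; swapping in `requests` or `httpx` is equally reasonable if they are already in your project.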

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | langfuse/langfuse | 🪢 Open source LLM engineering platform: LLM Observability, metrics, evals,... | 82 | Verified |
| 2 | Arize-ai/phoenix | AI Observability & Evaluation | 81 | Verified |
| 3 | Mirascope/mirascope | The LLM Anti-Framework | 74 | Verified |
| 4 | Agenta-AI/agenta | The open-source LLMOps platform: prompt playground, prompt management, LLM... | 69 | Established |
| 5 | Helicone/helicone | 🧊 Open source LLM observability platform. One line of code to monitor,... | 68 | Established |
| 6 | algorithmicsuperintelligence/optillm | Optimizing inference proxy for LLMs | 62 | Established |
| 7 | TensorOpsAI/LLMstudio | Framework to bring LLM applications to production | 61 | Established |
| 8 | Scale3-Labs/langtrace | Langtrace 🔍 is an open-source, OpenTelemetry-based end-to-end... | 51 | Established |
| 9 | langfuse/langfuse-java | 🪢 Auto-generated Java client for the Langfuse API | 49 | Emerging |
| 10 | AnchoringAI/anchoring-ai | An open-source no-code tool for teams to collaborate on building,... | 46 | Emerging |
| 11 | whylabs/langkit | 🔍 LangKit: An open-source toolkit for monitoring Large Language Models... | 43 | Emerging |
| 12 | TrentPierce/PolyCouncil | PolyCouncil is an open-source multi-model deliberation engine for LM Studio.... | 40 | Emerging |
| 13 | tenemos/langwatch | The open LLM Ops platform - Traces, Analytics, Evaluations, Datasets and... | 39 | Emerging |
| 14 | brokle-ai/brokle | The AI engineering platform for AI teams. Observability, evaluation, and... | 37 | Emerging |
| 15 | ksm26/Evaluating-AI-Agents | A hands-on course repository for Evaluating AI Agents, created with Arize... | 23 | Experimental |
| 16 | chirindaopensource/multi_agent_system_architecture_for_federal_funds_target_rate_prediction | End-to-end Python implementation of "FedSight AI" multi-agent system for... | 22 | Experimental |
| 17 | MagicTeaMC/dnsLM | dnsLM: Where AI meets DNS, because even domains deserve a little intelligence! | 19 | Experimental |
| 18 | rahatmoktadir03/llm-evaluation-platform | A full-stack web application for comparing and analyzing the performance of... | 17 | Experimental |
| 19 | promplate/trace | Integrates with @langfuse or LangSmith - plug-and-play observability for @promplate | 15 | Experimental |
| 20 | Uplay111/Loki-s-Insight- | A lightweight visual dashboard to inspect and edit OpenClaw AI agent memory... | 14 | Experimental |
| 21 | Tarunjit45/ModelPulse | ModelPulse helps maintain model reliability and performance by providing... | 14 | Experimental |
| 22 | VicRejkia/LLM-Sherpa | A Python GUI tool to package a codebase into a single, context-rich Markdown... | 13 | Experimental |
| 23 | alhemdrew/self-hosted-llm-infrastructure | Deployment of a self-hosted LLM infrastructure using Ollama and Open WebUI... | 13 | Experimental |
| 24 | marco-ruiz/llm-repo | Framework that translates LLM responses to structured data models | 13 | Experimental |
| 25 | vshwsh/prod-evals-cookbook | 🎯 Build effective AI evaluations through a hands-on tutorial, using a... | 13 | Experimental |
| 26 | airfold/airlang | ⚡ From zero to monitoring LLMs in 5 minutes ⚡ | 12 | Experimental |
| 27 | tooniez/llm-toolkit | 🛠️ A collection of prompts, tools and functions to provide researchers with... | 11 | Experimental |