truera/trulens

Evaluation and Tracking for LLM Experiments and AI Agents

Quality score: 71 / 100 (Verified)

This tool helps AI engineers and developers systematically evaluate and track their Large Language Model (LLM) application experiments. It takes your application's prompts, models, retrievers, and knowledge sources as input and returns detailed feedback and performance insights, so you can identify failure modes and improve your application's behavior and performance.

3,160 stars. Actively maintained with 9 commits in the last 30 days. Available on PyPI.

Use this if you are building or iterating on an LLM-powered application and need a structured way to test, compare, and improve different versions of your app.

Not ideal if you are looking for a simple API wrapper for LLMs or a general-purpose data logging tool without specific LLM evaluation needs.
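To make that test-and-compare loop concrete, here is a minimal sketch of an evaluation run in the style of the trulens_eval quickstart. Module paths and signatures have shifted across trulens releases, so treat the imports and calls below as assumptions to verify against your installed version; the stub app and the "my_app_v1" id are placeholders.

from trulens_eval import Tru, Feedback, TruBasicApp
from trulens_eval.feedback.provider import OpenAI  # import path may differ by version

tru = Tru()  # local session that records and stores each run

# Stand-in for your real LLM app: any text-in, text-out callable works here.
def my_app(prompt: str) -> str:
    return "stub answer to: " + prompt

# Feedback function: an LLM-graded relevance score between input and output.
provider = OpenAI()  # expects OPENAI_API_KEY in the environment
f_relevance = Feedback(provider.relevance).on_input_output()

# Wrap the app so every call made inside the context is traced and scored.
recorder = TruBasicApp(my_app, app_id="my_app_v1", feedbacks=[f_relevance])
with recorder as recording:
    recorder.app("What is retrieval-augmented generation?")

# Aggregate feedback scores per app version to compare iterations.
print(tru.get_leaderboard(app_ids=["my_app_v1"]))

Re-running the same loop under a different app_id and comparing leaderboard rows is the structured version-to-version comparison described above.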

Tags: LLM application development, AI agent evaluation, prompt engineering, retrieval-augmented generation, machine learning operations
Score breakdown (the four subscores sum to the overall 71 / 100):
Maintenance: 17 / 25
Adoption: 10 / 25
Maturity: 25 / 25
Community: 19 / 25


Stars: 3,160
Forks: 251
Language: Python
License: MIT
Last pushed: Mar 10, 2026
Commits (30d): 9
Dependencies: 5

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/agents/truera/trulens"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
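If you'd rather consume the endpoint from code, here is a small Python sketch using requests. It assumes the endpoint returns a JSON body mirroring the fields on this card; the response schema and the key-passing mechanism are not documented here, so it sticks to the keyless tier.

import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/agents/truera/trulens"

resp = requests.get(URL, timeout=10)  # keyless tier: 100 requests/day
resp.raise_for_status()               # surface 4xx/5xx instead of parsing an error page
record = resp.json()                  # assumed JSON body mirroring this card
print(record)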