truera/trulens

Evaluation and Tracking for LLM Experiments and AI Agents

Quality score: 71 / 100 (Verified)

This tool helps AI engineers and developers systematically evaluate and track their Large Language Model (LLM) application experiments. It takes your application's prompts, models, retrievers, and knowledge sources as input and returns detailed feedback and performance insights, so you can identify failure modes and improve your application's behavior and performance.

3,160 stars. Actively maintained with 9 commits in the last 30 days. Available on PyPI.

Use this if you are building or iterating on an LLM-powered application and need a structured way to test, compare, and improve different versions of your app.

Not ideal if you are looking for a simple API wrapper for LLMs or a general-purpose data logging tool without specific LLM evaluation needs.
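To make that test-and-compare loop concrete, here is a minimal sketch of an evaluation run in the style of the trulens_eval quickstart. Module paths and signatures have shifted across trulens releases, so treat the imports and calls below as assumptions to verify against your installed version; the stub app and the "my_app_v1" id are placeholders.

from trulens_eval import Tru, Feedback, TruBasicApp
from trulens_eval.feedback.provider import OpenAI  # import path may differ by version

tru = Tru()  # local session that records and stores each run

# Stand-in for your real LLM app: any text-in, text-out callable works here.
def my_app(prompt: str) -> str:
    return "stub answer to: " + prompt

# Feedback function: an LLM-graded relevance score between input and output.
provider = OpenAI()  # expects OPENAI_API_KEY in the environment
f_relevance = Feedback(provider.relevance).on_input_output()

# Wrap the app so every call made inside the context is traced and scored.
recorder = TruBasicApp(my_app, app_id="my_app_v1", feedbacks=[f_relevance])
with recorder as recording:
    recorder.app("What is retrieval-augmented generation?")

# Aggregate feedback scores per app version to compare iterations.
print(tru.get_leaderboard(app_ids=["my_app_v1"]))

Re-running the same loop under a different app_id and comparing leaderboard rows is the structured version-to-version comparison described above.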

Tags: LLM application development, AI agent evaluation, prompt engineering, retrieval-augmented generation, machine learning operations
Score breakdown (the four subscores sum to the overall 71 / 100):
Maintenance: 17 / 25
Adoption: 10 / 25
Maturity: 25 / 25
Community: 19 / 25


Stars: 3,160
Forks: 251
Language: Python
License: MIT
Last pushed: Mar 10, 2026
Commits (30d): 9
Dependencies: 5

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/agents/truera/trulens"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
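If you'd rather consume the endpoint from code, here is a small Python sketch using requests. It assumes the endpoint returns a JSON body mirroring the fields on this card; the response schema and the key-passing mechanism are not documented here, so it sticks to the keyless tier.

import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/agents/truera/trulens"

resp = requests.get(URL, timeout=10)  # keyless tier: 100 requests/day
resp.raise_for_status()               # surface 4xx/5xx instead of parsing an error page
record = resp.json()                  # assumed JSON body mirroring this card
print(record)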