Arize-ai/phoenix

AI Observability & Evaluation

Score: 81/100 · Verified

Phoenix helps AI practitioners understand and improve their Large Language Model (LLM) applications. You feed it your LLM's interactions and performance metrics, and it surfaces insights into how well your models are working and where they are going wrong. It's aimed at anyone building, evaluating, or maintaining LLM-powered applications: AI product managers, machine learning engineers, and data scientists.

8,847 stars. Used by 7 other packages. Actively maintained with 271 commits in the last 30 days. Available on PyPI.

Use this if you need to track, evaluate, and troubleshoot the performance of your LLM-powered applications across different versions and prompts.

Not ideal if you are looking for a general-purpose monitoring tool for non-AI applications or traditional machine learning models.

Tags: LLM development · AI evaluation · Prompt engineering · Model troubleshooting · Experiment tracking
Maintenance: 22/25
Adoption: 15/25
Maturity: 25/25
Community: 19/25
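Each dimension is scored out of 25, and the four subscores sum to the overall score: 22 + 15 + 25 + 19 = 81.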


Stars: 8,847
Forks: 753
Language: Jupyter Notebook
License:
Last pushed: Mar 13, 2026
Commits (30d): 271
Dependencies: 46
Reverse dependents: 7

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/prompt-engineering/Arize-ai/phoenix"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
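
For programmatic use, here is a minimal Python sketch of the same request using the requests library. It assumes the endpoint returns a JSON body; the exact response schema isn't documented here, so the result is simply printed as-is.

import requests

# Fetch the quality record for Arize-ai/phoenix.
# No API key is needed for up to 100 requests/day.
url = (
    "https://pt-edge.onrender.com/api/v1/quality/"
    "prompt-engineering/Arize-ai/phoenix"
)
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # fail loudly on HTTP errors
data = resp.json()       # assumes a JSON response body
print(data)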