Arize-ai/phoenix
AI Observability & Evaluation
Phoenix helps AI practitioners understand and improve their Large Language Model (LLM) applications: you feed it your LLM's interactions and performance metrics, and it surfaces insights into how well your models are working and where they are going wrong. It is built for anyone building, evaluating, or maintaining LLM-powered applications, such as AI product managers, machine learning engineers, and data scientists.
8,847 stars. Used by 7 other packages. Actively maintained with 271 commits in the last 30 days. Available on PyPI.
Use this if you need to track, evaluate, and troubleshoot the performance of your LLM-powered applications across different versions and prompts.
Not ideal if you are looking for a general-purpose monitoring tool for non-AI applications or traditional machine learning models.
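Since Phoenix is a Python package on PyPI, spinning up a local instance takes only a couple of lines. The sketch below assumes the arize-phoenix package and its documented launch_app() entry point; exact names and behavior may vary between versions.

    import phoenix as px

    # Start the local Phoenix UI in a background thread and print its address.
    session = px.launch_app()
    print(f"Phoenix is running at {session.url}")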
Stars:              8,847
Forks:              753
Language:           Jupyter Notebook
License:            —
Category:           —
Last pushed:        Mar 13, 2026
Commits (30d):      271
Dependencies:       46
Reverse dependents: 7
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/prompt-engineering/Arize-ai/phoenix"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
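For scripted access, the same endpoint can be queried from Python. This is a minimal sketch using the requests library; the URL is taken from the curl example above, and the shape of the JSON response is not documented here, so the example simply pretty-prints whatever comes back.

    import json
    import requests

    URL = "https://pt-edge.onrender.com/api/v1/quality/prompt-engineering/Arize-ai/phoenix"

    resp = requests.get(URL, timeout=10)
    resp.raise_for_status()  # anonymous access is capped at 100 requests/day
    print(json.dumps(resp.json(), indent=2))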
Related tools
langfuse/langfuse
🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management,...
Mirascope/mirascope
The LLM Anti-Framework
Agenta-AI/agenta
The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM...
Helicone/helicone
🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓
algorithmicsuperintelligence/optillm
Optimizing inference proxy for LLMs