FareedKhan-dev/ai-agents-eval-techniques
Implementation of 12 AI agents evaluation techniques
This project offers practical methods for evaluating AI-powered conversational agents and retrieval-augmented generation (RAG) systems. It takes an AI system's responses or behaviors and applies a range of techniques to measure correctness, helpfulness, and efficiency, yielding clear scores and insights into how well the AI is performing. It is aimed at AI developers, machine learning engineers, and product managers who build or integrate AI agents and need to ensure their systems meet quality standards.
No commits in the last 6 months.
Use this if you are developing or deploying AI agents or RAG systems and need robust, practical ways to evaluate their output quality and decision-making processes.
Not ideal if you are looking for simple pass/fail tests for traditional software: this project focuses on the nuanced, often subjective evaluation of an AI system's generative and multi-step reasoning capabilities.
Stars: 37
Forks: 8
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Jul 31, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/FareedKhan-dev/ai-agents-eval-techniques"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
patterns-ai-core/langchainrb
Build LLM-powered applications in Ruby
3uyuan1ee/Fix_agent
A code-optimization agent based on LangChain 1.0 and DeepAgents
FareedKhan-dev/Multi-Agent-AI-System
Building a Multi-Agent AI System with LangGraph and LangSmith
tadata-org/langchain-runner
Zero-configuration way to expose LangChain/LangGraph agents as autonomous services with...
skygazer42/GustoBot
Five-Star Chef: a comprehensive multi-agent customer-service bot built on langraph, featuring txt2sql, txt2cypher, lightrag, multimodal support, and more