Evaluation RAG Tools

27 evaluation tools are tracked. The highest-rated is FastBuilderAI/memory, scoring 40/100 with 20 stars.

Get the projects as JSON (the example request sets limit=20, so it may return fewer than all 27)

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=rag&subcategory=evaluation&limit=20"
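
The same request can be made from a script. The following is a minimal Python sketch, assuming only what the curl command shows: the endpoint URL and the `domain`, `subcategory`, and `limit` query parameters. The helper names `build_url` and `fetch_quality` are hypothetical, and the shape of the JSON response is not documented here, so the code decodes it without assuming any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="rag", subcategory="evaluation", limit=20):
    """Build the request URL with the same query parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_quality(domain="rag", subcategory="evaluation", limit=20, timeout=30):
    """Fetch and decode the JSON payload (hypothetical helper).

    The response structure is not assumed; the decoded object is returned as-is.
    """
    with urlopen(build_url(domain, subcategory, limit), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality()` performs the network request and counts against the daily request allowance; `build_url()` alone only constructs the URL, which is useful for checking the parameters first.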

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.

#	Tool	Score	Tier	Stars	Language
1	FastBuilderAI/memory FastMemory is a topological representation of text data using concepts as...	40	Emerging	20	HTML
2	syncreus/syncreus-eval Evaluate your LLM apps with one function call. Hallucination detection, RAG...	38	Emerging	2	Python
3	bevinkatti/rag-harness ⚡ CLI to Evaluate and Compare RAG systems with RAGAS-style scoring	38	Emerging	1	Python
4	verifywise-ai/verifywise-eval-action GitHub Action & Python SDK to evaluate LLMs in CI/CD — gate PRs on...	24	Experimental	2	Python
5	masaakisakamoto/memory-os Deterministic continuity for AI systems. Detect and repair inconsistencies...	23	Experimental	1	TypeScript
6	CjTruHeart/abundance-codex Evidence-anchored narrative dataset that shifts AI reasoning from...	23	Experimental	1	Python
7	priyanshus/evaliphy E2E RAG Testing Tool	22	Experimental	—	TypeScript
8	VectoringAI/ai-engineering Practical tutorials to build AI Engineering skills	22	Experimental	—	Jupyter Notebook
9	yosuancrespo/specforge-ai AI-augmented QA platform for spec-driven development and testing,...	22	Experimental	—	Python
10	moshe19909090/llm-evaluation-pipeline End-to-end LLM evaluation pipeline with human and automated judging for...	22	Experimental	—	Jupyter Notebook
11	dahlinomine/local-llm-rag-bench Python tool for benchmarking local LLM performance on specific RAG datasets.	22	Experimental	—	—
12	kiyeonjeon21/graphrag-lab Benchmark 9 GraphRAG frameworks (Microsoft, LightRAG, nano, fast, Neo4j,...	22	Experimental	—	Python
13	hereandnowai/evaluation-of-opensource-llms-between-rag-and-finetuning-entreprise-grade Enterprise-grade evaluation comparing RAG and Fine-Tuning for local...	21	Experimental	—	Python
14	TJ-Neary/AI_Eval Comprehensive LLM evaluation framework comparing local and cloud models with...	21	Experimental	—	Python
15	thecoderr13/Corrective-RAG CRAG - A pipeline that uses tunable thresholds to validate document...	21	Experimental	—	Python
16	xiaohanzhang2005/Minor-Detection Self-evolving minor-user identification agent for anthropomorphic AI...	19	Experimental	13	Python
17	ShabnamAtf/ScenarioBench Trace-grounded compliance benchmark for Text-to-SQL and RAG	17	Experimental	—	Python
18	dipakkr/ai-engineering-guide A practical guide to AI engineering — LLMs, RAG, agents, evals, and...	17	Experimental	4	Python
19	farithadnan/KB-AnswerScorer A tool for evaluating LLM responses against a knowledge base of expert solutions.	15	Experimental	—	Python
20	DennisMRitchie/go-llm-evaluator LLM-as-a-Judge evaluation framework in Go	14	Experimental	—	Go
21	Martonidaz/multi-agent-rag-builder Development of a multi-agent system to assist professionals outside...	14	Experimental	—	Jupyter Notebook
22	emmeongoingammuaroi/reviewform AI-Powered Code Review Agent built with LangGraph, FastAPI, MCP, and RAG...	14	Experimental	—	Python
23	tovrr/Apex_LLM Private AI workspace platform: FastAPI LLM API, streaming, evals, usage...	14	Experimental	—	Python
24	songsunny00/ragas-dify-eval Quickly evaluate Dify applications with Ragas (well suited to evaluating RAG applications)	14	Experimental	1	Python
25	SathvikNayak123/LexPilot Multi-agent RAG system for Indian Supreme Court judgments with citation...	14	Experimental	—	Python
26	makarand-thorat/finsight-ai Production RAG evaluation platform for financial documents — RAGAS...	14	Experimental	—	Python
27	rickytang666/epa-consultant 🤖 RAG-powered regulatory intelligence for EPA pesticide compliance.	13	Experimental	—	Python