Evaluation RAG Tools

27 RAG evaluation tools are tracked. The highest-rated is FastBuilderAI/memory, which scores 40/100 and has 20 stars.

Get the projects as JSON (the request below sets limit=20, so it may not return all 27)

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=rag&subcategory=evaluation&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
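The same request can be made from Python. The following is a minimal sketch using only the standard library; the function names `build_url` and `fetch_projects` are hypothetical helpers introduced for illustration, and the structure of the JSON response is not documented on this page, so the code returns the decoded payload without assuming any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="rag", subcategory="evaluation", limit=20):
    """Build the dataset URL with the same query parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Fetch the URL and decode the body as JSON.

    The response schema is an unknown here, so the decoded object is
    returned as-is rather than being unpacked into specific fields.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects(build_url())` performs the network request and counts against the daily request limit. Whether a larger `limit` value is accepted by the API is not stated on this page.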

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | FastBuilderAI/memory | FastMemory is a topological representation of text data using concepts as... | 40 | Emerging |
| 2 | syncreus/syncreus-eval | Evaluate your LLM apps with one function call. Hallucination detection, RAG... | 38 | Emerging |
| 3 | bevinkatti/rag-harness | ⚡ CLI to Evaluate and Compare RAG systems with RAGAS-style scoring | 38 | Emerging |
| 4 | verifywise-ai/verifywise-eval-action | GitHub Action & Python SDK to evaluate LLMs in CI/CD; gate PRs on... | 24 | Experimental |
| 5 | masaakisakamoto/memory-os | Deterministic continuity for AI systems. Detect and repair inconsistencies... | 23 | Experimental |
| 6 | CjTruHeart/abundance-codex | Evidence-anchored narrative dataset that shifts AI reasoning from... | 23 | Experimental |
| 7 | priyanshus/evaliphy | E2E RAG Testing Tool | 22 | Experimental |
| 8 | VectoringAI/ai-engineering | Practical tutorials to build AI Engineering skills | 22 | Experimental |
| 9 | yosuancrespo/specforge-ai | AI-augmented QA platform for spec-driven development and testing,... | 22 | Experimental |
| 10 | moshe19909090/llm-evaluation-pipeline | End-to-end LLM evaluation pipeline with human and automated judging for... | 22 | Experimental |
| 11 | dahlinomine/local-llm-rag-bench | Python tool for benchmarking local LLM performance on specific RAG datasets. | 22 | Experimental |
| 12 | kiyeonjeon21/graphrag-lab | Benchmark 9 GraphRAG frameworks (Microsoft, LightRAG, nano, fast, Neo4j,... | 22 | Experimental |
| 13 | hereandnowai/evaluation-of-opensource-llms-between-rag-and-finetuning-entreprise-grade | Enterprise-grade evaluation comparing RAG and Fine-Tuning for local... | 21 | Experimental |
| 14 | TJ-Neary/AI_Eval | Comprehensive LLM evaluation framework comparing local and cloud models with... | 21 | Experimental |
| 15 | thecoderr13/Corrective-RAG | CRAG: a pipeline that uses tunable thresholds to validate document... | 21 | Experimental |
| 16 | xiaohanzhang2005/Minor-Detection | Self-evolving minor-user identification agent for anthropomorphic AI... | 19 | Experimental |
| 17 | ShabnamAtf/ScenarioBench | Trace-grounded compliance benchmark for Text-to-SQL and RAG | 17 | Experimental |
| 18 | dipakkr/ai-engineering-guide | A practical guide to AI engineering: LLMs, RAG, agents, evals, and... | 17 | Experimental |
| 19 | farithadnan/KB-AnswerScorer | A tool for evaluating LLM responses against a knowledge base of expert solutions. | 15 | Experimental |
| 20 | DennisMRitchie/go-llm-evaluator | LLM-as-a-Judge evaluation framework in Go | 14 | Experimental |
| 21 | Martonidaz/multi-agent-rag-builder | Development of a multi-agent system to assist professionals outside... | 14 | Experimental |
| 22 | emmeongoingammuaroi/reviewform | AI-Powered Code Review Agent built with LangGraph, FastAPI, MCP, and RAG... | 14 | Experimental |
| 23 | tovrr/Apex_LLM | Private AI workspace platform: FastAPI LLM API, streaming, evals, usage... | 14 | Experimental |
| 24 | songsunny00/ragas-dify-eval | Quickly evaluate Dify applications with Ragas (suited to evaluating RAG applications) | 14 | Experimental |
| 25 | SathvikNayak123/LexPilot | Multi-agent RAG system for Indian Supreme Court judgments with citation... | 14 | Experimental |
| 26 | makarand-thorat/finsight-ai | Production RAG evaluation platform for financial documents, RAGAS... | 14 | Experimental |
| 27 | rickytang666/epa-consultant | 🤖 RAG-powered regulatory intelligence for EPA pesticide compliance. | 13 | Experimental |