TonicAI/tvallogging

A tool for evaluating and tracking your RAG experiments. This repo contains the Python SDK for logging to Tonic Validate.


Emerging

This tool helps AI engineers and developers building Retrieval-Augmented Generation (RAG) applications track and improve their model performance. You supply your RAG application's answers to the questions in a benchmark dataset, along with the context it retrieved for each one. The tool then scores these outputs using RAG metrics and visualizes the results, making it easy to compare different versions of your application.
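The workflow described above (benchmark questions in, answers plus retrieved context recorded, each run scored, runs compared) can be illustrated with a small, self-contained Python sketch. It does not use the tvallogging or Tonic Validate API: the `RunRecord` class, the `context_overlap_score` word-overlap metric, and the `score_run`/`compare_runs` helpers are hypothetical stand-ins invented for illustration, not the metrics or functions the SDK actually provides.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunRecord:
    """One benchmark item (hypothetical structure, not the SDK's own class)."""
    question: str
    answer: str          # the RAG application's response
    context: list[str]   # the passages it retrieved for this question


def _tokens(text: str) -> set[str]:
    """Lowercase words with simple punctuation stripped."""
    words = (w.strip(".,!?;:").lower() for w in text.split())
    return {w for w in words if w}


def context_overlap_score(record: RunRecord) -> float:
    """Toy metric: fraction of answer words that also appear in the retrieved context.

    A crude stand-in for a real RAG metric such as answer groundedness;
    it is NOT how Tonic Validate scores responses.
    """
    answer_words = _tokens(record.answer)
    if not answer_words:
        return 0.0
    # set().union() with no arguments is just an empty set, so empty context is safe.
    context_words = set().union(*(_tokens(c) for c in record.context))
    return len(answer_words & context_words) / len(answer_words)


def score_run(records: list[RunRecord]) -> float:
    """Average the per-question scores for one version of the application."""
    return mean(context_overlap_score(r) for r in records)


def compare_runs(runs: dict[str, list[RunRecord]]) -> dict[str, float]:
    """Score several application versions on the same benchmark, keyed by run name."""
    return {name: score_run(records) for name, records in runs.items()}


if __name__ == "__main__":
    passage = ["Hamlet is a play by William Shakespeare."]
    runs = {
        "v1": [RunRecord("Who wrote Hamlet?", "Shakespeare wrote Hamlet.", passage)],
        "v2": [RunRecord("Who wrote Hamlet?", "Hamlet is by Shakespeare.", passage)],
    }
    for name, score in compare_runs(runs).items():
        print(f"{name}: {score:.2f}")
```

In this toy run, "v2" scores higher because every word of its answer is supported by the retrieved passage. A dedicated platform adds what this sketch omits: stronger metrics (often LLM-assisted), persistent storage of each run, and visual comparison across versions.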

No commits in the last 6 months.

Use this if you are developing RAG applications and need a systematic way to evaluate, track, and compare the performance of your models over time, ensuring they deliver accurate and relevant information.

Not ideal if you are looking for a general-purpose machine learning experiment tracker not specifically focused on RAG, or if you prefer to calculate RAG metrics entirely offline without a dedicated platform.

Tags: AI development, RAG engineering, LLM evaluation, Application testing, Model performance

Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 4 / 25

Maturity 16 / 25

Community 13 / 25

Language: Python

License: MIT

Featured in

You're Shipping AI You Can't Measure

Higher-rated alternatives

vectara/open-rag-eval

RAG evaluation without the need for "golden answers"

DocAILab/XRAG

XRAG: eXamining the Core - Benchmarking Foundational Component Modules in Advanced...

HZYAI/RagScore

⚡️ The "1-Minute RAG Audit" — Generate QA datasets & evaluate RAG systems in Colab, Jupyter, or...

AIAnytime/rag-evaluator

A library for evaluating Retrieval-Augmented Generation (RAG) systems (The traditional ways).

microsoft/benchmark-qed

Automated benchmarking of Retrieval-Augmented Generation (RAG) systems
