Praveengovianalytics/falcon-evaluate
Falcon Evaluate is an open-source Python library that aims to revolutionise the LLM and RAG evaluation process by offering a low-code solution. Our goal is to make the evaluation process as seamless and efficient as possible, allowing you to focus on what truly matters. This library aims to provide an easy-to-use toolkit for assessing the performance, bias, and general behavior of LLMs.
When evaluating multiple large language models (LLMs) or retrieval-augmented generation (RAG) systems, this tool helps you compare their responses to a set of prompts and reference answers. It takes a table containing your prompts, correct answers, and each model's generated text, then outputs a detailed performance breakdown, including readability, toxicity, and similarity scores. This is ideal for AI product managers, data scientists, or researchers who need to quantify and understand the quality of their LLMs.
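As a rough sketch of the input format described above, the evaluation table can be built with pandas. The column names and layout here are illustrative assumptions for demonstration, not the library's documented schema.

```python
import pandas as pd

# Illustrative input table: one row per prompt, with the reference answer
# and each candidate model's generated response in its own column.
# Column names are assumptions for demonstration, not a documented schema.
data = {
    "prompt": [
        "What is the capital of France?",
        "Summarize the water cycle in one sentence.",
    ],
    "reference": [
        "The capital of France is Paris.",
        "Water evaporates, condenses into clouds, and returns as precipitation.",
    ],
    "model_a_response": [
        "Paris is the capital of France.",
        "Water evaporates, forms clouds, and falls back to earth as rain.",
    ],
    "model_b_response": [
        "France's capital city is Paris.",
        "The sun heats water, it rises, cools into clouds, and rains back down.",
    ],
}
df = pd.DataFrame(data)

# The evaluator would score each model column against the reference,
# producing per-response readability, toxicity, and similarity metrics.
print(df)
```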
No commits in the last 6 months.
Use this if you need an easy way to compare the performance, bias, and general behavior of different LLMs or RAG systems using various metrics.
Not ideal if you are only evaluating a single model and don't require comparative analysis against multiple alternatives.
Stars: 14
Forks: 4
Language: Python
License: MIT
Category:
Last pushed: Jan 31, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Praveengovianalytics/falcon-evaluate"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
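For reference, the same request can be made from Python with the requests library. This is a minimal sketch that assumes the endpoint returns JSON; the response fields are not documented here.

```python
import requests

# Same request as the curl command above, issued from Python.
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/Praveengovianalytics/falcon-evaluate"
resp = requests.get(url, timeout=10)
resp.raise_for_status()

# The payload structure is not documented here, so just print it as-is.
print(resp.json())
```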
Higher-rated alternatives
eth-sri/matharena
Evaluation of LLMs on latest math competitions
tatsu-lab/alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality,...
HPAI-BSC/TuRTLe
TuRTLe: A Unified Evaluation of LLMs for RTL Generation 🐢 (MLCAD 2025)
nlp-uoregon/mlmm-evaluation
Multilingual Large Language Models Evaluation Benchmark
haesleinhuepf/human-eval-bia
Benchmarking Large Language Models for Bio-Image Analysis Code Generation