dustalov/evalica
Evalica, your favourite evaluation toolkit
Evalica is a toolkit for statisticians, researchers, and data analysts that quantifies how well different items or ideas compare against each other, or how consistently multiple people rate the same things. You provide structured comparison data (e.g., 'pizza' vs. 'burger' with a recorded winner) or a matrix of ratings, and it returns scores, rankings, or reliability metrics such as Krippendorff's alpha. It's designed for anyone who needs to evaluate preferences or rater agreement objectively from collected data.
Available on PyPI.
Use this if you need to rank items based on pairwise comparisons (like in competitive events or preference studies) or measure the agreement between multiple raters on a set of items.
Not ideal if your primary need is general-purpose statistical modeling beyond ranking, reliability, and uncertainty estimation.
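As a quick illustration of the pairwise-comparison use case, here is a minimal Python sketch based on Evalica's published README: the elo function, the Winner enum, and the result.scores attribute come from Evalica's documented API, while the food items are sample data.

from evalica import elo, Winner

# Each comparison is a (left item, right item, winner) triple;
# Winner.X means the left item won, Winner.Y the right one, Winner.Draw a tie.
xs = ['pizza', 'burger', 'pizza']
ys = ['burger', 'sushi', 'sushi']
winners = [Winner.X, Winner.Y, Winner.Draw]

result = elo(xs, ys, winners)
print(result.scores)  # per-item Elo scores; higher ranks better

Evalica exposes other ranking methods (e.g., Bradley-Terry) through the same call shape, so swapping the scoring model is a one-line change.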
Stars: 62
Forks: 5
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 10, 2026
Monthly downloads: 19
Commits (30d): 0
Dependencies: 3
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/dustalov/evalica"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
EvolvingLMMs-Lab/lmms-eval: One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas: Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit: Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
EuroEval/EuroEval: The robust European language model benchmark.
Giskard-AI/giskard-oss: 🐢 Open-Source Evaluation & Testing library for LLM Agents