dustalov/evalica
Evalica, your favourite evaluation toolkit
Evalica is a toolkit for statisticians, researchers, and data analysts that quantifies how well different items or ideas compare against each other, or how consistently multiple people rate the same things. You provide structured comparison data (e.g., 'pizza' vs. 'burger' with a recorded winner) or a matrix of ratings, and it returns scores, rankings, or reliability metrics such as Krippendorff's alpha. It's designed for anyone who needs to evaluate preferences or rater agreement objectively from collected data.
Available on PyPI.
Use this if you need to rank items based on pairwise comparisons (like in competitive events or preference studies) or measure the agreement between multiple raters on a set of items.
Not ideal if your primary need is general-purpose statistical modeling beyond ranking, reliability, and uncertainty estimation.
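As a quick illustration of the pairwise-comparison use case, here is a minimal Python sketch based on Evalica's published README: the elo function, the Winner enum, and the result.scores attribute come from Evalica's documented API, while the food items are sample data.

from evalica import elo, Winner

# Each comparison is a (left item, right item, winner) triple;
# Winner.X means the left item won, Winner.Y the right one, Winner.Draw a tie.
xs = ['pizza', 'burger', 'pizza']
ys = ['burger', 'sushi', 'sushi']
winners = [Winner.X, Winner.Y, Winner.Draw]

result = elo(xs, ys, winners)
print(result.scores)  # per-item Elo scores; higher ranks better

Evalica exposes other ranking methods (e.g., Bradley-Terry) through the same call shape, so swapping the scoring model is a one-line change.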
Stars: 62
Forks: 5
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 10, 2026
Monthly downloads: 19
Commits (30d): 0
Dependencies: 3
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/dustalov/evalica"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
EvolvingLMMs-Lab/lmms-eval: One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas: Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit: Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
EuroEval/EuroEval: The robust European language model benchmark.
Giskard-AI/giskard-oss: 🐢 Open-Source Evaluation & Testing library for LLM Agents