SapienzaNLP/ita-bench
A collection of Italian benchmarks for LLM evaluation
ITA-Bench helps researchers and developers evaluate how well Large Language Models (LLMs) understand and generate Italian. Given an LLM, it produces performance scores across tasks such as question answering, commonsense reasoning, and named entity recognition in Italian. It is aimed at AI researchers, NLP engineers, and data scientists working with Italian language models.
Use this if you need a standardized way to measure the capabilities of Italian LLMs across diverse linguistic tasks and benchmarks.
Not ideal if your primary focus is on English LLM evaluation or if you need to create entirely new Italian evaluation datasets from scratch.
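To make the evaluation process concrete, below is a minimal sketch of log-likelihood multiple-choice scoring, the standard technique evaluation harnesses use to score an LLM on tasks like Italian question answering. This is a generic illustration, not ITA-Bench's actual API; the model id is a placeholder (any Hugging Face causal LM id works) and the example item is made up.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sapienzanlp/Minerva-350M-base-v1.0"  # placeholder: any causal LM id works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

question = "Domanda: Qual è la capitale d'Italia?\nRisposta:"  # "What is the capital of Italy?"
options = [" Roma", " Milano", " Napoli"]

def option_loglikelihood(prompt: str, option: str) -> float:
    # Assumes the tokenization of `prompt` is a prefix of the tokenization of
    # `prompt + option` (true for most BPE tokenizers when the option starts
    # with a space).
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # row i predicts token i+1
    targets = full_ids[0, prompt_len:]                     # the option's tokens
    rows = log_probs[prompt_len - 1:]                      # rows that predict those tokens
    return rows.gather(1, targets.unsqueeze(1)).sum().item()

scores = {opt.strip(): option_loglikelihood(question, opt) for opt in options}
print(max(scores, key=scores.get))  # the option the model deems most likely

Each option is scored by the total log-probability the model assigns to its tokens given the prompt; the highest-scoring option counts as the model's answer, and accuracy over a dataset yields the benchmark score.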
Stars: 37
Forks: 1
Language: Python
License: Apache-2.0
Last pushed: Dec 02, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/SapienzaNLP/ita-bench"
Open to everyone: 100 requests/day with no API key. A free key raises the limit to 1,000 requests/day.
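For programmatic use, here is a minimal Python sketch of calling the endpoint above. The JSON field names commented at the end (e.g. "stars", "license") are assumptions, so inspect the actual payload to confirm the schema.

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/transformers/SapienzaNLP/ita-bench"
response = requests.get(url, timeout=10)
response.raise_for_status()  # fail loudly on HTTP errors or rate limiting

data = response.json()
print(data)  # inspect the full payload first; the field names below are guesses
# print(data.get("stars"), data.get("license"))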
Higher-rated alternatives
stanfordnlp/axbench
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
aidatatools/ollama-benchmark
LLM Benchmark for Throughput via Ollama (Local LLMs)
LarHope/ollama-benchmark
Ollama-based benchmark with detailed input/output tokens-per-second metrics; written in Python, with a DeepSeek R1 example.
qcri/LLMeBench
Benchmarking Large Language Models
THUDM/LongBench
LongBench v2 and LongBench (ACL 25'&24')