SapienzaNLP/ita-bench
A collection of Italian benchmarks for LLM evaluation
ITA-Bench helps researchers and developers evaluate how well Large Language Models (LLMs) understand and generate Italian. Given an LLM, it produces performance scores across tasks such as question answering, commonsense reasoning, and named entity recognition in Italian. It is aimed at AI researchers, NLP engineers, and data scientists working with Italian language models.
Use this if you need a standardized way to measure the capabilities of Italian LLMs across diverse linguistic tasks and benchmarks.
Not ideal if your primary focus is on English LLM evaluation or if you need to create entirely new Italian evaluation datasets from scratch.
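To make the evaluation process concrete, below is a minimal sketch of log-likelihood multiple-choice scoring, the standard technique evaluation harnesses use to score an LLM on tasks like Italian question answering. This is a generic illustration, not ITA-Bench's actual API; the model id is a placeholder (any Hugging Face causal LM id works) and the example item is made up.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sapienzanlp/Minerva-350M-base-v1.0"  # placeholder: any causal LM id works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

question = "Domanda: Qual è la capitale d'Italia?\nRisposta:"  # "What is the capital of Italy?"
options = [" Roma", " Milano", " Napoli"]

def option_loglikelihood(prompt: str, option: str) -> float:
    # Assumes the tokenization of `prompt` is a prefix of the tokenization of
    # `prompt + option` (true for most BPE tokenizers when the option starts
    # with a space).
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # row i predicts token i+1
    targets = full_ids[0, prompt_len:]                     # the option's tokens
    rows = log_probs[prompt_len - 1:]                      # rows that predict those tokens
    return rows.gather(1, targets.unsqueeze(1)).sum().item()

scores = {opt.strip(): option_loglikelihood(question, opt) for opt in options}
print(max(scores, key=scores.get))  # the option the model deems most likely

Each option is scored by the total log-probability the model assigns to its tokens given the prompt; the highest-scoring option counts as the model's answer, and accuracy over a dataset yields the benchmark score.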
Stars: 37
Forks: 1
Language: Python
License: Apache-2.0
Last pushed: Dec 02, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/SapienzaNLP/ita-bench"
Open to everyone: 100 requests/day with no API key. A free key raises the limit to 1,000 requests/day.
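For programmatic use, here is a minimal Python sketch of calling the endpoint above. The JSON field names commented at the end (e.g. "stars", "license") are assumptions, so inspect the actual payload to confirm the schema.

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/transformers/SapienzaNLP/ita-bench"
response = requests.get(url, timeout=10)
response.raise_for_status()  # fail loudly on HTTP errors or rate limiting

data = response.json()
print(data)  # inspect the full payload first; the field names below are guesses
# print(data.get("stars"), data.get("license"))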
Higher-rated alternatives
stanfordnlp/axbench
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
aidatatools/ollama-benchmark
LLM Benchmark for Throughput via Ollama (Local LLMs)
LarHope/ollama-benchmark
Ollama-based benchmark with detailed input/output tokens-per-second metrics; written in Python, with a DeepSeek R1 example.
qcri/LLMeBench
Benchmarking Large Language Models
THUDM/LongBench
LongBench v2 and LongBench (ACL 25'&24')