IBM/unitxt

🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data for end-to-end AI benchmarking.

Score: 69 / 100 (Established)

This tool helps AI and machine learning engineers measure the performance of AI models across tasks such as text generation, image recognition, and code completion. You provide a model and a task, and it returns detailed performance scores on the corresponding benchmarks. It is designed for AI practitioners who need to test and compare models rigorously before deployment.

211 stars. Used by 1 other package. Available on PyPI.

Use this if you need a standardized, comprehensive, and reproducible way to evaluate your AI models against a wide range of existing benchmarks or custom datasets.

Not ideal if you are looking for a simple, single-metric evaluation for a small, one-off model test.

AI-model-evaluation machine-learning-benchmarking natural-language-processing-evaluation computer-vision-evaluation code-generation-evaluation
Maintenance 10 / 25
Adoption 11 / 25
Maturity 25 / 25
Community 23 / 25
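The four category subscores add up to the headline figure: 10 + 11 + 25 + 23 = 69, matching the 69 / 100 score at the top. The shell sketch below reproduces that arithmetic. It assumes the overall score is simply the unweighted sum of the four categories, each capped at 25 (4 × 25 = 100); this page does not state the formula explicitly.

```shell
#!/bin/sh
# Assumption: the overall score is the plain, unweighted sum of the four
# category subscores, each out of 25 (4 x 25 = 100). Not an official formula.
maintenance=10
adoption=11
maturity=25
community=23

# POSIX shell integer arithmetic
total=$((maintenance + adoption + maturity + community))
echo "Overall: ${total} / 100"
```

Under that assumption, Maintenance and Adoption are where this repository loses most of its points, while Maturity is already at the maximum.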


Stars: 211
Forks: 65
Language: Python
License: Apache-2.0
Last pushed: Feb 16, 2026
Commits (30d): 0
Dependencies: 4
Reverse dependents: 1

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/IBM/unitxt"

Open to everyone: 100 requests/day, no key needed. A free API key raises the limit to 1,000 requests/day.
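The `curl` request for this API can be wrapped in a small shell helper. This is a sketch, not an official client. The function names `quality_url` and `fetch_quality` are hypothetical. The helper also makes three assumptions: other repositories use the same `llm-tools/<owner>/<repo>` path, the response is JSON (the pretty-printing step depends on this), and `python3` is available.

```shell
#!/bin/sh
# Hypothetical helper: builds the quality-API URL for a GitHub owner/repo,
# assuming every tool uses the same path pattern as IBM/unitxt.
quality_url() {
  printf 'https://pt-edge.onrender.com/api/v1/quality/llm-tools/%s/%s\n' "$1" "$2"
}

# Hypothetical helper: fetches the record and pretty-prints it.
# -f: exit non-zero on HTTP error statuses instead of printing the error body
# -sS: hide the progress meter but still report errors
# Assumption: the body is JSON, so it is piped through Python's json.tool.
fetch_quality() {
  curl -fsS "$(quality_url "$1" "$2")" | python3 -m json.tool
}
```

For example, `fetch_quality IBM unitxt` should send the same request as the `curl` command for this API. Using `-f` makes a rejected request visible as a non-zero exit status. A rejection might happen, for instance, once the anonymous daily limit is reached, although the exact error behavior is not documented here.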