IBM/unitxt
🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data for end-to-end AI benchmarking
This tool helps AI and machine learning engineers reliably measure the performance of different AI models across various tasks like text generation, image recognition, or code completion. You provide your AI model and a task, and it outputs detailed performance scores and benchmarks. It is designed for AI practitioners who need to rigorously test and compare their models before deployment.
211 stars. Used by 1 other package. Available on PyPI.
Use this if you need a standardized, comprehensive, and reproducible way to evaluate your AI models against a wide range of existing benchmarks or custom datasets.
Not ideal if you are looking for a simple, single-metric evaluation for a small, one-off model test.
Stars: 211
Forks: 65
Language: Python
License: Apache-2.0
Category:
Last pushed: Feb 16, 2026
Commits (30d): 0
Dependencies: 4
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/IBM/unitxt"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
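The endpoint above can also be queried from Python with only the standard library. This is a minimal sketch: the URL shape is taken from the curl example, and since the page does not document how an API key is passed, only anonymous access is shown. The `endpoint_url` and `fetch_quality` helpers are illustrative names, not part of any published client.

```python
import json
import urllib.request

# Base path taken from the curl example on this page.
BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def endpoint_url(owner: str, repo: str) -> str:
    """Build the per-repository endpoint, mirroring the curl example."""
    return f"{BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality record for a repository (anonymous tier, 100 req/day)."""
    with urllib.request.urlopen(endpoint_url(owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Prints the same URL the curl example hits.
    print(endpoint_url("IBM", "unitxt"))
```

Swap in any `owner/repo` pair from the catalog to retrieve that tool's record.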
Featured in
Related tools
open-compass/opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral,...
lean-dojo/LeanDojo
Tool for data extraction and interacting with Lean programmatically.
GoodStartLabs/AI_Diplomacy
Frontier Models playing the board game Diplomacy.
google/litmus
Litmus is a comprehensive LLM testing and evaluation tool designed for GenAI Application...
salesforce/CodeT5
Home of CodeT5: Open Code LLMs for Code Understanding and Generation