PacificAI/langtest
Deliver safe & effective language models
langtest helps AI developers and MLOps engineers verify that their language models are fair, robust, and accurate before deployment. You feed it your model and it outputs a comprehensive report of performance across a battery of tests, along with suggestions for data augmentation. It is aimed at anyone building, evaluating, or maintaining large language models or other NLP systems.
Use this if you need to thoroughly test language models for biases, fairness, robustness, and accuracy, especially in regulated or sensitive domains like healthcare or finance.
Not ideal if you only need a basic performance metric, or want a tool that is not focused on responsible-AI testing.
Stars
552
Forks
50
Language
Python
License
Apache-2.0
Category
Last pushed
Feb 19, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/PacificAI/langtest"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
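The same endpoint can be called from Python instead of curl. This is a minimal sketch: the URL path comes from the curl example above, but the JSON field names (`repo`, `stars`) are illustrative assumptions, not documented response fields.

```python
# Sketch of querying the quality API for a repo, assuming a JSON response.
# Field names in summarize() are assumptions; adjust to the real payload.
import json
from urllib.parse import quote

BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def build_url(owner: str, repo: str) -> str:
    """Build the per-repo endpoint URL, escaping path segments."""
    return f"{BASE}/{quote(owner)}/{quote(repo)}"

def summarize(payload: dict) -> str:
    """One-line summary from a parsed response (hypothetical keys)."""
    return f"{payload.get('repo', '?')}: {payload.get('stars', 0)} stars"

url = build_url("PacificAI", "langtest")
print(url)

# Parse a sample payload offline; a live call would be
# urllib.request.urlopen(url).read() followed by json.loads().
sample = json.loads('{"repo": "PacificAI/langtest", "stars": 552}')
print(summarize(sample))
```

For live use, wrap the request in error handling, since the free tier is rate-limited to 100 requests/day.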
Related tools
microsoft/OpenRCA
[ICLR'25] OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures?
Babelscape/ALERT
Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language...
TrustGen/TrustEval-toolkit
[ICLR'26, NAACL'25 Demo] Toolkit & Benchmark for evaluating the trustworthiness of generative...
ChenWu98/agent-attack
[ICLR 2025] Dissecting adversarial robustness of multimodal language model agents
Trust4AI/ASTRAL
Automated Safety Testing of Large Language Models