athina-ai/athina-sdk
LLM Testing SDK that helps you write and run tests to monitor your LLM app in production
This tool helps you ensure the reliability and quality of your AI application's outputs. It takes your LLM application's outputs and a set of predefined tests, then evaluates how well those outputs meet your criteria. It is for anyone building or managing an LLM-powered application who needs to verify the consistency and accuracy of the model's responses.
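As a rough sketch of that pattern (illustrative only; run_tests and the criteria below are hypothetical stand-ins, not athina-sdk's actual API):

from typing import Callable

def run_tests(outputs: list[str], criteria: list[Callable[[str], bool]]) -> list[bool]:
    """Return pass/fail for each output against every criterion."""
    return [all(check(out) for check in criteria) for out in outputs]

# Example criteria: the answer must be non-empty and must not
# leak an internal placeholder token.
criteria = [
    lambda out: len(out.strip()) > 0,
    lambda out: "[REDACTED]" not in out,
]

results = run_tests(["Paris is the capital of France.", ""], criteria)
print(results)  # [True, False]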
132 stars. No commits in the last 6 months.
Use this if you need to systematically test, monitor, and improve the performance of your LLM-powered application, both during development and in live production.
Not ideal if you are not working with Large Language Models or if your primary concern is traditional software unit testing.
Stars: 132
Forks: 1
Language: Python
License: —
Category: —
Last pushed: Jan 22, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/athina-ai/athina-sdk"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
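The same request from Python (a minimal sketch assuming the endpoint returns JSON; the response schema is not documented here, so the script just pretty-prints whatever comes back):

import json
import requests

# Same endpoint as the curl example above; up to 100 requests/day
# work without a key.
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/athina-ai/athina-sdk"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()

# Schema undocumented here: pretty-print the raw JSON response.
print(json.dumps(resp.json(), indent=2))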
Higher-rated alternatives
open-compass/opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral,...
IBM/unitxt
🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the...
lean-dojo/LeanDojo
Tool for data extraction and interacting with Lean programmatically.
GoodStartLabs/AI_Diplomacy
Frontier Models playing the board game Diplomacy.
google/litmus
Litmus is a comprehensive LLM testing and evaluation tool designed for GenAI Application...