zli12321/qa_metrics

An easy-to-use Python package for quick, basic QA evaluations. It includes standardized QA evaluation metrics and semantic evaluation metrics: black-box and open-source large language model prompting and evaluation, exact match, F1 score, PEDANT semantic match, and transformer match. The package also supports prompting the OpenAI and Anthropic APIs.
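
As a rough sketch of how the lexical metrics are typically called (import paths, function names, and arguments follow the project's README and may differ between versions, so verify against the package documentation):

from qa_metrics.em import em_match
from qa_metrics.f1 import f1_match, f1_score_with_precision_recall

reference_answers = ["The Frog Prince", "The Princess and the Frog"]
candidate_answer = "The movie 'The Princess and the Frog' is loosely based on the Brothers Grimm's 'Iron Henry'"

# Exact match: True if the candidate matches any reference answer after normalization.
print("Exact Match:", em_match(reference_answers, candidate_answer))

# Token-level F1 with precision and recall, plus a thresholded match decision.
print("F1 stats:", f1_score_with_precision_recall(reference_answers[0], candidate_answer))
print("F1 Match:", f1_match(reference_answers, candidate_answer, threshold=0.5))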

Score: 46 / 100 (Emerging)

This package helps evaluate how well a question-answering system or large language model answers questions. You provide the questions, the correct answers, and the system's generated answers, and it outputs scores indicating the quality and accuracy of those responses. It is intended for anyone who needs to assess the performance of question-answering AI models, such as an AI product manager, researcher, or quality assurance specialist.
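
A minimal sketch of that workflow using the PEDANT semantic matcher (class and method names are taken from the project's README and may differ between versions):

from qa_metrics.pedant import PEDANT

question = "Which movie is loosely based on the Brothers Grimm's 'Iron Henry'?"
reference_answers = ["The Frog Prince", "The Princess and the Frog"]
candidate_answer = "The movie 'The Princess and the Frog' is loosely based on the Brothers Grimm's 'Iron Henry'"

pedant = PEDANT()
# Per-reference semantic match scores for the candidate answer.
print(pedant.get_scores(reference_answers, candidate_answer, question))
# Boolean judgment: is the candidate a correct answer to the question?
print(pedant.evaluate(reference_answers, candidate_answer, question))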

No commits in the last 6 months. Available on PyPI.

Use this if you need to quickly and comprehensively assess the quality of answers produced by various question-answering systems, from short facts to longer explanations.

Not ideal if you are looking for a tool to generate questions or answers rather than evaluate them, or if you don't have existing correct answers to compare against.

AI model evaluation · natural language processing · conversational AI · information retrieval · text generation
Stale 6m
Maintenance 2 / 25
Adoption 8 / 25
Maturity 25 / 25
Community 11 / 25

Stars: 61
Forks: 6
Language: Python
License: MIT
Last pushed: Jul 18, 2025
Commits (30d): 0
Dependencies: 6

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/zli12321/qa_metrics"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
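
For scripted access, a minimal Python sketch using the requests library could look like this (the response is assumed to be JSON; its exact schema and any authentication header for keyed access are not documented here):

import requests

# Endpoint shown in the curl example above.
url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/zli12321/qa_metrics"

response = requests.get(url, timeout=10)
response.raise_for_status()

data = response.json()  # assumed JSON payload with the quality and repo stats shown on this page
print(data)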