IAAR-Shanghai/GuessArena
[ACL 2025] GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning
This tool helps researchers and product managers evaluate how well large language models (LLMs) understand a specific industry, such as finance, healthcare, or education, and how effectively they reason within that domain. You supply unstructured documents from your chosen field, and the tool produces detailed reports on an LLM's domain knowledge and reasoning ability. It is intended for anyone assessing or selecting LLMs for specialized business applications.
Use this if you need to measure an LLM's grasp of domain-specific information and its ability to reason within complex, real-world industry contexts.
Not ideal if you're looking for general-purpose LLM evaluations that don't require deep dives into specialized knowledge or complex reasoning within a particular industry.
Stars: 9
Forks: 1
Language: Python
License: Apache-2.0
Category:
Last pushed: Nov 15, 2025
Commits (last 30 days): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/IAAR-Shanghai/GuessArena"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
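For programmatic use, here is a minimal Python sketch of the same request. It assumes only what the curl example shows: a plain GET against the public endpoint. That the response is JSON, and what fields it contains, are assumptions, so the sketch just prints whatever comes back; how an API key would be attached is not documented here, so authentication is omitted.

import requests

# Public endpoint from the curl example above
# (no key needed, up to 100 requests/day).
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/IAAR-Shanghai/GuessArena"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()  # surface rate limiting or server errors

# Assumption: the endpoint returns JSON; the schema is not documented here.
data = resp.json()
print(data)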
Higher-rated alternatives
EvolvingLMMs-Lab/lmms-eval: One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas: Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit: Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
EuroEval/EuroEval: The robust European language model benchmark.
Giskard-AI/giskard-oss: 🐢 Open-Source Evaluation & Testing library for LLM Agents