lmarena/search-arena

⚔️ [ICLR 2026] Official code of "Search Arena: Analyzing Search-Augmented LLMs".


Emerging

This project helps AI researchers and developers understand how people interact with AI models that use search engines to answer questions. It provides a dataset of user queries and responses from various search-augmented AI systems, along with user feedback. The repository includes data and analysis scripts for evaluating model performance, user preferences, and citation accuracy, so that researchers can improve future search-augmented AI.

Use this if you are an AI researcher or developer focused on building and evaluating search-augmented large language models, and you need data and tools to analyze user interactions, preferences, and model behavior in real-world search scenarios.

Not ideal if you are an end user looking for a pre-built search interface or an off-the-shelf AI tool for general information retrieval; this project targets research and development of the underlying AI systems.

AI research · large language models · natural language processing · user experience evaluation · search system analysis

No License · No Package · No Dependents

Maintenance 10 / 25

Adoption 8 / 25

Maturity 7 / 25

Community 13 / 25

Language

Jupyter Notebook

License

—

Featured in

You're Shipping AI You Can't Measure

Higher-rated alternatives

EvolvingLMMs-Lab/lmms-eval

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

vibrantlabsai/ragas

Supercharge Your LLM Application Evaluations 🚀

open-compass/VLMEvalKit

Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks

EuroEval/EuroEval

The robust European language model benchmark.

Giskard-AI/giskard-oss

🐢 Open-Source Evaluation & Testing library for LLM Agents
