betagouv/ComparIA
Open source LLM arena created by the French Government
This project helps organizations evaluate how well different AI models respond to specific questions, especially in languages other than English or in specialized domains such as healthcare or law. You provide questions or prompts, compare the AI-generated answers side by side, and vote for the best ones. The result is a unique dataset of human preferences that can be used to improve AI models or to raise awareness of their capabilities and limitations. Typical users include governments, universities, hospitals, and any other organization that needs to assess AI performance in non-standard scenarios.
Use this if you need to gather specific human feedback on AI model responses for less-resourced languages or niche industry sectors, or if you want to educate people about AI model diversity and biases.
Not ideal if you are looking for a pre-built, off-the-shelf AI model or a general-purpose AI chat interface for everyday use.
Stars: 63
Forks: 13
Language: Jupyter Notebook
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/betagouv/ComparIA"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
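For scripted access, here is a minimal Python sketch of the same call. It assumes the endpoint returns JSON whose fields mirror the stats shown above (e.g. stars, forks, license); those field names are assumptions, so print the raw payload first to confirm the actual schema.

import requests

# Quality-data endpoint for a given GitHub repo (owner/name), as shown in the curl example above.
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/betagouv/ComparIA"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()
data = resp.json()

# Field names below are assumptions based on the stats listed on this page;
# inspect the full payload to verify them before relying on specific keys.
print(data)
print("Stars:", data.get("stars"))
print("License:", data.get("license"))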
Related tools
Skytliang/Multi-Agents-Debate
MAD: The first work to explore Multi-Agent Debate with Large Language Models :D
liuxiaotong/ai-dataset-radar
Multi-source async competitive intelligence engine for AI training data ecosystems with...
Arnoldlarry15/ARES-Dashboard
AI Red Team Operations Console
llm-ring/lmring
Open-source, self-hostable LLM arena with model compare, voting, and leaderboards
YerbaPage/SWE-Debate
SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution