flashclub/ModelJudge
A multilingual AI model evaluation platform built with Next.js, supporting side-by-side comparison of multiple models with real-time streaming responses and a final consolidated judgment.
This platform helps AI developers and researchers evaluate different AI models: you enter a question, select up to three models to generate answers, and a fourth model rates the responses and produces a final answer. It is aimed at anyone working with AI models who needs to compare outputs and receive an objective judgment.
Use this if you need to quickly compare the responses of multiple AI models to a specific prompt and get a consolidated judgment.
Not ideal if you need to perform deep, statistical analysis of model performance or integrate evaluations into a larger automated pipeline.
Stars: 95
Forks: 5
Language: TypeScript
License: MIT
Category:
Last pushed: Dec 07, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/flashclub/ModelJudge"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
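The curl command above can also be wrapped in a small TypeScript helper, matching the repository's own language. This is a minimal sketch: the URL pattern comes from the endpoint shown above, but the response shape is not documented here, so the JSON is returned untyped.

```typescript
// Build the quality-API URL for a given owner/repo pair.
// The path pattern mirrors the curl example above.
function qualityUrl(owner: string, repo: string): string {
  return `https://pt-edge.onrender.com/api/v1/quality/llm-tools/${owner}/${repo}`;
}

// Fetch the quality data as untyped JSON (the response schema
// is an assumption; inspect it before relying on field names).
async function fetchQuality(owner: string, repo: string): Promise<unknown> {
  const res = await fetch(qualityUrl(owner, repo));
  if (!res.ok) {
    throw new Error(`Request failed: HTTP ${res.status}`);
  }
  return res.json();
}

// Usage:
// const data = await fetchQuality("flashclub", "ModelJudge");
```

Without a key this stays within the 100 requests/day anonymous limit; how a key is attached (header vs. query parameter) is not documented here, so it is omitted.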
Higher-rated alternatives
open-compass/opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral,...
IBM/unitxt
🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the...
lean-dojo/LeanDojo
Tool for data extraction and interacting with Lean programmatically.
GoodStartLabs/AI_Diplomacy
Frontier Models playing the board game Diplomacy.
google/litmus
Litmus is a comprehensive LLM testing and evaluation tool designed for GenAI Application...