evalscope and llm-eval
Both projects compete in the LLM/RAG evaluation space: each provides a customizable evaluation framework with support for multiple benchmarks and RAG assessment. evalscope offers broader model-type coverage (LLM, VLM, AIGC), while llm-eval is more specialized for language models.
About evalscope
modelscope/evalscope
A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking.
This tool helps AI model developers and researchers objectively assess how well large language models (LLMs), vision-language models (VLMs), and other generative AI models perform. You provide various models and datasets, and it generates detailed comparison reports and performance metrics, including stress test results and interactive visualizations. It helps you understand a model's strengths and weaknesses across different tasks and benchmarks.
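As a rough sketch, a minimal evalscope run might look like the following. The model id and dataset names are placeholders, and while TaskConfig/run_task follow evalscope's documented Python entry points, check the project's README for the exact options supported by your installed version.

```python
# Minimal evalscope sketch: evaluate one model on a couple of benchmarks.
# Assumes `pip install evalscope`; model id and dataset names are placeholders.
from evalscope import TaskConfig, run_task

task_cfg = TaskConfig(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # any supported model id
    datasets=["gsm8k", "arc"],           # built-in benchmark names
    limit=10,                            # evaluate only a few samples as a quick smoke test
)

run_task(task_cfg=task_cfg)  # produces per-benchmark scores and a report in the output directory
```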
About llm-eval
justplus/llm-eval
A large language model evaluation platform supporting multiple evaluation benchmarks, custom datasets, and performance testing. Supports RAG evaluation based on custom datasets.
This platform helps AI product managers and researchers quickly evaluate the performance of large language models (LLMs). You can upload your own datasets (like Q&A pairs, multiple-choice questions, or RAG data) and it outputs detailed reports on model accuracy, latency, and throughput. It's designed for anyone needing to compare, test, and optimize LLMs for specific applications.
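The exact upload format depends on the platform, but as a purely illustrative sketch, a custom Q&A/RAG dataset is commonly prepared as JSONL before being imported. The field names below are hypothetical; consult llm-eval's documentation for the schema it actually expects.

```python
# Hypothetical example of preparing a custom Q&A / RAG dataset as JSONL.
# Field names ("question", "context", "answer") are illustrative only.
import json

samples = [
    {
        "question": "What does the warranty cover?",
        "context": "The warranty covers manufacturing defects for 12 months.",
        "answer": "Manufacturing defects for 12 months.",
    },
]

with open("rag_eval_dataset.jsonl", "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")
```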