LaVi-Lab/CLEVA
[EMNLP 2023 Demo] "CLEVA: Chinese Language Models EVAluation Platform"
This platform helps you accurately assess the performance of Chinese Large Language Models (LLMs). Given a Chinese LLM to test, it produces detailed evaluation results across 31 tasks, such as summarization, translation, and fact-checking, along with a trustworthy leaderboard. It is aimed at researchers, developers, and businesses working on Chinese natural language processing who need to benchmark and compare models.
No commits in the last 6 months.
Use this if you need a comprehensive and standardized way to evaluate Chinese LLMs, minimizing issues like data contamination.
Not ideal if you are evaluating non-Chinese language models, or if your workflow cannot use the HELM framework, which CLEVA relies on for local evaluations.
Stars
64
Forks
3
Language
Shell
License
—
Category
Last pushed
May 16, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/LaVi-Lab/CLEVA"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
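To inspect the response from a shell, a minimal sketch (assuming the endpoint returns JSON; jq is used only for pretty-printing and can be omitted):
curl -s "https://pt-edge.onrender.com/api/v1/quality/nlp/LaVi-Lab/CLEVA" | jq .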
Higher-rated alternatives
luheng/deep_srl
Code and pre-trained model for: Deep Semantic Role Labeling: What Works and What's Next
sileod/tasksource
Datasets collection and preprocessings framework for NLP extreme multitask learning
loomchild/maligna
Bilingual sentence aligner
CK-Explorer/DuoSubs
Semantic subtitle aligner and merger for bilingual subtitle syncing.
coastalcph/lex-glue
LexGLUE: A Benchmark Dataset for Legal Language Understanding in English