LaVi-Lab/CLEVA
[EMNLP 2023 Demo] "CLEVA: Chinese Language Models EVAluation Platform"
This platform helps you accurately assess the performance of Chinese Large Language Models (LLMs). Given a Chinese LLM to test, it produces detailed evaluation results across 31 tasks, such as summarization, translation, and fact-checking, along with a trustworthy leaderboard. It is aimed at researchers, developers, and businesses working on Chinese natural language processing who need to benchmark and compare models.
No commits in the last 6 months.
Use this if you need a comprehensive and standardized way to evaluate Chinese LLMs, minimizing issues like data contamination.
Not ideal if you are evaluating non-Chinese language models, or if your workflow cannot use the HELM framework, which CLEVA relies on for local evaluations.
Stars
64
Forks
3
Language
Shell
License
—
Category
Last pushed
May 16, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/LaVi-Lab/CLEVA"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
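To inspect the response from a shell, a minimal sketch (assuming the endpoint returns JSON; jq is used only for pretty-printing and can be omitted):
curl -s "https://pt-edge.onrender.com/api/v1/quality/nlp/LaVi-Lab/CLEVA" | jq .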
Higher-rated alternatives
luheng/deep_srl
Code and pre-trained model for: Deep Semantic Role Labeling: What Works and What's Next
sileod/tasksource
Datasets collection and preprocessings framework for NLP extreme multitask learning
loomchild/maligna
Bilingual sentence aligner
CK-Explorer/DuoSubs
Semantic subtitle aligner and merger for bilingual subtitle syncing.
coastalcph/lex-glue
LexGLUE: A Benchmark Dataset for Legal Language Understanding in English