jeinlee1991/chinese-llm-benchmark

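ReLE evaluation: a capability evaluation of Chinese AI large models (continuously updated). It currently covers 359 large models, including commercial models such as chatgpt, gpt-5.2, o4-mini, Google gemini-3-pro, Claude-4.6, Wenxin ERNIE-X1.1, ERNIE-5.0, qwen3-max, qwen3.5-plus, Baichuan, iFlytek Spark, and SenseTime senseChat, as well as open-source models such as step3.5-flash, kimi-k2.5, ernie4.5, MiniMax-M2.5, deepseek-v3.2, Qwen3.5, llama4, Zhipu GLM-5, GLM-4.7, LongCat, gemma3, and mistral. Besides leaderboards, it provides a large-model defect library of over 2 million items, so the community can study, analyze, and improve large models.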

Score: 52 / 100 (Established)

This project provides comprehensive evaluations of Chinese large language models (LLMs), covering both commercial and open-source options across diverse fields like education, finance, healthcare, and law. It publishes detailed performance rankings for the models it covers, along with a database of known defects. This is for anyone who needs to select or improve Chinese LLMs for specific applications, such as product managers, researchers, or business strategists.

5,675 stars. Actively maintained with 9 commits in the last 30 days.

Use this if you need to understand the strengths and weaknesses of various Chinese large language models across multiple application domains.

Not ideal if you are looking for evaluations of non-Chinese LLMs or if your primary need is for a general-purpose LLM development and observability platform.

Tags: AI model selection, Chinese language processing, model benchmarking, AI research, enterprise AI strategy
No License, No Package, No Dependents
Maintenance 17 / 25
Adoption 10 / 25
Maturity 8 / 25
Community 17 / 25
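
The four category scores above add up to the overall score shown for this project. The page does not state the scoring formula, so the sketch below assumes the total is the plain sum of four categories worth 25 points each; the names `CATEGORY_SCORES` and `overall_score` are illustrative, not part of any published API.

```python
# Category scores as listed on this page; each is out of 25.
CATEGORY_SCORES = {
    "Maintenance": 17,
    "Adoption": 10,
    "Maturity": 8,
    "Community": 17,
}
MAX_PER_CATEGORY = 25


def overall_score(scores):
    """Assumption: the overall score is the unweighted sum of the categories."""
    return sum(scores.values())


total = overall_score(CATEGORY_SCORES)
maximum = MAX_PER_CATEGORY * len(CATEGORY_SCORES)
print(f"{total} / {maximum}")  # prints "52 / 100"
```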


Stars: 5,675
Forks: 229
Language: not listed
License: none
Last pushed: Mar 07, 2026
Commits (30d): 9

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jeinlee1991/chinese-llm-benchmark"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
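
The same request can be made from a script. The sketch below is a minimal Python equivalent of the curl command, using only the standard library. It assumes the endpoint returns JSON; the response fields are not documented on this page, so the code parses the payload without relying on any particular key. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner, repo):
    """Build the endpoint URL for a given GitHub owner/repo pair."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner, repo, timeout=10.0):
    """Fetch the quality data for a repo.

    Assumption: the body is UTF-8 encoded JSON; its schema is not
    described here, so the parsed object is returned unchanged.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage (performs a network request, counted against the daily limit):
# data = fetch_quality("jeinlee1991", "chinese-llm-benchmark")
# print(json.dumps(data, indent=2, ensure_ascii=False))
```

With the keyless limit of 100 requests per day, caching the parsed result locally is a sensible choice for anything that runs repeatedly.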