THUDM/AlignBench

A multi-dimensional Chinese alignment benchmark for large language models (ACL 2024)

37 / 100 (Emerging)

AlignBench evaluates how well Chinese large language models align with human instructions. You supply a model's responses to a standardized set of user questions, and it returns a detailed, multi-dimensional score and analysis of that model's performance. It is intended for researchers, developers, and product managers who need to assess and compare the alignment quality of different Chinese large language models for real-world applications.

421 stars.

Use this if you need a comprehensive and reliable way to benchmark the 'human-likeness' and instruction-following ability of Chinese large language models.

Not ideal if you are looking to evaluate non-Chinese language models or are interested in metrics other than human alignment and instruction following.

large-language-models chinese-nlp model-evaluation ai-alignment instruction-following
No License, No Package, No Dependents
Maintenance 6 / 25
Adoption 10 / 25
Maturity 8 / 25
Community 13 / 25
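
The four category scores above add up to the headline 37 / 100. A quick check, assuming (this page does not state it) that the overall score is the plain sum of the four categories, each out of 25:

```python
# Category scores as listed on this page (each out of 25).
# Assumption: the overall score is their unweighted sum; the page
# does not confirm how the total is actually computed.
sub_scores = {
    "Maintenance": 6,
    "Adoption": 10,
    "Maturity": 8,
    "Community": 13,
}

total = sum(sub_scores.values())
maximum = 25 * len(sub_scores)

print(f"{total} / {maximum}")  # prints "37 / 100"
```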


Stars: 421
Forks: 29
Language: Python
License: None
Last pushed: Oct 25, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/THUDM/AlignBench"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
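
The same request can be made from Python with only the standard library. This is a minimal sketch under two assumptions not confirmed by this page: that the endpoint returns JSON, and that the `owner/repo` path pattern seen in the curl command holds for other repositories. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a repository (hypothetical helper).

    Assumes the path is always <base>/<owner>/<repo>, as in the one
    example URL shown on this page.
    """
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> dict:
    """Fetch the quality record for a repository.

    Assumes the response body is a JSON object; the schema is not
    documented here, so callers should inspect the result themselves.
    """
    request = urllib.request.Request(
        quality_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("THUDM", "AlignBench")` issues the same GET request as the curl command. Unauthenticated calls count against the 100 requests/day limit, so cache the result if you poll regularly.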