IAAR-Shanghai/NewsBench
[ACL 2024 Main] NewsBench: A Systematic Evaluation Framework for Assessing Editorial Capabilities of Large Language Models in Chinese Journalism
This project helps Chinese journalism professionals evaluate how well large language models (LLMs) perform editorial tasks such as summarization and headline generation, and whether they adhere to safety guidelines. It takes a Chinese news article or topic as input and assesses the LLM's journalistic writing proficiency and safety adherence. News editors, content strategists, and researchers in Chinese media can use it to verify that AI-generated content meets industry standards.
No commits in the last 6 months.
Use this if you need to systematically test and compare the editorial capabilities and safety of different large language models for Chinese news content creation.
Not ideal if your focus is on evaluating LLMs for languages other than Chinese or for tasks outside of journalistic editorial workflows.
Stars: 34
Forks: 1
Language: Python
License: Apache-2.0
Category: LLM tools
Last pushed: Jun 25, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/IAAR-Shanghai/NewsBench"
Open to everyone: 100 requests/day with no key needed. Get a free key to raise the limit to 1,000 requests/day.
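For programmatic access, the same endpoint can be queried from Python. The sketch below is a minimal, stdlib-only example; the "X-Api-Key" header name and the shape of the JSON response are assumptions rather than documented behavior, so inspect the raw payload before relying on specific fields.

import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/IAAR-Shanghai/NewsBench"

def fetch_repo_quality(api_key=None):
    # Anonymous access is limited to 100 requests/day; a key raises it to 1,000.
    # NOTE: "X-Api-Key" is an assumed header name, not confirmed by the API docs.
    req = urllib.request.Request(URL)
    if api_key:
        req.add_header("X-Api-Key", api_key)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    data = fetch_repo_quality()
    # Pretty-print the whole payload; field names vary, so dump everything first.
    print(json.dumps(data, indent=2, ensure_ascii=False))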
Higher-rated alternatives
sierra-research/tau2-bench
τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment
xlang-ai/OSWorld
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
bigcode-project/bigcodebench
[ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI
THUDM/AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
scicode-bench/SciCode
A benchmark that challenges language models to code solutions for scientific problems