IAAR-Shanghai/NewsBench
[ACL 2024 Main] NewsBench: A Systematic Evaluation Framework for Assessing Editorial Capabilities of Large Language Models in Chinese Journalism
This project helps Chinese journalism professionals evaluate how well large language models (LLMs) perform editorial tasks such as summarization and headline generation, and whether they adhere to safety guidelines. It takes a Chinese news article or topic as input and assesses the LLM's journalistic writing proficiency and safety adherence. News editors, content strategists, and researchers in Chinese media can use it to verify that AI-generated content meets industry standards.
No commits in the last 6 months.
Use this if you need to systematically test and compare the editorial capabilities and safety of different large language models for Chinese news content creation.
Not ideal if your focus is on evaluating LLMs for languages other than Chinese or for tasks outside of journalistic editorial workflows.
Stars: 34
Forks: 1
Language: Python
License: Apache-2.0
Category: LLM tools
Last pushed: Jun 25, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/IAAR-Shanghai/NewsBench"
Open to everyone: 100 requests/day with no key needed. Get a free key to raise the limit to 1,000 requests/day.
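For programmatic access, the same endpoint can be queried from Python. The sketch below is a minimal, stdlib-only example; the "X-Api-Key" header name and the shape of the JSON response are assumptions rather than documented behavior, so inspect the raw payload before relying on specific fields.

import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/IAAR-Shanghai/NewsBench"

def fetch_repo_quality(api_key=None):
    # Anonymous access is limited to 100 requests/day; a key raises it to 1,000.
    # NOTE: "X-Api-Key" is an assumed header name, not confirmed by the API docs.
    req = urllib.request.Request(URL)
    if api_key:
        req.add_header("X-Api-Key", api_key)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    data = fetch_repo_quality()
    # Pretty-print the whole payload; field names vary, so dump everything first.
    print(json.dumps(data, indent=2, ensure_ascii=False))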
Higher-rated alternatives
sierra-research/tau2-bench
τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment
xlang-ai/OSWorld
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
bigcode-project/bigcodebench
[ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI
THUDM/AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
scicode-bench/SciCode
A benchmark that challenges language models to code solutions for scientific problems