HICAI-ZJU/SciKnowEval
SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models
This project helps evaluate how well large language models (LLMs) understand and apply scientific knowledge across various domains like Biology, Chemistry, Physics, and Materials Science. It takes an LLM's responses to scientific questions as input and provides a detailed assessment of its abilities, from recalling facts to complex reasoning and ethical discernment. Scientists, researchers, and AI developers can use this to benchmark and improve LLMs for scientific applications.
No commits in the last 6 months.
Use this if you need to thoroughly assess a large language model's capabilities in scientific contexts, especially its ability to remember, comprehend, reason, discern, and apply scientific knowledge.
Not ideal if you need a general-purpose LLM evaluation rather than one focused on multi-level scientific knowledge, or if your model is not intended for scientific tasks.
Stars: 27
Forks: 3
Language: Python
License: —
Category: —
Last pushed: Jul 13, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/HICAI-ZJU/SciKnowEval"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
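A minimal Python sketch of the same request, assuming the endpoint returns JSON; the response fields are not documented here, so the code simply prints whatever comes back.

import requests

# Same endpoint as the curl example above.
URL = "https://pt-edge.onrender.com/api/v1/quality/nlp/HICAI-ZJU/SciKnowEval"

# Assumption: no API key is required within the free 100 requests/day tier.
resp = requests.get(URL, timeout=30)
resp.raise_for_status()
print(resp.json())

Judging from the URL pattern, replacing HICAI-ZJU/SciKnowEval with another owner/repo path should return the record for that repository.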
Higher-rated alternatives
microsoft/NeMoEval
A Benchmark Tool for Natural Language-based Network Management
FudanSELab/ClassEval
Benchmark ClassEval for class-level code generation.
apartresearch/specificityplus
👩‍💻 Code for the ACL paper "Detecting Edit Failures in LLMs: An Improved Specificity Benchmark"
claws-lab/XLingEval
Code and Resources for the paper, "Better to Ask in English: Cross-Lingual Evaluation of Large...
nicolay-r/RuSentRel-Leaderboard
This is an official Leaderboard for the RuSentRel-1.1 dataset originally described in paper...