HICAI-ZJU/SciKnowEval
SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models
This project helps evaluate how well large language models (LLMs) understand and apply scientific knowledge across various domains like Biology, Chemistry, Physics, and Materials Science. It takes an LLM's responses to scientific questions as input and provides a detailed assessment of its abilities, from recalling facts to complex reasoning and ethical discernment. Scientists, researchers, and AI developers can use this to benchmark and improve LLMs for scientific applications.
No commits in the last 6 months.
Use this if you need to thoroughly assess a large language model's capabilities in scientific contexts, especially its ability to remember, comprehend, reason, discern, and apply scientific knowledge.
Not ideal if you need a general-purpose LLM evaluation rather than one focused on multi-level scientific knowledge, or if your model is not intended for scientific tasks.
Stars: 27
Forks: 3
Language: Python
License: —
Category: —
Last pushed: Jul 13, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/HICAI-ZJU/SciKnowEval"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
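A minimal Python sketch of the same request, assuming the endpoint returns JSON; the response fields are not documented here, so the code simply prints whatever comes back.

import requests

# Same endpoint as the curl example above.
URL = "https://pt-edge.onrender.com/api/v1/quality/nlp/HICAI-ZJU/SciKnowEval"

# Assumption: no API key is required within the free 100 requests/day tier.
resp = requests.get(URL, timeout=30)
resp.raise_for_status()
print(resp.json())

Judging from the URL pattern, replacing HICAI-ZJU/SciKnowEval with another owner/repo path should return the record for that repository.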
Higher-rated alternatives
microsoft/NeMoEval
A Benchmark Tool for Natural Language-based Network Management
FudanSELab/ClassEval
Benchmark ClassEval for class-level code generation.
apartresearch/specificityplus
👩‍💻 Code for the ACL paper "Detecting Edit Failures in LLMs: An Improved Specificity Benchmark"
claws-lab/XLingEval
Code and Resources for the paper, "Better to Ask in English: Cross-Lingual Evaluation of Large...
nicolay-r/RuSentRel-Leaderboard
This is an official Leaderboard for the RuSentRel-1.1 dataset originally described in paper...