InternScience/ResearchClawBench
ResearchClawBench: Evaluating AI Agents for Automated Research from Re-Discovery to New-Discovery
This benchmark helps researchers, scientists, and analysts evaluate how well AI agents can conduct scientific research independently. You give an AI agent raw scientific data and research goals; the agent produces a publication-quality report, which is then rigorously evaluated against human-authored scientific papers. It's designed for anyone working with AI agents who needs to verify their ability to perform complex scientific tasks, from data analysis to report generation.
Use this if you need to objectively measure an AI agent's capability to perform real-world scientific research, generate reports, and compare its findings against established scientific literature.
Not ideal if you only need to test basic coding ability or factual recall rather than comprehensive, autonomous scientific inquiry.
Stars
19
Forks
—
Language
Jupyter Notebook
License
MIT
Category
Last pushed
Mar 21, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/InternScience/ResearchClawBench"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
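The curl example above can be wrapped in a small Python helper. This is a sketch: the endpoint path comes directly from the example, but the `Authorization: Bearer` header and the shape of the JSON response are assumptions, since the page does not document them.

```python
"""Sketch of querying the quality API shown above.

Only the endpoint path is taken from the page; the auth header name and
response fields are assumptions.
"""
import json
import urllib.request
from typing import Optional

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/agents"


def quality_url(owner: str, repo: str) -> str:
    """Build the API endpoint URL for a given owner/repo slug."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, api_key: Optional[str] = None) -> dict:
    """Fetch the quality record for a repo.

    Passing an API key (header name is an assumption) would correspond to
    the higher 1,000 requests/day limit mentioned on the page.
    """
    req = urllib.request.Request(quality_url(owner, repo))
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")  # assumed header
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Same request as the curl example above.
    print(quality_url("InternScience", "ResearchClawBench"))
```

`fetch_quality` performs a live network request, so in practice you would call it only when online; `quality_url` can be used on its own to construct endpoint URLs for other repositories.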
Higher-rated alternatives
alvinunreal/awesome-autoresearch
A curated list of autonomous improvement loops, research agents, and autoresearch-style systems...
WecoAI/awesome-autoresearch
Curated list of AutoResearch use cases with optimization traces and open source implementations
krzysztofdudek/ResearcherSkill
One file. Your AI agent becomes a scientist. 30+ experiments while you sleep.
mitdbg/Carnot
Optimized System for Deep Research
OpenRaiser/NanoResearch
🦞+🔬: NanoResearch: The Autonomous AI Research Assistant