RenzeLou/AAAR-1.0
The source code for running LLMs on the AAAR-1.0 benchmark.
This project helps researchers and AI practitioners evaluate how well large language models (LLMs) can assist with common research tasks. It provides a benchmark to test LLMs on inferring equations from papers, designing experiments from research proposals, identifying weaknesses in paper drafts, and critiquing peer reviews. The output includes performance metrics for various LLMs on these tasks, helping users understand their capabilities.
No commits in the last 6 months.
Use this if you are a researcher or AI practitioner who needs to rigorously assess and compare the performance of different LLMs in assisting with academic research workflows.
Not ideal if you are looking for a tool to directly apply LLMs to your research tasks without needing to evaluate their performance against a benchmark.
Stars: 18
Forks: —
Language: Python
License: MIT
Category: —
Last pushed: Apr 05, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/RenzeLou/AAAR-1.0"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
monarch-initiative/ontogpt
LLM-based ontological extraction tools, including SPIRES
weAIDB/awesome-data-llm
Official Repository of "LLM × DATA" Survey Paper
AXYZdong/AMchat
AM (Advanced Mathematics) Chat is a large language model that integrates advanced mathematical...
skywalker023/sodaverse
Code and dataset for our EMNLP 2023 paper - "SODA: Million-scale Dialogue Distillation with...
Y-Research-SBU/TimeSeriesScientist
Official Repository for TimeSeriesScientist