zchuz/TimeBench

The repository for the ACL 2024 paper "TimeBench: A Comprehensive Evaluation of Temporal Reasoning Abilities in Large Language Models"

Score: 29 / 100 (Experimental)

TimeBench helps researchers and practitioners evaluate how well large language models (LLMs) understand and reason about time. You provide a set of LLMs and get back a detailed performance analysis across temporal reasoning tasks, revealing each model's strengths and weaknesses in handling dates, event sequences, and durations. It is aimed at anyone researching, developing, or deploying LLMs who needs to understand their temporal reasoning capabilities.

No commits in the last 6 months.

Use this if you need to rigorously test and compare different large language models' abilities to process and understand temporal information, from simple date arithmetic to complex event sequencing.

Not ideal if you are looking for a tool to train LLMs or to apply them directly to a specific business problem rather than to evaluate their fundamental temporal reasoning capabilities.

Tags: AI-research · LLM-evaluation · natural-language-understanding · model-benchmarking · temporal-reasoning
Stale (6m) · No package · No dependents

Maintenance: 0 / 25
Adoption: 7 / 25
Maturity: 16 / 25
Community: 6 / 25
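The four category scores, each out of 25, sum to the overall score: 0 + 7 + 16 + 6 = 29 / 100.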


Stars: 34
Forks: 2
Language: Python
License: MIT
Last pushed: Jun 29, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/zchuz/TimeBench"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
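
For programmatic access, here is a minimal Python sketch using only the standard library. It calls the same endpoint as the curl example above; the API's response schema is not documented here, so rather than assuming specific field names, the script simply prints whatever top-level JSON fields come back.

import json
import urllib.request

# Endpoint from the curl example above (no API key needed for
# up to 100 requests/day).
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/zchuz/TimeBench"

# Fetch and decode the JSON quality report.
with urllib.request.urlopen(URL, timeout=10) as resp:
    report = json.load(resp)

# Assumption: the top-level response is a JSON object. Print each
# field it contains (e.g. the overall score, category breakdowns).
for key, value in report.items():
    print(f"{key}: {value}")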