EternityYW/TRAM-Benchmark

TRAM: Benchmarking Temporal Reasoning for Large Language Models (Findings of ACL 2024)

Score: 30 / 100 (Emerging)

This project provides a comprehensive benchmark for evaluating how well large language models (LLMs) understand and reason about time in natural language. It offers a collection of over half a million multiple-choice questions across ten diverse temporal tasks. Researchers and developers working on LLMs can use this to assess and compare the temporal reasoning capabilities of different models.

No commits in the last 6 months.

Use this if you are developing or fine-tuning large language models and need to rigorously test their ability to handle time-related information and questions.

Not ideal if you are looking for a general-purpose natural language processing tool for non-temporal tasks or for direct integration into an application.

large-language-models natural-language-understanding temporal-reasoning llm-evaluation nlp-benchmarking
Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 16 / 25
Community 7 / 25


Stars: 26
Forks: 2
Language: Jupyter Notebook
License: MIT
Last pushed: Jun 21, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/EternityYW/TRAM-Benchmark"

Open to everyone: 100 requests/day with no key needed, or get a free key for 1,000 requests/day.