piresramon/gpt-4-enem

Code and data to evaluate LLMs on the ENEM, the main standardized Brazilian university admission exams.

Score: 41 / 100 (Emerging)

This project provides a way to test how well large language models (LLMs) perform on the ENEM, Brazil's main university entrance exam. It takes in questions from past ENEM exams, including both text and images, and evaluates how accurately an LLM answers them. This is useful for researchers and educators who want to understand the capabilities and limitations of AI in high-stakes academic evaluations.
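At its core this is a standard multiple-choice accuracy evaluation. Below is a minimal, hypothetical sketch of such a loop in Python; the question format and the query_model callable are placeholders for illustration and are not the repository's actual API.

from typing import Callable

def evaluate(questions: list[dict], query_model: Callable[[str], str]) -> float:
    # Return the fraction of multiple-choice questions the model answers correctly.
    correct = 0
    for q in questions:
        # Build a prompt from the statement plus the lettered alternatives (A-E).
        prompt = q["statement"] + "\n" + "\n".join(
            f"{letter}) {text}" for letter, text in q["alternatives"].items()
        )
        # Keep only the first character of the reply and compare it to the gold letter.
        prediction = query_model(prompt).strip().upper()[:1]
        if prediction == q["answer"]:
            correct += 1
    return correct / len(questions)

# Toy usage: one question and a stub "model" that always answers "C".
sample = [{
    "statement": "Which alternative is correct?",
    "alternatives": {"A": "...", "B": "...", "C": "...", "D": "...", "E": "..."},
    "answer": "C",
}]
print(evaluate(sample, lambda prompt: "C"))  # 1.0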

No commits in the last 6 months.

Use this if you are a researcher or academic who needs to benchmark different language models on complex, multidisciplinary university entrance exams, especially those with visual components.

Not ideal if you are a student looking for a study tool for the ENEM, as this is for evaluating AI models, not for human test preparation.

Tags: AI-evaluation, educational-assessment, language-model-benchmarking, multimodal-AI, Brazilian-education
Badges: Stale (6m), No Package, No Dependents
Maintenance: 0 / 25
Adoption: 8 / 25
Maturity: 16 / 25
Community: 17 / 25
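The headline score appears to be the simple sum of the four sub-scores (an inference from the numbers shown, not documented here): 0 + 8 + 16 + 17 = 41, i.e. 41 / 100.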

Stars: 52
Forks: 11
Language: Python
License: MIT
Last pushed: Dec 06, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/piresramon/gpt-4-enem"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
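To consume the endpoint from code rather than curl, a minimal Python sketch is shown below. It only fetches and pretty-prints the JSON response, since the response schema is not documented in this listing.

import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/piresramon/gpt-4-enem"

# Fetch the quality report and pretty-print whatever fields the API returns.
with urllib.request.urlopen(URL, timeout=10) as response:
    data = json.load(response)

print(json.dumps(data, indent=2, ensure_ascii=False))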