piresramon/gpt-4-enem
Code and data to evaluate LLMs on the ENEM, the main standardized Brazilian university admission exam.
This project provides a way to test how well large language models (LLMs) perform on the ENEM, Brazil's main university entrance exam. It takes in questions from past ENEM exams, including both text and images, and evaluates how accurately an LLM answers them. This is useful for researchers and educators who want to understand the capabilities and limitations of AI in high-stakes academic evaluations.
No commits in the last 6 months.
Use this if you are a researcher or academic who needs to benchmark different language models on complex, multidisciplinary university entrance exams, especially those with visual components.
Not ideal if you are a student looking for a study tool for the ENEM, as this is for evaluating AI models, not for human test preparation.
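The evaluation described above comes down to scoring multiple-choice answers. The sketch below is hypothetical and is not taken from the repository. It assumes each ENEM question has alternatives labeled A to E, and that the model is prompted to state its choice as "Resposta: X". The names `Question`, `extract_choice` and `accuracy` are invented for illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Question:
    qid: str   # hypothetical question identifier
    gold: str  # correct alternative, assumed to be a letter "A"-"E"


# Assumed answer format: "Resposta: C" or "Resposta: (C)".
# The letter class is uppercase only, so the Portuguese article "a" is not mistaken for an answer.
ANSWER_RE = re.compile(r"[Rr]esposta:\s*\(?([A-E])\)?")


def extract_choice(response: str) -> Optional[str]:
    """Return the last alternative the model declared, or None if none is found."""
    matches = ANSWER_RE.findall(response)
    return matches[-1] if matches else None


def accuracy(questions: List[Question], responses: Dict[str, str]) -> float:
    """Fraction of questions whose extracted choice matches the gold letter.

    A missing or unparseable response counts as wrong.
    """
    if not questions:
        return 0.0
    correct = sum(
        1 for q in questions if extract_choice(responses.get(q.qid, "")) == q.gold
    )
    return correct / len(questions)


if __name__ == "__main__":
    qs = [Question("q1", "C"), Question("q2", "A"), Question("q3", "E")]
    resps = {"q1": "... Resposta: C", "q2": "Resposta: (B)", "q3": "Não sei."}
    print(accuracy(qs, resps))  # 1 of 3 correct
```

Counting an unparseable reply as wrong, rather than skipping it, keeps the denominator fixed across models, so models that ignore the answer format are penalized instead of being scored on a smaller subset.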
Stars: 52
Forks: 11
Language: Python
License: MIT
Last pushed: Dec 06, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/piresramon/gpt-4-enem"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
Goekdeniz-Guelmez/mlx-lm-lora
Train Large Language Models on MLX.
uber-research/PPLM
Plug and Play Language Model implementation. Allows steering the topic and attributes of GPT-2 models.
VHellendoorn/Code-LMs
Guide to using pre-trained large language models of source code
ssbuild/chatglm_finetuning
ChatGLM-6B fine-tuning and Alpaca fine-tuning
jarobyte91/pytorch_beam_search
A lightweight implementation of beam search for sequence models in PyTorch.