piresramon/gpt-4-enem

Code and data to evaluate LLMs on the ENEM, the main standardized Brazilian university admission exams.


Emerging

This project provides a way to test how well large language models (LLMs) perform on the ENEM, Brazil's main university entrance exam. It takes in questions from past ENEM exams, including both text and images, and evaluates how accurately an LLM answers them. This is useful for researchers and educators who want to understand the capabilities and limitations of AI in high-stakes academic evaluations.
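The following minimal sketch illustrates the kind of scoring such an evaluation involves. It assumes each ENEM item has five alternatives labeled A to E and that the model's reply states its chosen letter in a form like "Answer: C". The function names `extract_choice` and `accuracy`, the answer-extraction regex, and the sample replies are hypothetical. They are not taken from the repository and do not reflect how it actually parses or scores model output.

```python
import re
from typing import Optional

# Hypothetical parser (not the repository's): finds "Answer: X" or "answer (x)"
# and captures a single letter A-E that is not the start of a longer word.
_ANSWER_RE = re.compile(r"answer\s*[:\-]?\s*\(?([A-E])\b\)?", re.IGNORECASE)


def extract_choice(reply: str) -> Optional[str]:
    """Return the chosen alternative as an uppercase letter, or None if absent."""
    match = _ANSWER_RE.search(reply)
    return match.group(1).upper() if match else None


def accuracy(replies: list[str], gold: list[str]) -> float:
    """Fraction of questions whose extracted letter matches the gold letter.

    Replies with no parseable letter are counted as wrong.
    """
    if len(replies) != len(gold):
        raise ValueError("replies and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(extract_choice(r) == g.upper() for r, g in zip(replies, gold))
    return correct / len(gold)


if __name__ == "__main__":
    # Hypothetical model replies and answer key, for illustration only.
    replies = ["The correct option is... Answer: C", "answer: (b)", "I am not sure."]
    gold = ["C", "B", "A"]
    print(f"accuracy = {accuracy(replies, gold):.2f}")  # prints "accuracy = 0.67"
```

Treating unparseable replies as incorrect keeps the metric conservative. A real benchmark might instead report them separately or re-prompt the model.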

No commits in the last 6 months.

Use this if you are a researcher or academic who needs to benchmark different language models on complex, multidisciplinary university entrance exams, especially those with visual components.

Not ideal if you are a student looking for a study tool for the ENEM, as this is for evaluating AI models, not for human test preparation.

AI-evaluation educational-assessment language-model-benchmarking multimodal-AI Brazilian-education

Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 8 / 25

Maturity 16 / 25

Community 17 / 25



Language: Python

License: MIT

Higher-rated alternatives

Goekdeniz-Guelmez/mlx-lm-lora

Train Large Language Models on MLX.

uber-research/PPLM

Plug and Play Language Model implementation. Allows steering the topic and attributes of GPT-2 models.

VHellendoorn/Code-LMs

Guide to using pre-trained large language models of source code

ssbuild/chatglm_finetuning

ChatGLM-6B fine-tuning and Alpaca fine-tuning

jarobyte91/pytorch_beam_search

A lightweight implementation of Beam Search for sequence models in PyTorch.
