declare-lab/LLM-PuzzleTest
This repository releases datasets and models for multimodal puzzle reasoning.
The project provides datasets and tools for evaluating how well large multimodal models understand and solve visual puzzles similar to those found in IQ tests. Given a model and a puzzle image, it measures the model's ability to identify the underlying pattern and select the correct answer. It is aimed at researchers and developers who build or test multimodal models and want to measure their abstract reasoning capabilities.
113 stars. No commits in the last 6 months.
Use this if you are a researcher or developer focused on building and evaluating the reasoning capabilities of multimodal AI models, and you need standardized benchmarks for visual abstract pattern recognition.
Not ideal if you are looking for a general-purpose AI model to solve your everyday visual tasks or a tool for human puzzle-solving.
Stars: 113
Forks: 8
Language: Python
License: MIT
Category:
Last pushed: Feb 26, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/declare-lab/LLM-PuzzleTest"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
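If you would rather call the endpoint from a script than from curl, a minimal Python sketch is shown below. The URL pattern is taken from the curl example above; the `quality_url` and `fetch_quality` helper names are hypothetical, and no particular shape is assumed for the JSON response.

```python
import json
import urllib.request

# Base endpoint inferred from the curl example above (assumption: the
# final two path segments are the repo owner and name).
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-data URL for a given GitHub owner/repo pair."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the quality data; no API key needed up to 100 req/day."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)
```

For example, `fetch_quality("declare-lab", "LLM-PuzzleTest")` requests the same URL as the curl command above and returns the decoded JSON.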
Higher-rated alternatives
ExtensityAI/symbolicai
A neurosymbolic perspective on LLMs
TIGER-AI-Lab/MMLU-Pro
The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding...
deep-symbolic-mathematics/LLM-SR
[ICLR 2025 Oral] This is the official repo for the paper "LLM-SR" on Scientific Equation...
microsoft/interwhen
A framework for verifiable reasoning with language models.
zhudotexe/fanoutqa
Companion code for FanOutQA: Multi-Hop, Multi-Document Question Answering for Large Language...