sdpkjc/SATQuest
🏞 A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs
SATQuest helps evaluate and improve the logical reasoning abilities of large language models (LLMs). It takes problem definitions, such as Boolean satisfiability (SAT) formulas, generates questions in several formats, and then verifies the LLM's answers, producing a score and diagnostics that help developers understand and fine-tune their models for better logical performance. It is aimed at AI researchers and developers building and refining LLMs that need to excel at complex logical tasks.
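To make the verification idea concrete, here is a minimal sketch of what checking an LLM's answer against a SAT instance involves. This is illustrative only and does not use SATQuest's actual API: the `check_assignment` helper and the DIMACS-style clause encoding are assumptions for the example.

```python
# Hypothetical sketch of SAT answer verification (not SATQuest's real API).
# Clauses use DIMACS-style integer literals: positive k means variable k
# is true, negative k means variable k is false.

def check_assignment(cnf, assignment):
    """Return True if the assignment satisfies every clause of the CNF.

    cnf: list of clauses, each a list of nonzero ints (DIMACS literals).
    assignment: dict mapping variable number -> bool.
    """
    def lit_true(lit):
        value = assignment[abs(lit)]
        return value if lit > 0 else not value

    # A CNF is satisfied when every clause has at least one true literal.
    return all(any(lit_true(lit) for lit in clause) for clause in cnf)

# (x1 OR NOT x2) AND (x2 OR x3)
cnf = [[1, -2], [2, 3]]
print(check_assignment(cnf, {1: True, 2: True, 3: False}))   # True: both clauses satisfied
print(check_assignment(cnf, {1: False, 2: True, 3: False}))  # False: first clause fails
```

A verifier built on this check can score an answer deterministically, which is what makes CNF problems well suited to reinforcement fine-tuning with verifiable rewards.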
No commits in the last 6 months. Available on PyPI.
Use this if you are developing or evaluating LLMs and need a robust framework to test and improve their ability to solve logical reasoning problems, particularly those based on Conjunctive Normal Form (CNF).
Not ideal if you are not working with LLMs or their logical reasoning capabilities, or if your primary need is for a general-purpose SAT solver rather than an LLM evaluation tool.
Stars: 5
Forks: —
Language: Python
License: MIT
Category: —
Last pushed: Sep 26, 2025
Commits (30d): 0
Dependencies: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/sdpkjc/SATQuest"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
Higher-rated alternatives
cvs-health/uqlm
UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM...
open-thought/reasoning-gym
[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards
PRIME-RL/TTRL
[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning
LLM360/Reasoning360
A repo for open research on building large reasoning models
bowang-lab/BioReason
BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model | NeurIPS '25