sdpkjc/SATQuest
🏞 A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs
SATQuest helps evaluate and improve the logical reasoning abilities of large language models (LLMs). It takes problem definitions, such as Boolean satisfiability (SAT) formulas, generates questions in several formats, and then verifies the LLM's answers, producing a score and diagnostics that help developers understand and fine-tune their models for better logical performance. It is aimed at AI researchers and developers building and refining LLMs that need to excel at complex logical tasks.
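To make the verification idea concrete, here is a minimal sketch of what checking an LLM's answer against a SAT instance involves. This is illustrative only and does not use SATQuest's actual API: the `check_assignment` helper and the DIMACS-style clause encoding are assumptions for the example.

```python
# Hypothetical sketch of SAT answer verification (not SATQuest's real API).
# Clauses use DIMACS-style integer literals: positive k means variable k
# is true, negative k means variable k is false.

def check_assignment(cnf, assignment):
    """Return True if the assignment satisfies every clause of the CNF.

    cnf: list of clauses, each a list of nonzero ints (DIMACS literals).
    assignment: dict mapping variable number -> bool.
    """
    def lit_true(lit):
        value = assignment[abs(lit)]
        return value if lit > 0 else not value

    # A CNF is satisfied when every clause has at least one true literal.
    return all(any(lit_true(lit) for lit in clause) for clause in cnf)

# (x1 OR NOT x2) AND (x2 OR x3)
cnf = [[1, -2], [2, 3]]
print(check_assignment(cnf, {1: True, 2: True, 3: False}))   # True: both clauses satisfied
print(check_assignment(cnf, {1: False, 2: True, 3: False}))  # False: first clause fails
```

A verifier built on this check can score an answer deterministically, which is what makes CNF problems well suited to reinforcement fine-tuning with verifiable rewards.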
No commits in the last 6 months. Available on PyPI.
Use this if you are developing or evaluating LLMs and need a robust framework to test and improve their ability to solve logical reasoning problems, particularly those based on Conjunctive Normal Form (CNF).
Not ideal if you are not working with LLMs or their logical reasoning capabilities, or if your primary need is for a general-purpose SAT solver rather than an LLM evaluation tool.
Stars: 5
Forks: —
Language: Python
License: MIT
Category: —
Last pushed: Sep 26, 2025
Commits (30d): 0
Dependencies: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/sdpkjc/SATQuest"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
Higher-rated alternatives
cvs-health/uqlm
UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM...
open-thought/reasoning-gym
[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards
PRIME-RL/TTRL
[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning
LLM360/Reasoning360
A repo for open research on building large reasoning models
bowang-lab/BioReason
BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model | NeurIPS '25