BatsResearch/planetarium
Dataset and benchmark for assessing LLMs in translating natural language descriptions of planning problems into PDDL
This is a tool for developers who are building or evaluating large language models (LLMs) that need to understand and generate planning problems. The LLM under evaluation translates natural language descriptions of tasks into PDDL (Planning Domain Definition Language), a formal planning language. The project supplies a dataset of such descriptions paired with ground-truth PDDL, plus a method for rigorously checking whether an LLM's generated PDDL matches the ground truth without needing to run a planner. It is aimed at AI researchers and developers working on automated planning and LLM capabilities.
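To make "matching without running a planner" concrete, here is a minimal sketch of the general idea. It is not Planetarium's own implementation or API. The function names are hypothetical, the parser only handles flat atoms (no negation, quantifiers, types, or numeric fluents), and the brute-force search over object renamings is only practical for a handful of objects. It treats the :init and :goal sections as sets of ground atoms and asks whether some one-to-one renaming of objects makes two problems identical. Variable names like a/b versus x/y should not matter.

```python
import itertools
import re

Atom = tuple[str, ...]


def parse_atoms(section: str) -> set[Atom]:
    """Collect flat atoms such as (on a b) from a PDDL :init or :goal section.

    Only innermost parenthesized groups are matched, so wrappers like
    (:init ...) or (and ...) are skipped. Simplifying assumption: negation,
    quantifiers, and numeric fluents are NOT handled.
    """
    return {tuple(m.lower().split()) for m in re.findall(r"\(([^()]+)\)", section)}


def rename(atoms: set[Atom], mapping: dict[str, str]) -> set[Atom]:
    """Apply an object renaming to every argument, keeping predicate names."""
    return {(a[0], *(mapping.get(x, x) for x in a[1:])) for a in atoms}


def equivalent_up_to_renaming(
    objects_a: list[str], init_a: str, goal_a: str,
    objects_b: list[str], init_b: str, goal_b: str,
) -> bool:
    """Return True if some bijection between object names makes both the
    initial states and the goals of problem A and problem B identical."""
    if len(objects_a) != len(objects_b):
        return False
    ia, ga = parse_atoms(init_a), parse_atoms(goal_a)
    ib, gb = parse_atoms(init_b), parse_atoms(goal_b)
    # Brute force over all bijections: O(n!), only viable for tiny problems.
    for perm in itertools.permutations(objects_b):
        mapping = dict(zip(objects_a, perm))
        if rename(ia, mapping) == ib and rename(ga, mapping) == gb:
            return True
    return False


# Usage: the same two-block problem written with different object names.
same = equivalent_up_to_renaming(
    ["a", "b"], "(:init (on a b) (clear a) (ontable b) (arm-empty))", "(:goal (and (on b a)))",
    ["x", "y"], "(:init (arm-empty) (ontable y) (clear x) (on x y))", "(:goal (on y x))",
)
different = equivalent_up_to_renaming(
    ["a", "b"], "(:init (on a b) (clear a) (ontable b) (arm-empty))", "(:goal (and (on b a)))",
    ["x", "y"], "(:init (arm-empty) (ontable y) (clear x) (on x y))", "(:goal (on x y))",
)
print(same)       # True
print(different)  # False
```

Exact set matching is deliberately strict. Two descriptions that state the same goal in different but logically equivalent ways would be reported as different, so a rigorous benchmark has to go beyond string or set comparison.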
No commits in the last 6 months.
Use this if you are developing or benchmarking LLMs that translate natural language instructions into formal planning problem descriptions like PDDL.
Not ideal if you are a practitioner looking to simply generate plans for your real-world problems without developing or evaluating an LLM.
Stars: 65
Forks: 6
Language: Python
License: BSD-3-Clause
Last pushed: Oct 16, 2024
Commits (30d): 0
Higher-rated alternatives
xrsrke/toolformer
Implementation of Toolformer: Language Models Can Teach Themselves to Use Tools
MozerWang/AMPO
[ICLR 2026] Adaptive Social Learning via Mode Policy Optimization for Language Agents
real-stanford/reflect
[CoRL 2023] REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction
nsidn98/LLaMAR
Code for our paper LLaMAR: LM-based Long-Horizon Planner for Multi-Agent Robotics
WayneMao/RoboMatrix
The Official Implementation of RoboMatrix