OSU-NLP-Group/cobalt
Code and data for the paper "Bridging Online and Offline RL: Contextual Bandit Learning for Multi-Turn Code Generation"
Cobalt helps large language models (LLMs) generate more accurate and functional code over multiple steps. It reuses existing code-generation attempts as training data, teaching the LLM to complete coding tasks one turn at a time. This tool is designed for AI researchers and practitioners working on improving LLMs for complex, iterative coding challenges.
Use this if you are an AI researcher or practitioner looking to enhance large language models' ability to generate correct code through multi-turn interactions, especially when balancing training cost and performance.
Not ideal if you are looking for a ready-to-use code generation application rather than a method for training and evaluating underlying LLMs.
Stars
9
Forks
—
Language
Python
License
MIT
Category
—
Last pushed
Feb 04, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/OSU-NLP-Group/cobalt"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
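The curl command above can also be scripted. Below is a minimal Python sketch that builds the same endpoint URL and fetches the JSON payload with the standard library; the `quality_url` and `fetch_quality` helper names are hypothetical, and the shape of the returned JSON is not documented here, so treat the decoded dict as opaque.

```python
import json
import urllib.request

# Endpoint taken from the curl example above; path format is
# .../llm-tools/<owner>/<repo>.
BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-API URL for a repository (hypothetical helper)."""
    return f"{BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the JSON payload (100 requests/day without a key)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Network call only runs when executed as a script.
    data = fetch_quality("OSU-NLP-Group", "cobalt")
    print(json.dumps(data, indent=2))
```

With a free API key, the daily quota rises to 1,000 requests; how the key is passed (header vs. query parameter) is not specified in this listing, so check the API's own documentation.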
Higher-rated alternatives
open-thought/reasoning-gym
[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards
Hmbown/Hegelion
Dialectical reasoning architecture for LLMs (Thesis → Antithesis → Synthesis)
LLM360/Reasoning360
A repo for open research on building large reasoning models
TsinghuaC3I/Awesome-RL-for-LRMs
A Survey of Reinforcement Learning for Large Reasoning Models
bowang-lab/BioReason
BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model | NeurIPS '25