OSU-NLP-Group/cobalt
Code and data for the paper "Bridging Online and Offline RL: Contextual Bandit Learning for Multi-Turn Code Generation"
Cobalt helps large language models (LLMs) generate more accurate and functional code over multiple steps. It reuses existing code-generation attempts as training data, teaching the LLM to complete coding tasks one turn at a time. This tool is designed for AI researchers and practitioners working on improving LLMs for complex, iterative coding challenges.
Use this if you are an AI researcher or practitioner looking to enhance large language models' ability to generate correct code through multi-turn interactions, especially when balancing training cost and performance.
Not ideal if you are looking for a ready-to-use code generation application rather than a method for training and evaluating underlying LLMs.
Stars
9
Forks
—
Language
Python
License
MIT
Category
—
Last pushed
Feb 04, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/OSU-NLP-Group/cobalt"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
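The curl command above can also be scripted. Below is a minimal Python sketch that builds the same endpoint URL and fetches the JSON payload with the standard library; the `quality_url` and `fetch_quality` helper names are hypothetical, and the shape of the returned JSON is not documented here, so treat the decoded dict as opaque.

```python
import json
import urllib.request

# Endpoint taken from the curl example above; path format is
# .../llm-tools/<owner>/<repo>.
BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-API URL for a repository (hypothetical helper)."""
    return f"{BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the JSON payload (100 requests/day without a key)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Network call only runs when executed as a script.
    data = fetch_quality("OSU-NLP-Group", "cobalt")
    print(json.dumps(data, indent=2))
```

With a free API key, the daily quota rises to 1,000 requests; how the key is passed (header vs. query parameter) is not specified in this listing, so check the API's own documentation.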
Higher-rated alternatives
open-thought/reasoning-gym
[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards
Hmbown/Hegelion
Dialectical reasoning architecture for LLMs (Thesis → Antithesis → Synthesis)
LLM360/Reasoning360
A repo for open research on building large reasoning models
TsinghuaC3I/Awesome-RL-for-LRMs
A Survey of Reinforcement Learning for Large Reasoning Models
bowang-lab/BioReason
BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model | NeurIPS '25