Qwen-Applications/CLIPO
CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR
This project fine-tunes large language models (LLMs) to handle complex reasoning tasks, especially in mathematics. By contrasting correct reasoning traces against incorrect ones, the model learns the underlying logic needed to solve problems robustly. The input is a base language model plus datasets of problems with correct and incorrect reasoning examples; the output is a fine-tuned model that performs significantly better on challenging reasoning benchmarks. It is aimed at AI researchers and engineers developing or deploying LLMs for tasks that require logical thought.
Use this if you need to significantly improve a language model's ability to tackle difficult, multi-step reasoning problems, particularly in mathematical domains, and want to make it more robust to new or varied problem types.
Not ideal if your primary goal is simple text generation or tasks that do not require complex logical reasoning or problem-solving capabilities.
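CLIPO's exact training objective is not spelled out on this page, but the core contrastive idea (score a correct reasoning trace above an incorrect one for the same problem) can be sketched with a standard pairwise preference loss. This is an illustrative assumption, not CLIPO's actual implementation; the function name and `beta` temperature are hypothetical.

```python
import math

def contrastive_pair_loss(reward_correct: float,
                          reward_incorrect: float,
                          beta: float = 1.0) -> float:
    """Pairwise contrastive loss, Bradley-Terry style: -log sigmoid(beta * margin).

    The loss is small when the correct trace already scores above the
    incorrect one, and grows as the ranking is inverted. NOTE: this is a
    generic sketch of contrastive preference learning, not CLIPO's code.
    """
    margin = beta * (reward_correct - reward_incorrect)
    # -log(sigmoid(margin)), written in a numerically stable form
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# A correctly ranked pair yields a small loss; an inverted pair a large one.
print(round(contrastive_pair_loss(2.0, 0.5), 4))  # → 0.2014
print(round(contrastive_pair_loss(0.5, 2.0), 4))  # → 1.7014
```

In practice the rewards would come from the policy's (log-)probabilities over sampled reasoning traces, and the loss would be averaged over a batch of correct/incorrect pairs before backpropagation.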
Stars: 10
Forks: 2
Language: Python
License: —
Category: —
Last pushed: Mar 12, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Qwen-Applications/CLIPO"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000/day.
Higher-rated alternatives
cvs-health/uqlm
UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM...
PRIME-RL/TTRL
[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning
sapientinc/HRM
Hierarchical Reasoning Model Official Release
tigerchen52/query_level_uncertainty
query-level uncertainty in LLMs
reasoning-survey/Awesome-Reasoning-Foundation-Models
✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models