Qwen-Applications/CLIPO

CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR

31 / 100 (Emerging)

This project trains large language models (LLMs) to handle complex reasoning tasks, especially in mathematics. It contrasts correct reasoning steps with incorrect ones so that the model learns the underlying logic needed to solve problems robustly. The input is a base language model plus datasets of problems with correct and incorrect reasoning examples; the output is a fine-tuned model that scores higher on challenging reasoning benchmarks. The intended users are AI researchers and engineers who develop or deploy LLMs for tasks that require logical reasoning.
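To make the idea of contrasting correct and incorrect reasoning concrete, here is a minimal sketch of a generic InfoNCE-style contrastive loss. It is an illustration only and is not taken from the CLIPO repository: the function name `info_nce`, the toy 2-D embeddings, and the temperature value are all assumptions, and the project's actual objective may differ.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    # Cosine similarity between two non-zero vectors.
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss (hypothetical helper, not CLIPO's API).

    loss = -log( exp(s_pos / t) / (exp(s_pos / t) + sum_i exp(s_neg_i / t)) )
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max before exponentiating, for stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]

# Toy embeddings (made up for illustration): the anchor and "correct" vectors
# stand for two correct solutions, the "incorrect" vectors for wrong ones.
anchor = [1.0, 0.0]
correct = [0.9, 0.1]
incorrect = [[-1.0, 0.2], [0.0, 1.0]]

loss_aligned = info_nce(anchor, correct, incorrect)
loss_misaligned = info_nce(anchor, incorrect[0], [correct, incorrect[1]])
```

The loss is small when the anchor sits close to the correct example and far from the incorrect ones, and large when those roles are swapped, which is the pressure a contrastive term puts on the model's representations.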

Use this if you need to significantly improve a language model's ability to tackle difficult, multi-step reasoning problems, particularly in mathematical domains, and want to make it more robust to new or varied problem types.

Not ideal if your primary goal is simple text generation or tasks that do not require complex logical reasoning or problem-solving capabilities.

AI-model-training mathematical-reasoning large-language-models model-robustness AI-research
No License, No Package, No Dependents
Maintenance 10 / 25
Adoption 5 / 25
Maturity 3 / 25
Community 13 / 25
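The four category scores listed here add up to the overall score (10 + 5 + 3 + 13 = 31, out of 4 × 25 = 100). The short sketch below only checks that arithmetic. Treating the total as a plain sum is an observation from the numbers shown, not a documented formula, and the variable names are illustrative.

```python
# Category scores as shown on this page, each out of 25.
subscores = {"Maintenance": 10, "Adoption": 5, "Maturity": 3, "Community": 13}
max_per_category = 25

total = sum(subscores.values())
max_total = max_per_category * len(subscores)
summary = f"{total} / {max_total}"
```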


Stars: 10
Forks: 2
Language: Python
License: None
Last pushed: Mar 12, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Qwen-Applications/CLIPO"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
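The same request can be made from Python using only the standard library. This is a minimal sketch under two assumptions that the page does not confirm: that the path follows a category/owner/repo pattern (inferred from the single URL above), and that the endpoint returns JSON. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category, owner, repo):
    # Assumed path layout: <base>/<category>/<owner>/<repo>.
    return f"{BASE_URL}/{quote(category)}/{quote(owner)}/{quote(repo)}"

def fetch_quality(category, owner, repo, timeout=10):
    # Assumes a JSON response body; the schema is not documented here,
    # so the parsed object is returned as-is.
    with urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

url = quality_url("transformers", "Qwen-Applications", "CLIPO")
```

Calling `fetch_quality("transformers", "Qwen-Applications", "CLIPO")` performs the network request. With no key, the limit stated above is 100 requests/day, so a script that polls many repositories should cache results.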