Qwen-Applications/CLIPO
CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR
This project fine-tunes large language models (LLMs) to handle complex reasoning tasks, especially in mathematics. By contrasting correct reasoning traces against incorrect ones, the model learns the underlying logic needed to solve problems robustly. The input is a base language model plus datasets of problems with correct and incorrect reasoning examples; the output is a fine-tuned model that performs significantly better on challenging reasoning benchmarks. It is aimed at AI researchers and engineers developing or deploying LLMs for tasks that require logical thought.
Use this if you need to significantly improve a language model's ability to tackle difficult, multi-step reasoning problems, particularly in mathematical domains, and want to make it more robust to new or varied problem types.
Not ideal if your primary goal is simple text generation or tasks that do not require complex logical reasoning or problem-solving capabilities.
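CLIPO's exact training objective is not spelled out on this page, but the core contrastive idea (score a correct reasoning trace above an incorrect one for the same problem) can be sketched with a standard pairwise preference loss. This is an illustrative assumption, not CLIPO's actual implementation; the function name and `beta` temperature are hypothetical.

```python
import math

def contrastive_pair_loss(reward_correct: float,
                          reward_incorrect: float,
                          beta: float = 1.0) -> float:
    """Pairwise contrastive loss, Bradley-Terry style: -log sigmoid(beta * margin).

    The loss is small when the correct trace already scores above the
    incorrect one, and grows as the ranking is inverted. NOTE: this is a
    generic sketch of contrastive preference learning, not CLIPO's code.
    """
    margin = beta * (reward_correct - reward_incorrect)
    # -log(sigmoid(margin)), written in a numerically stable form
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# A correctly ranked pair yields a small loss; an inverted pair a large one.
print(round(contrastive_pair_loss(2.0, 0.5), 4))  # → 0.2014
print(round(contrastive_pair_loss(0.5, 2.0), 4))  # → 1.7014
```

In practice the rewards would come from the policy's (log-)probabilities over sampled reasoning traces, and the loss would be averaged over a batch of correct/incorrect pairs before backpropagation.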
Stars: 10
Forks: 2
Language: Python
License: —
Category: —
Last pushed: Mar 12, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Qwen-Applications/CLIPO"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000/day.
Higher-rated alternatives
cvs-health/uqlm
UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM...
PRIME-RL/TTRL
[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning
sapientinc/HRM
Hierarchical Reasoning Model Official Release
tigerchen52/query_level_uncertainty
query-level uncertainty in LLMs
reasoning-survey/Awesome-Reasoning-Foundation-Models
✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models