hggzjx/RewardAuditor
Official Repo for Paper: "Reward Auditor: Inference on Reward Modeling Suitability in Real-World Perturbed Scenarios"
This tool helps researchers and engineers evaluate how reliably their reward models behave when the data they score is perturbed in realistic, "real-world" ways. It takes a reward model and a set of perturbed data scenarios as input and produces a statistical assessment of the model's robustness and trustworthiness. It is aimed at anyone building or deploying large language models who needs their alignment systems to be verifiable and dependable.
Use this if you need to rigorously test the stability and trustworthiness of your reward model under diverse, challenging conditions.
Not ideal if you only need basic accuracy metrics for your reward model; this focuses on deeper statistical suitability testing (an illustrative sketch of that kind of check follows below).
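The sketch below is not the RewardAuditor API; it is only a minimal, self-contained illustration of the general idea of a statistical suitability check: evaluating the same preference pairs on clean and perturbed inputs and asking whether the accuracy drop is significant. The toy data, variable names, and the McNemar-style exact test are all assumptions for illustration, not the paper's method.

# Illustrative sketch only (not the RewardAuditor API): test whether a reward
# model's preference accuracy degrades under a perturbed variant of the same
# evaluation pairs. Data and names are hypothetical.
import numpy as np
from scipy.stats import binomtest

# 1 = the reward model prefers the human-chosen response, 0 = it does not,
# for the same prompts evaluated on clean vs. perturbed inputs.
clean_correct = np.array([1, 1, 1, 0, 1, 1, 0, 1, 1, 1])
perturbed_correct = np.array([1, 0, 1, 0, 1, 0, 0, 1, 1, 0])

# McNemar-style counts: pairs that flip from correct to incorrect after
# perturbation, and pairs that flip the other way.
clean_only = int(np.sum((clean_correct == 1) & (perturbed_correct == 0)))
perturbed_only = int(np.sum((clean_correct == 0) & (perturbed_correct == 1)))
discordant = clean_only + perturbed_only

if discordant > 0:
    # Exact binomial test on the discordant pairs: under the null of no
    # perturbation effect, flips in either direction are equally likely.
    result = binomtest(clean_only, discordant, p=0.5)
    print(f"flips clean->wrong: {clean_only}, wrong->clean: {perturbed_only}, "
          f"p={result.pvalue:.3f}")
else:
    print("No discordant pairs; no evidence of a perturbation effect.")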
Stars: 31
Forks: 1
Language: Python
License: —
Category:
Last pushed: Jan 24, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/hggzjx/RewardAuditor"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
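A minimal sketch of fetching the same record programmatically, using the endpoint from the curl example above. The requests library and the 10-second timeout are illustrative choices; how an API key is attached for the higher rate limit is not specified here, so authenticated use should follow the provider's docs.

# Fetch the repo quality record from the documented endpoint (keyless access,
# 100 requests/day as noted above).
import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/hggzjx/RewardAuditor"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()
print(resp.json())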
Higher-rated alternatives
hud-evals/hud-python
OSS RL environment + evals toolkit
hiyouga/EasyR1
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
OpenRL-Lab/openrl
Unified Reinforcement Learning Framework
sail-sg/oat
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning,...
opendilab/awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)