InternLM/Visual-ERM
Official Implementation of "Visual-ERM: Reward Modeling for Visual Equivalence"
Visual-ERM helps developers and model trainers improve how faithfully their AI models reproduce structured visual content such as charts, tables, and SVG graphics. It takes an original image and an image rendered from the model's generated code, then precisely identifies and describes the visual differences between them, providing a reward signal for refining vision-to-code generative models.
Use this if you need fine-grained, interpretable feedback on visual discrepancies when an AI model converts images into structured code, especially where visual layout, alignment, and style are critical.
Not ideal if your primary concern is the textual similarity of generated code, or if you don't need detailed visual error analysis for model refinement.
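Visual-ERM's own output is fine-grained, interpretable difference descriptions rather than a single number. As a rough, hypothetical illustration of the original-versus-rendered comparison setup only (not this repo's actual API), the sketch below substitutes a crude pixel-level diff computed with Pillow; the function name and file paths are placeholders:

# Hypothetical sketch only: NOT Visual-ERM's API. It illustrates comparing
# an original image against one rendered from model-generated code, using a
# crude Pillow pixel diff in place of Visual-ERM's descriptive feedback.
from PIL import Image, ImageChops

def pixel_diff_score(original_path: str, rendered_path: str) -> float:
    """Return the fraction of pixels that differ between the two images."""
    original = Image.open(original_path).convert("RGB")
    # Resize the rendered image so the two can be compared pixel by pixel.
    rendered = Image.open(rendered_path).convert("RGB").resize(original.size)
    diff = ImageChops.difference(original, rendered)
    changed = sum(1 for px in diff.getdata() if px != (0, 0, 0))
    return changed / (original.width * original.height)

# Usage (placeholder paths):
# print(pixel_diff_score("chart_original.png", "chart_rendered.png"))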
Stars
25
Forks
—
Language
Python
License
—
Category
—
Last pushed
Mar 16, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/InternLM/Visual-ERM"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
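The same request from Python, as a minimal sketch (the "X-API-Key" header name is an assumption, not documented here; omit it for keyless access):

# Minimal sketch: fetch the quality data shown above via the public API.
# The URL comes from the curl example; the "X-API-Key" header name is an
# assumption, so check the API docs for the actual authentication scheme.
import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/InternLM/Visual-ERM"
headers = {"X-API-Key": "YOUR_KEY"}  # drop this header for keyless access (100 requests/day)

response = requests.get(URL, headers=headers, timeout=10)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
print(response.json())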
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice