InternLM/Visual-ERM
Official Implementation of "Visual-ERM: Reward Modeling for Visual Equivalence"
Visual-ERM helps developers and model trainers improve how faithfully their AI models reproduce structured visual content such as charts, tables, and SVG graphics. It takes an original image and an image rendered from the model's generated code, then precisely identifies and describes the visual differences between them, providing a reward signal for refining vision-to-code generative models.
Use this if you need fine-grained, interpretable feedback on visual discrepancies when an AI model converts images into structured code, especially where visual layout, alignment, and style are critical.
Not ideal if your primary concern is the textual similarity of generated code, or if you don't need detailed visual error analysis for model refinement.
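Visual-ERM's own output is fine-grained, interpretable difference descriptions rather than a single number. As a rough, hypothetical illustration of the original-versus-rendered comparison setup only (not this repo's actual API), the sketch below substitutes a crude pixel-level diff computed with Pillow; the function name and file paths are placeholders:

# Hypothetical sketch only: NOT Visual-ERM's API. It illustrates comparing
# an original image against one rendered from model-generated code, using a
# crude Pillow pixel diff in place of Visual-ERM's descriptive feedback.
from PIL import Image, ImageChops

def pixel_diff_score(original_path: str, rendered_path: str) -> float:
    """Return the fraction of pixels that differ between the two images."""
    original = Image.open(original_path).convert("RGB")
    # Resize the rendered image so the two can be compared pixel by pixel.
    rendered = Image.open(rendered_path).convert("RGB").resize(original.size)
    diff = ImageChops.difference(original, rendered)
    changed = sum(1 for px in diff.getdata() if px != (0, 0, 0))
    return changed / (original.width * original.height)

# Usage (placeholder paths):
# print(pixel_diff_score("chart_original.png", "chart_rendered.png"))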
Stars
25
Forks
—
Language
Python
License
—
Category
—
Last pushed
Mar 16, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/InternLM/Visual-ERM"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
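The same request from Python, as a minimal sketch (the "X-API-Key" header name is an assumption, not documented here; omit it for keyless access):

# Minimal sketch: fetch the quality data shown above via the public API.
# The URL comes from the curl example; the "X-API-Key" header name is an
# assumption, so check the API docs for the actual authentication scheme.
import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/InternLM/Visual-ERM"
headers = {"X-API-Key": "YOUR_KEY"}  # drop this header for keyless access (100 requests/day)

response = requests.get(URL, headers=headers, timeout=10)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
print(response.json())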
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice