HorizonWind2004/reconstruction-alignment
[ICLR 2026] Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Learning.
This project helps developers of unified multimodal models (models that handle both text and images) improve performance with a technique called Reconstruction Alignment (RecA). RecA is applied as a self-supervised training step and aims to improve a model's zero-shot capabilities on tasks such as image generation and editing. The input is an existing multimodal model; the output is a version of that model tuned with RecA.
Use this if you are developing or deploying unified multimodal AI models and want to boost their performance and capabilities without extensive new training data or complex reinforcement learning.
Not ideal if you are a general user looking for an off-the-shelf application, or if you are not working with multimodal AI model development.
Stars: 378
Forks: 15
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/HorizonWind2004/reconstruction-alignment"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
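The same request can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions: the helper names build_quality_url and fetch_quality are hypothetical, the category/owner/repo path layout is inferred from the single curl URL above, and the response is assumed to be JSON, which the listing does not confirm.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Build the endpoint URL (hypothetical helper).

    Assumption: the path is /<category>/<owner>/<repo>, inferred from
    the one example URL shown in the listing.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the quality record without an API key (hypothetical helper).

    Assumption: the endpoint returns a JSON body. If it does not,
    json.loads raises ValueError.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return json.loads(body)


# Building the URL needs no network access; fetch_quality does.
url = build_quality_url("diffusion", "HorizonWind2004", "reconstruction-alignment")
print(url)
```

Calling fetch_quality("diffusion", "HorizonWind2004", "reconstruction-alignment") performs the actual request. Keyless access is limited to 100 requests/day, so it is worth caching the result rather than calling it in a loop.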
Higher-rated alternatives
UCSC-VLAA/story-iter
[ICLR 2026] A Training-free Iterative Framework for Long Story Visualization
PaddlePaddle/PaddleMIX
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks,...
keivalya/mini-vla
a minimal, beginner-friendly VLA to show how robot policies can fuse images, text, and states to...
adobe-research/custom-diffusion
Custom Diffusion: Multi-Concept Customization of Text-to-Image Diffusion (CVPR 2023)
byliutao/1Prompt1Story
🔥ICLR 2025 (Spotlight) One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation...