Gen-Verse/HermesFlow

[NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation

Experimental

This project helps AI researchers and developers narrow the gap between how well a multimodal large language model (MLLM) understands content and how well it generates it. It takes image-caption pairs as input and refines the model through an iterative self-optimization process. The intended outcome is an MLLM that is more accurate and coherent and that aligns image and text information more closely.

No commits in the last 6 months.

Use this if you are an AI researcher or developer working on advanced MLLM architectures and want to enhance their multimodal understanding and generation capabilities.

Not ideal if you are looking for an out-of-the-box solution for end-user applications or do not have experience with model training and optimization.

AI Research, Multimodal AI, Large Language Models, Machine Learning Engineering, Generative AI

No License, Stale 6m, No Package, No Dependents

Maintenance 2 / 25

Adoption 9 / 25

Maturity 8 / 25

Community 8 / 25

Language: Python

License: none

Higher-rated alternatives

hao-ai-lab/FastVideo

A unified inference and post-training framework for accelerated video generation.

ModelTC/LightX2V

Light Image Video Generation Inference Framework

thu-ml/TurboDiffusion

TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

PKU-YuanGroup/Helios

Helios: Real Real-Time Long Video Generation Model

PKU-YuanGroup/MagicTime

[TPAMI 2025🔥] MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
