YingqingHe/Awesome-LLMs-meet-Multimodal-Generation
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
This resource is a curated list of academic papers on how Large Language Models (LLMs) are used to generate or edit images, videos, 3D models, and audio. Researchers and AI practitioners can use it to find relevant studies by modality and generation technique. Entries are primarily academic papers, with links to their associated code or project pages.
540 stars. No commits in the last 6 months.
Use this if you are a researcher or practitioner in AI looking for a structured overview and direct access to recent academic work on LLM-powered image, video, 3D, or audio generation and editing.
Not ideal if you are looking for ready-to-use software, practical tutorials, or development tools for building multimodal applications.
Stars: 540
Forks: 30
Language: HTML
License: —
Category:
Last pushed: Apr 04, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/YingqingHe/Awesome-LLMs-meet-Multimodal-Generation"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
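For scripted use, the same request can be made from Python with the standard library. This is a minimal sketch, not a documented client: the helper names `build_url` and `fetch_quality` are hypothetical, and the idea that another owner/repo pair can be substituted into the path is an assumption drawn from the shape of the curl URL above. The response format is not described here, so the sketch tries JSON and otherwise returns the raw text.

```python
import json
import urllib.parse
import urllib.request

# Base URL copied from the curl example above. Treating the trailing
# owner/repo part as a substitutable pattern is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_url(owner: str, repo: str) -> str:
    """Return the endpoint URL for one repository (hypothetical helper)."""
    owner_part = urllib.parse.quote(owner, safe="")
    repo_part = urllib.parse.quote(repo, safe="")
    return f"{BASE_URL}/{owner_part}/{repo_part}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record without an API key.

    The response schema is not documented in this listing, so this returns
    parsed JSON when the body is valid JSON and the raw text otherwise.
    """
    with urllib.request.urlopen(build_url(owner, repo), timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body
```

Calling `fetch_quality("YingqingHe", "Awesome-LLMs-meet-Multimodal-Generation")` issues the same request as the curl command. Keyless use is limited to 100 requests/day, so a script that loops over many repositories may want to cache results.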
Higher-rated alternatives
chrisliu298/awesome-llm-unlearning
A resource repository for machine unlearning in large language models
worldbench/awesome-vla-for-ad
🌐 Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future
hijkzzz/Awesome-LLM-Strawberry
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
zjukg/KG-MM-Survey
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey
worldbench/awesome-spatial-intelligence
🌐 Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems