FoundationVision/Liquid
(Accepted by IJCV) Liquid: Language Models are Scalable and Unified Multi-modal Generators
Liquid is a unified multimodal model for creative professionals and researchers: it generates new, high-quality images from text descriptions or existing images, and it can also understand and describe visual content. It is designed for anyone who needs to generate diverse visual content or perform deep image analysis without relying on separate tools for different modalities.
Use this if you need a single, powerful system to both generate images from text and analyze existing images to extract information or descriptions.
Not ideal if you primarily need simple image editing or extremely fast, lightweight image generation without complex understanding capabilities.
Stars: 640
Forks: 35
Language: Python
License: MIT
Category:
Last pushed: Nov 10, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/FoundationVision/Liquid"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
Paranioar/Awesome_Matching_Pretraining_Transfering
The Paper List of Large Multi-Modality Model (Perception, Generation, Unification),...
Yangyi-Chen/Multimodal-AND-Large-Language-Models
Paper list about multimodal and large language models, only used to record papers I read in the...
thuml/AutoTimes
Official implementation for "AutoTimes: Autoregressive Time Series Forecasters via Large Language Models"
flixpar/med-ts-llm
MedTsLLM: Leveraging LLMs for Multimodal Medical Time Series Analysis