OpenBMB/VisCPM
[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | A bilingual (Chinese-English) series of multimodal large models built on the CPM foundation models
This project offers two distinct models, VisCPM-Chat and VisCPM-Paint, both supporting Chinese and English. VisCPM-Chat lets users upload images and hold detailed conversations about their content, from simple descriptions to more complex insights. VisCPM-Paint generates images from textual descriptions. These tools suit content creators, marketers, educators, and anyone who needs to generate or understand visual content in either language.
1,070 stars. No commits in the last 6 months.
Use this if you need to generate images from text descriptions or have detailed conversations about images in both Chinese and English.
Not ideal if your primary need is strictly text-based conversation without any visual interaction, or if you require models focused solely on a single language.
Stars: 1,070
Forks: 88
Language: Python
License: —
Category:
Last pushed: Jun 13, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/OpenBMB/VisCPM"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
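The same request can be made from Python. This is a minimal sketch based only on the URL pattern in the curl example above; the response schema is not documented on this page, so the fetch helper simply returns the parsed JSON payload, whatever its shape.

```python
import json
import urllib.request

# Base URL taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a given GitHub owner/repo."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and parse the JSON payload (schema undocumented here)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)


print(quality_url("OpenBMB", "VisCPM"))
# → https://pt-edge.onrender.com/api/v1/quality/transformers/OpenBMB/VisCPM
```

Within the free tier (100 requests/day without a key), no authentication header is needed; how a key is passed for the 1,000/day tier is not specified on this page.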
Higher-rated alternatives
TinyLLaVA/TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
zjunlp/EasyInstruct
[ACL 2024] An Easy-to-use Instruction Processing Framework for LLMs.
rese1f/MovieChat
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
NVlabs/Eagle
Eagle: Frontier Vision-Language Models with Data-Centric Strategies