OpenBMB/VisCPM
[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | A bilingual (Chinese-English) series of multimodal large models built on the CPM foundation models
This project offers two distinct models, VisCPM-Chat and VisCPM-Paint, both supporting Chinese and English. VisCPM-Chat lets users upload images and hold detailed conversations about their content, from simple descriptions to more complex insights. VisCPM-Paint generates images from textual descriptions. These tools suit content creators, marketers, educators, and anyone who needs to generate or understand visual content in either language.
1,070 stars. No commits in the last 6 months.
Use this if you need to generate images from text descriptions or have detailed conversations about images in both Chinese and English.
Not ideal if your primary need is strictly text-based conversation without any visual interaction, or if you require models focused solely on a single language.
Stars: 1,070
Forks: 88
Language: Python
License: —
Category:
Last pushed: Jun 13, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/OpenBMB/VisCPM"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
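The same request can be made from Python. This is a minimal sketch based only on the URL pattern in the curl example above; the response schema is not documented on this page, so the fetch helper simply returns the parsed JSON payload, whatever its shape.

```python
import json
import urllib.request

# Base URL taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a given GitHub owner/repo."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and parse the JSON payload (schema undocumented here)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)


print(quality_url("OpenBMB", "VisCPM"))
# → https://pt-edge.onrender.com/api/v1/quality/transformers/OpenBMB/VisCPM
```

Within the free tier (100 requests/day without a key), no authentication header is needed; how a key is passed for the 1,000/day tier is not specified on this page.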
Higher-rated alternatives
TinyLLaVA/TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
zjunlp/EasyInstruct
[ACL 2024] An Easy-to-use Instruction Processing Framework for LLMs.
rese1f/MovieChat
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
NVlabs/Eagle
Eagle: Frontier Vision-Language Models with Data-Centric Strategies