deepseek-ai/Janus
Janus-Series: Unified Multimodal Understanding and Generation Models
This project offers unified models for understanding and generating multimodal content. You can input text prompts and images to get various outputs like image descriptions, answers to questions about images, or new images based on your prompts. It's designed for researchers and practitioners working with advanced AI that combines language and vision.
17,708 stars. No commits in the last 6 months.
Use this if you need a single AI model to both interpret images and text, and generate images from text instructions.
Not ideal if you only need a specialized tool for either text generation or image generation, as its strength is in combining both.
Stars: 17,708
Forks: 2,239
Language: Python
License: MIT
Category: transformers
Last pushed: Feb 01, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/deepseek-ai/Janus"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
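The curl command above can also be issued from Python. A minimal sketch, assuming the endpoint follows the pattern `/api/v1/quality/{category}/{owner}/{repo}` generalized from the single example URL (the pattern and the response format are not documented here, so both are assumptions):

```python
# Sketch: build the quality-API URL for a given repo.
# Assumption: the path pattern /api/v1/quality/{category}/{owner}/{repo}
# is inferred from the one example shown and may not generalize.
from urllib.parse import quote

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Return the API URL for a repository's quality data."""
    # Percent-encode each path segment so slashes or spaces in
    # names cannot alter the URL structure.
    segments = (quote(category, safe=""), quote(owner, safe=""), quote(repo, safe=""))
    return f"{BASE}/{segments[0]}/{segments[1]}/{segments[2]}"

print(quality_url("transformers", "deepseek-ai", "Janus"))
# → https://pt-edge.onrender.com/api/v1/quality/transformers/deepseek-ai/Janus
```

The constructed URL can then be fetched with any HTTP client (e.g. `requests.get(url).json()`); the shape of the returned JSON is not specified on this page.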
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice