deepseek-ai/Janus
Janus-Series: Unified Multimodal Understanding and Generation Models
This project offers unified models for understanding and generating multimodal content. You can input text prompts and images to get various outputs like image descriptions, answers to questions about images, or new images based on your prompts. It's designed for researchers and practitioners working with advanced AI that combines language and vision.
17,708 stars. No commits in the last 6 months.
Use this if you need a single AI model to both interpret images and text, and generate images from text instructions.
Not ideal if you only need a specialized tool for either text generation or image generation, as its strength is in combining both.
Stars: 17,708
Forks: 2,239
Language: Python
License: MIT
Category: transformers
Last pushed: Feb 01, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/deepseek-ai/Janus"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
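The curl command above can also be issued from Python. A minimal sketch, assuming the endpoint follows the pattern `/api/v1/quality/{category}/{owner}/{repo}` generalized from the single example URL (the pattern and the response format are not documented here, so both are assumptions):

```python
# Sketch: build the quality-API URL for a given repo.
# Assumption: the path pattern /api/v1/quality/{category}/{owner}/{repo}
# is inferred from the one example shown and may not generalize.
from urllib.parse import quote

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Return the API URL for a repository's quality data."""
    # Percent-encode each path segment so slashes or spaces in
    # names cannot alter the URL structure.
    segments = (quote(category, safe=""), quote(owner, safe=""), quote(repo, safe=""))
    return f"{BASE}/{segments[0]}/{segments[1]}/{segments[2]}"

print(quality_url("transformers", "deepseek-ai", "Janus"))
# → https://pt-edge.onrender.com/api/v1/quality/transformers/deepseek-ai/Janus
```

The constructed URL can then be fetched with any HTTP client (e.g. `requests.get(url).json()`); the shape of the returned JSON is not specified on this page.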
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice