ChenDelong1999/polite-flamingo
🦩 Official repository of paper "Visual Instruction Tuning with Polite Flamingo" (AAAI-24 Oral)
This project offers models that understand images and respond to questions or instructions in a polite, natural, human-like manner. You provide an image together with a question or instruction, and the model returns a conversational response. It suits anyone who wants to integrate AI that discusses visual content pleasantly, such as content creators, customer service managers, or educators.
No commits in the last 6 months.
Use this if you need an AI that can understand images and provide polite, helpful, and multi-turn conversational responses, making interactions feel more natural.
Not ideal if your primary need is strictly factual, brief, or highly specialized image analysis where conversational tone is irrelevant or undesirable.
Stars: 65
Forks: 3
Language: Python
License: none listed
Category: none listed
Last pushed: Dec 09, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ChenDelong1999/polite-flamingo"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
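The same request can be made from Python. The sketch below is a minimal illustration using only the standard library. It assumes the URL path follows the pattern seen in the curl command (a segment such as `transformers`, then owner and repository) and that the endpoint returns JSON; neither is confirmed by this page. The helper names `build_quality_url` and `fetch_quality` are hypothetical, and how an API key would be passed is not documented here, so it is left out.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(domain: str, owner: str, repo: str) -> str:
    """Build the request URL (hypothetical helper).

    Assumes the path layout <domain>/<owner>/<repo>, inferred only from
    the single curl example on this page.
    """
    parts = [quote(part, safe="") for part in (domain, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(domain: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response (hypothetical helper).

    Assumes the body is JSON; json.loads raises ValueError if it is not.
    """
    url = build_quality_url(domain, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network request, counted against the daily limit.
    data = fetch_quality("transformers", "ChenDelong1999", "polite-flamingo")
    print(json.dumps(data, indent=2))
```

Keeping URL construction separate from the network call makes it possible to check the path without spending any of the 100 keyless requests allowed per day.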
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice