ChenDelong1999/polite-flamingo

🦩 Official repository of paper "Visual Instruction Tuning with Polite Flamingo" (AAAI-24 Oral)

Experimental

This project offers models that understand images and respond to questions or instructions in a polite, natural, human-like manner. You supply an image and a question or instruction, and the model returns a conversational response. It suits anyone who wants to integrate AI that discusses visual content pleasantly, for example content creators, customer service managers, or educators.

No commits in the last 6 months.

Use this if you need an AI that can understand images and provide polite, helpful, and multi-turn conversational responses, making interactions feel more natural.

Not ideal if your primary need is strictly factual, brief, or highly specialized image analysis where conversational tone is irrelevant or undesirable.

visual-question-answering conversational-ai image-captioning customer-engagement digital-assistant

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 8 / 25

Maturity 8 / 25

Community 6 / 25

Language: Python

License: —

Higher-rated alternatives

KimMeen/Time-LLM

[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming...

om-ai-lab/VLM-R1

Solve Visual Understanding with Reinforced VLMs

bytedance/SALMONN

SALMONN family: A suite of advanced multi-modal LLMs

NVlabs/OmniVinci

OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.

fixie-ai/ultravox

A fast multimodal LLM for real-time voice
