logic-OT/BobVLM

BobVLM: a 1.5B-parameter multimodal model, built from scratch and pre-trained on a single P100 GPU, capable of image description and moderate question answering. 🤗🎉

Score: 35 / 100 (Emerging)

BobVLM describes what is in an image and answers questions about it. You provide an image (as a file, a URL, or directly from your program) together with a question or request, and it returns a detailed text description or an answer. It is aimed at developers who need to integrate image understanding into their applications.

No commits in the last 6 months.

Use this if you are a developer looking for an open-source, resource-efficient vision-language model to add image description and basic question-answering features to your applications.

Not ideal if you need highly detailed answers to complex questions, or reliable analysis of close-up images, animations, or other images that fall outside general scene description.

image-analysis computer-vision multimodal-ai natural-language-processing software-development
Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 16 / 25
Community 14 / 25


Stars: 11
Forks: 3
Language: Python
License: MIT
Last pushed: Feb 17, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/logic-OT/BobVLM"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
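
For scripted use, the curl request above can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions, not the service's documented client: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the `transformers` path segment is treated as a category only because it appears in that position in the URL, and the response is assumed to be JSON, which the page does not state.

```python
import json
import urllib.parse
import urllib.request

# Base path copied from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(owner, repo, category="transformers"):
    """Assemble the endpoint URL in the same shape as the curl example.

    `category` is an assumed name for the path segment that precedes
    the owner in the URL shown on the page.
    """
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(owner, repo, category="transformers", timeout=10.0):
    """Request the quality data without an API key.

    Assumes the body is JSON; json.loads raises ValueError if it is not.
    """
    url = build_quality_url(owner, repo, category)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("logic-OT", "BobVLM")` would issue the same request as the curl command. Since keyless access is limited to 100 requests/day, caching the result locally is a reasonable choice for anything that runs repeatedly.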