MaxLSB/mini-paligemma2

Minimalist implementation of PaliGemma 2 & PaliGemma VLM from scratch

Quality score: 21 / 100 (Experimental)

This project provides a direct way to use Google's PaliGemma 2 and PaliGemma models for understanding images and text together. You feed it an image and a text prompt (like 'Caption' or 'Detect tiger'), and it returns a caption, an answer to a question about the image, or bounding boxes for detected objects. It is aimed at researchers and practitioners who want to quickly integrate multimodal image analysis into their workflows.

No commits in the last 6 months.

Use this if you need to perform tasks like image captioning, visual question answering, or object detection by combining image and text inputs.

Not ideal if you need a conversational AI that remembers previous interactions or if you need to fine-tune a model without a pre-built pipeline.
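For detection prompts like 'Detect tiger', PaliGemma models emit location tokens rather than raw pixel coordinates. A minimal sketch of decoding that output, assuming the publicly documented format of four `<locNNNN>` tokens (y_min, x_min, y_max, x_max, binned 0-1023 and normalized by 1024) followed by a label; the exact output of this repo's implementation may differ:

```python
import re

def parse_detection(output: str, width: int, height: int):
    """Parse PaliGemma-style detection output, e.g.
    '<loc0256><loc0128><loc0768><loc0896> tiger', into pixel boxes.

    Assumes the published PaliGemma convention: four <locNNNN> tokens
    in y_min, x_min, y_max, x_max order, binned into 1024 steps.
    """
    boxes = []
    # One match per object: four consecutive loc tokens, then a label
    pattern = re.compile(r"((?:<loc\d{4}>){4})\s*([^;<]+)")
    for coords, label in pattern.findall(output):
        ymin, xmin, ymax, xmax = (int(v) for v in re.findall(r"\d{4}", coords))
        boxes.append({
            "label": label.strip(),
            # Scale normalized bins to pixel coordinates (x0, y0, x1, y1)
            "box": (xmin / 1024 * width, ymin / 1024 * height,
                    xmax / 1024 * width, ymax / 1024 * height),
        })
    return boxes

print(parse_detection("<loc0256><loc0128><loc0768><loc0896> tiger", 448, 448))
```

Captioning and VQA prompts return plain text, so no such decoding step is needed for those tasks.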

image-captioning visual-question-answering object-detection multimodal-ai computer-vision
Flags: Stale (6 months) · No package · No dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 16 / 25
Community 0 / 25


Stars: 13
Forks:
Language: Python
License: MIT
Last pushed: Feb 22, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/MaxLSB/mini-paligemma2"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
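The curl call above maps to a small Python helper. The endpoint path is taken verbatim from the example; the response schema is not assumed, so this sketch only builds the request URL:

```python
from urllib.parse import quote

# Base path copied from the curl example above
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(registry: str, repo: str) -> str:
    # Percent-encode each path segment, keeping the owner/name slash intact
    return f"{API_BASE}/{quote(registry)}/{quote(repo, safe='/')}"

url = quality_url("transformers", "MaxLSB/mini-paligemma2")
print(url)
```

Pass the resulting URL to any HTTP client (e.g. `urllib.request.urlopen` or `requests.get`) to fetch the same JSON the curl command returns.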