ntkhoa95/multimodal-for-vision

Vision Framework: A modular multi-agent system for computer vision tasks, featuring natural language queries, intelligent task routing, and specialized agents for classification, detection, and more. Built with PyTorch and modern deep learning models.

Score: 29 / 100 (Experimental)

This framework analyzes images and videos in response to natural language queries. You can supply an image or video and ask "What's in this image?" or "Detect objects in this scene" to get classifications, detected objects with bounding boxes, or descriptive captions. It is aimed at anyone who needs quick visual insights without manual tagging, such as content moderators, quality control inspectors, or security analysts.

No commits in the last 6 months.

Use this if you need to rapidly classify, detect objects in, or generate descriptions for large collections of images or video footage using plain English prompts.

Not ideal if you require highly specialized vision tasks beyond classification, detection, or captioning, or if you need to train custom models from scratch for unique visual data.

visual-content-analysis image-moderation security-monitoring data-labeling video-analytics
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 4 / 25
Maturity 16 / 25
Community 9 / 25

Stars: 7
Forks: 1
Language: Python
License: MIT
Last pushed: Nov 07, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/ntkhoa95/multimodal-for-vision"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
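
The same request can be made from a script. The sketch below is a minimal Python equivalent of the curl command, using only the standard library. It rests on two assumptions that the page does not confirm: that the URL path is made of a category, an owner, and a repository name (inferred from the one example URL above), and that the endpoint returns JSON. The helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is treated
# as category/owner/repo, which is an assumption based on that one URL.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Assemble the endpoint URL from its three path segments."""
    # Percent-encode each segment so unusual characters cannot break the path.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and decode the body as JSON (assumed format)."""
    url = build_quality_url(category, owner, repo)
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("ml-frameworks", "ntkhoa95", "multimodal-for-vision")` would issue the same request as the curl command. Since the keyless tier allows 100 requests/day, a script that polls many repositories may want to cache responses locally.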