ntkhoa95/multimodal-for-vision
Vision Framework: A modular multi-agent system for computer vision tasks, featuring natural language queries, intelligent task routing, and specialized agents for classification, detection, and more. Built with PyTorch and modern deep learning models.
This framework helps you automatically analyze images and videos by simply asking questions in natural language. You can input an image or video and ask "What's in this image?" or "Detect objects in this scene" to get detailed classifications, identified objects with bounding boxes, or descriptive captions. It's designed for anyone needing quick visual insights without manual tagging, such as content moderators, quality control inspectors, or security analysts.
No commits in the last 6 months.
Use this if you need to rapidly classify, detect objects in, or generate descriptions for large collections of images or video footage using plain English prompts.
Not ideal if you require highly specialized vision tasks beyond classification, detection, or captioning, or if you need to train custom models from scratch for unique visual data.
Stars: 7
Forks: 1
Language: Python
License: MIT
Category:
Last pushed: Nov 07, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/ntkhoa95/multimodal-for-vision"
Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000/day.
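The endpoint above returns JSON. A minimal Python sketch for fetching and reading a record follows; note that the field names (`stars`, `last_pushed`) are assumptions for illustration, not a documented response schema:

```python
import json
import urllib.request

# Endpoint from the curl example above; no API key needed
# for up to 100 requests/day.
API_URL = ("https://pt-edge.onrender.com/api/v1/quality/"
           "ml-frameworks/ntkhoa95/multimodal-for-vision")

def fetch_quality(url: str = API_URL) -> dict:
    """Fetch a repository's quality record as a parsed JSON dict."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

def summarize(record: dict) -> str:
    """Build a one-line summary from a record.

    'stars' and 'last_pushed' are hypothetical field names used
    here for illustration only.
    """
    stars = record.get("stars", "?")
    pushed = record.get("last_pushed", "?")
    return f"{stars} stars, last pushed {pushed}"
```

With a key, the usual pattern would be to pass it in a request header via `urllib.request.Request`; check the API's own documentation for the exact header name.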
Higher-rated alternatives
open-mmlab/mmpretrain
OpenMMLab Pre-training Toolbox and Benchmark
facebookresearch/mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
HuaizhengZhang/Awsome-Deep-Learning-for-Video-Analysis
Papers, code and datasets about deep learning and multi-modal learning for video analysis
KaiyangZhou/pytorch-vsumm-reinforce
Unsupervised video summarization with deep reinforcement learning (AAAI'18)
adambielski/siamese-triplet
Siamese and triplet networks with online pair/triplet mining in PyTorch