invictus717/MiCo
[ICCV 2025] Explore the Limits of Omni-modal Pretraining at Scale
This project offers a powerful AI model that can understand information from many different sources at once, similar to how the human brain processes sights, sounds, and language together. It takes in various types of data like images, videos, depth maps, and text, and produces a universal representation that can be used for a wide range of analytical tasks. It's designed for AI researchers and machine learning engineers working on advanced multimodal understanding.
124 stars. No commits in the last 6 months.
Use this if you are developing AI applications that need to interpret and reason across diverse data types simultaneously, such as combining visual, auditory, and textual information.
Not ideal if you are looking for a simple, off-the-shelf solution for single-modality tasks or if you lack the expertise in advanced AI model pretraining and deployment.
Stars: 124
Forks: 6
Language: Python
License: Apache-2.0
Category:
Last pushed: Sep 02, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/invictus717/MiCo"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
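If you want to consume the endpoint programmatically rather than via curl, a minimal Python sketch is below. Note the field names (`stars`, `forks`, `commits_30d`, etc.) are assumptions inferred from the stats shown on this page; the actual response schema is not documented here, so check a real response before relying on these keys.

```python
import json

# Hypothetical payload mirroring the stats shown on this page.
# The key names are assumptions, not a documented schema.
sample_response = """
{
  "repo": "invictus717/MiCo",
  "stars": 124,
  "forks": 6,
  "language": "Python",
  "license": "Apache-2.0",
  "last_pushed": "2024-09-02",
  "commits_30d": 0
}
"""

data = json.loads(sample_response)

# Summarize the fields of interest.
summary = (
    f"{data['repo']}: {data['stars']} stars, "
    f"{data['commits_30d']} commits in the last 30 days"
)
print(summary)
```

To hit the live endpoint instead of the sample payload, replace `sample_response` with the body of a GET request to the URL shown above (e.g. via `urllib.request.urlopen`), keeping in mind the 100 requests/day unauthenticated limit.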
Higher-rated alternatives
open-mmlab/mmpretrain
OpenMMLab Pre-training Toolbox and Benchmark
facebookresearch/mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
HuaizhengZhang/Awsome-Deep-Learning-for-Video-Analysis
Papers, code and datasets about deep learning and multi-modal learning for video analysis
KaiyangZhou/pytorch-vsumm-reinforce
Unsupervised video summarization with deep reinforcement learning (AAAI'18)
adambielski/siamese-triplet
Siamese and triplet networks with online pair/triplet mining in PyTorch