invictus717/MiCo
[ICCV 2025] Explore the Limits of Omni-modal Pretraining at Scale
This project offers a powerful AI model that can understand information from many different sources at once, similar to how the human brain processes sights, sounds, and language together. It takes in various types of data like images, videos, depth maps, and text, and produces a universal representation that can be used for a wide range of analytical tasks. It's designed for AI researchers and machine learning engineers working on advanced multimodal understanding.
124 stars. No commits in the last 6 months.
Use this if you are developing AI applications that need to interpret and reason across diverse data types simultaneously, such as combining visual, auditory, and textual information.
Not ideal if you are looking for a simple, off-the-shelf solution for single-modality tasks or if you lack the expertise in advanced AI model pretraining and deployment.
Stars: 124
Forks: 6
Language: Python
License: Apache-2.0
Category:
Last pushed: Sep 02, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/invictus717/MiCo"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
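If you want to consume the endpoint programmatically rather than via curl, a minimal Python sketch is below. Note the field names (`stars`, `forks`, `commits_30d`, etc.) are assumptions inferred from the stats shown on this page; the actual response schema is not documented here, so check a real response before relying on these keys.

```python
import json

# Hypothetical payload mirroring the stats shown on this page.
# The key names are assumptions, not a documented schema.
sample_response = """
{
  "repo": "invictus717/MiCo",
  "stars": 124,
  "forks": 6,
  "language": "Python",
  "license": "Apache-2.0",
  "last_pushed": "2024-09-02",
  "commits_30d": 0
}
"""

data = json.loads(sample_response)

# Summarize the fields of interest.
summary = (
    f"{data['repo']}: {data['stars']} stars, "
    f"{data['commits_30d']} commits in the last 30 days"
)
print(summary)
```

To hit the live endpoint instead of the sample payload, replace `sample_response` with the body of a GET request to the URL shown above (e.g. via `urllib.request.urlopen`), keeping in mind the 100 requests/day unauthenticated limit.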
Higher-rated alternatives
open-mmlab/mmpretrain
OpenMMLab Pre-training Toolbox and Benchmark
facebookresearch/mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
HuaizhengZhang/Awsome-Deep-Learning-for-Video-Analysis
Papers, code and datasets about deep learning and multi-modal learning for video analysis
KaiyangZhou/pytorch-vsumm-reinforce
Unsupervised video summarization with deep reinforcement learning (AAAI'18)
adambielski/siamese-triplet
Siamese and triplet networks with online pair/triplet mining in PyTorch