RobinDong/tiny_multimodal

Tiny and simple implementation of multimodal models

Experimental

This project helps machine learning engineers and researchers experiment quickly with foundational multimodal models. It takes image-text datasets as input and lets you train, fine-tune, or deploy compact models that understand both images and text. It suits anyone developing and optimizing AI models who needs efficient experimentation.

No commits in the last 6 months.

Use this if you are an AI/ML developer or researcher looking to explore multimodal models on standard consumer-grade GPUs without needing massive computational resources.

Not ideal if you are an end-user needing an out-of-the-box solution for image analysis or text generation, or if you require full-scale, production-ready large multimodal models.

multimodal-ai model-training deep-learning ai-prototyping computer-vision

Stale (6m), no package, no dependents

Maintenance 0 / 25

Adoption 4 / 25

Maturity 16 / 25

Community 0 / 25

Language: Python

License: MIT

Higher-rated alternatives

open-mmlab/mmpretrain

OpenMMLab Pre-training Toolbox and Benchmark

facebookresearch/mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

adambielski/siamese-triplet

Siamese and triplet networks with online pair/triplet mining in PyTorch

HuaizhengZhang/Awsome-Deep-Learning-for-Video-Analysis

Papers, code and datasets about deep learning and multi-modal learning for video analysis

KaiyangZhou/pytorch-vsumm-reinforce

Unsupervised video summarization with deep reinforcement learning (AAAI'18)
