RobinDong/tiny_multimodal
Tiny and simple implementation of multimodal models
This project helps machine learning engineers and researchers experiment quickly with foundational multimodal models. It takes image-text datasets as input and lets you train, fine-tune, or deploy compact models that understand both images and text. It suits anyone developing and optimizing AI models who needs fast, low-cost experimentation.
No commits in the last 6 months.
Use this if you are an AI/ML developer or researcher looking to explore multimodal models on standard consumer-grade GPUs without needing massive computational resources.
Not ideal if you are an end-user needing an out-of-the-box solution for image analysis or text generation, or if you require full-scale, production-ready large multimodal models.
Stars: 8
Forks: —
Language: Python
License: MIT
Category:
Last pushed: Aug 20, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/RobinDong/tiny_multimodal"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
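The same request can be made from Python, the project's own language. The block below is a minimal sketch using only the standard library. It assumes the endpoint returns JSON, which this page does not confirm, and it assumes the URL path is made up of a category segment followed by the repository owner and name, inferred from the single curl example above. The names `quality_url` and `fetch_quality` are hypothetical helpers, not part of any published client.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example; everything after it is
# assumed to be <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    segments = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL, *segments])


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumption: the response body is JSON. If the service returns
    something else, json.loads raises a ValueError.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("ml-frameworks", "RobinDong", "tiny_multimodal")` issues the same GET request as the curl command. Unauthenticated calls count against the 100 requests/day limit, so cache the result if you poll regularly.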
Higher-rated alternatives
open-mmlab/mmpretrain
OpenMMLab Pre-training Toolbox and Benchmark
facebookresearch/mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
adambielski/siamese-triplet
Siamese and triplet networks with online pair/triplet mining in PyTorch
HuaizhengZhang/Awsome-Deep-Learning-for-Video-Analysis
Papers, code and datasets about deep learning and multi-modal learning for video analysis
KaiyangZhou/pytorch-vsumm-reinforce
Unsupervised video summarization with deep reinforcement learning (AAAI'18)