kyegomez/Fuyu
Implementation of Adept's Fuyu, an all-new multi-modality model, in PyTorch
This project offers a foundational building block for AI developers creating multi-modal applications: it takes raw image data and text sequences, processes them together, and produces an integrated output usable for downstream AI tasks. Developers building systems that must understand both images and text will find it useful.
No commits in the last 6 months. Available on PyPI.
Use this if you are an AI developer looking to integrate a multi-modal model that processes images and text using a transformer decoder architecture.
Not ideal if you are an end-user without programming experience or looking for a ready-to-use application, as this is a developer library.
Stars
24
Forks
3
Language
Python
License
MIT
Category
ml-frameworks
Last pushed
Nov 11, 2024
Commits (30d)
0
Dependencies
4
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/kyegomez/Fuyu"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
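The same request can be made from Python. A minimal sketch, assuming the endpoint above returns a JSON body (its field names are not documented here, so the snippet just pretty-prints whatever comes back):

```python
# Fetch repo quality data from the pt-edge API (hypothetical client sketch).
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the per-repo quality endpoint URL, e.g.
    .../quality/ml-frameworks/kyegomez/Fuyu"""
    return f"{BASE}/{category}/{owner}/{repo}"

if __name__ == "__main__":
    url = quality_url("ml-frameworks", "kyegomez", "Fuyu")
    # No API key needed at the free tier (100 requests/day).
    with urllib.request.urlopen(url, timeout=10) as resp:
        data = json.load(resp)  # assumes the response is JSON
    print(json.dumps(data, indent=2))
```

Passing an API key (for the 1,000/day tier) would presumably go in a request header, but the header name is not documented here.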
Higher-rated alternatives
open-mmlab/mmpretrain
OpenMMLab Pre-training Toolbox and Benchmark
facebookresearch/mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
HuaizhengZhang/Awsome-Deep-Learning-for-Video-Analysis
Papers, code and datasets about deep learning and multi-modal learning for video analysis
KaiyangZhou/pytorch-vsumm-reinforce
Unsupervised video summarization with deep reinforcement learning (AAAI'18)
adambielski/siamese-triplet
Siamese and triplet networks with online pair/triplet mining in PyTorch