jianzhnie/MultimodalTookit
Incorporate Image, Text and Tabular Data with HuggingFace Transformers
This toolkit helps you make better predictions or classifications using a mix of data types, like customer reviews (text), product details (numbers), and images. It takes these different kinds of information, processes them, and then outputs a prediction or a category, such as whether a customer will recommend a product or the likelihood of pet adoption. It's for data scientists and machine learning engineers who need to build robust models from diverse datasets.
No commits in the last 6 months.
Use this if you need to build a machine learning model that predicts an outcome or classifies data, and your input data includes a combination of text, images, and traditional numerical or categorical information.
Not ideal if your dataset only contains a single data type (e.g., only text or only tabular numbers) or if you are not working with prediction or classification tasks.
Stars: 12
Forks: —
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 01, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/jianzhnie/MultimodalTookit"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
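The same endpoint can also be queried from Python. A minimal sketch using only the standard library; the response schema is not documented on this page, so the JSON is returned as-is rather than assuming any particular fields:

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a given GitHub repository."""
    return f"{BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality record as parsed JSON (schema undocumented here)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Network call; counts against the 100 requests/day anonymous limit.
    print(fetch_quality("jianzhnie", "MultimodalTookit"))
```

With an API key, you would presumably pass it as a header or query parameter; the page does not specify the mechanism, so that part is omitted.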
Higher-rated alternatives
dorarad/gansformer
Generative Adversarial Transformers
j-min/VL-T5
PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021)
invictus717/MetaTransformer
Meta-Transformer for Unified Multimodal Learning
rkansal47/MPGAN
The message passing GAN https://arxiv.org/abs/2106.11535 and generative adversarial particle...
Yachay-AI/byt5-geotagging
Confidence- and ByT5-based geotagging model predicting coordinates from text alone.