allenai/x-lxmert
PyTorch code for EMNLP 2020 paper "X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers"
This project helps AI researchers and developers build systems that understand and generate content across both images and text. Given a text description, it can generate an image; given an image, it can produce a caption or answer questions about it. It's designed for those working on AI applications that combine computer vision and natural language processing.
No commits in the last 6 months.
Use this if you are developing AI models that need to both interpret visual information (like understanding scenes or answering questions about images) and generate images from text descriptions, or if you need to create descriptive captions for images.
Not ideal if you are a non-technical user looking for an out-of-the-box application for image generation or visual question answering, as this project requires significant technical setup and coding expertise.
Stars: 50
Forks: 10
Language: Python
License: not listed
Category: not listed
Last pushed: Aug 27, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/allenai/x-lxmert"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
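The same request can be made from Python. The following is a minimal sketch, not an official client: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the reading of the path segments as category, owner, and repository is inferred from the single URL shown above, and the JSON response format is an assumption, since the page does not document it.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository.

    Assumption: the path is <category>/<owner>/<repo>, as inferred
    from the one example URL on this page.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumption: the endpoint returns JSON. No API key is sent, so the
    keyless limit of 100 requests/day applies.
    """
    url = build_quality_url(category, owner, repo)
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request):
# data = fetch_quality("transformers", "allenai", "x-lxmert")
# print(json.dumps(data, indent=2))
```

Keeping URL construction separate from the network call makes it possible to check the request target without spending any of the daily quota.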
Higher-rated alternatives
filipstrand/mflux
MLX native implementations of state-of-the-art generative image models
potamides/DeTikZify
Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ.
FoundationVision/Infinity
[CVPR 2025 Oral] Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
zai-org/CogView
Text-to-Image generation. The repo for NeurIPS 2021 paper "CogView: Mastering Text-to-Image...
EleutherAI/DALLE-mtf
Open-AI's DALL-E for large scale training in mesh-tensorflow.