allenai/x-lxmert

PyTorch code for EMNLP 2020 paper "X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers"

/ 100

Emerging

This project helps AI researchers and developers build systems that can understand and generate content across both images and text. You can input text descriptions to receive generated images, or provide images to get captions and answers to visual questions. It's designed for those working on advanced AI applications in computer vision and natural language processing.

No commits in the last 6 months.

Use this if you are developing AI models that need to both interpret visual information (like understanding scenes or answering questions about images) and generate images from text descriptions, or if you need to create descriptive captions for images.

Not ideal if you are a non-technical user looking for an out-of-the-box application for image generation or visual question answering, as this project requires significant technical setup and coding expertise.

AI Research, Computer Vision, Natural Language Processing, Image Generation, Visual Question Answering

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 8 / 25

Maturity 8 / 25

Community 17 / 25


Language: Python

License: None

Higher-rated alternatives

filipstrand/mflux

MLX native implementations of state-of-the-art generative image models

potamides/DeTikZify

Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ.

FoundationVision/Infinity

[CVPR 2025 Oral] Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

zai-org/CogView

Text-to-Image generation. The repo for NeurIPS 2021 paper "CogView: Mastering Text-to-Image...

EleutherAI/DALLE-mtf

OpenAI's DALL-E for large-scale training in mesh-tensorflow.
