Shanghai-Digital-Brain-Laboratory/BDM-DB1

A large-scale multi-modal pre-trained model

/ 100

Emerging

This project provides a large-scale pre-trained multi-modal model that can understand and generate text, interpret images, and make decisions in complex environments. It accepts several kinds of input, such as natural language instructions, visual data from video games, or problem definitions like a Traveling Salesperson Problem instance, and outputs actions or solutions. It is designed for researchers and engineers exploring AI capabilities across language, vision, and automated decision-making.

134 stars. No commits in the last 6 months.

Use this if you are an AI researcher or engineer working on multi-modal AI and want to experiment with a pre-trained model capable of generalized task performance across text, images, and simulated decision-making.

Not ideal if you are a practitioner looking for a ready-to-use, off-the-shelf solution for a specific business problem without significant AI development expertise.

artificial-intelligence multi-modal-learning reinforcement-learning natural-language-processing computer-vision

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 11 / 25

Stars 134

Forks

Language Python

License Apache-2.0

Higher-rated alternatives

dorarad/gansformer

Generative Adversarial Transformers

j-min/VL-T5

PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021)

invictus717/MetaTransformer

Meta-Transformer for Unified Multimodal Learning

rkansal47/MPGAN

The message passing GAN https://arxiv.org/abs/2106.11535 and generative adversarial particle...

Yachay-AI/byt5-geotagging

Confidence and ByT5-based geotagging model predicting coordinates from text alone.
