kyegomez/RT-X
PyTorch implementation of the RT-1-X and RT-2-X models from the paper "Open X-Embodiment: Robotic Learning Datasets and RT-X Models".
This project provides tools for controlling robots from combined visual and language input. You feed the system video or image observations of the robot's environment together with a natural-language description of the task, and the model predicts the actions the robot should take, enabling it to perform real-world tasks. It is aimed at robotics researchers and developers building autonomous agents.
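For orientation, here is a minimal usage sketch. The import path `rtx`, the class name `RTX1`, and its `train`/`run` methods are modeled on the repository's README-style examples and should be treated as assumptions, not a verified interface; check the repository for the current API.

```python
import torch
from rtx import RTX1  # assumed import path; see the repo README

model = RTX1()

# A batch of 2 short videos: (batch, channels, frames, height, width).
video = torch.randn(2, 3, 6, 224, 224)

# One natural-language task description per video in the batch.
instructions = [
    "bring me that apple sitting on the table",
    "please pass the butter",
]

# Compute action logits for a training step (assumed method name).
train_logits = model.train(video, instructions)

# Switch the underlying network to eval mode and run inference;
# cond_scale controls classifier-free guidance strength (assumed).
model.model.eval()
eval_logits = model.run(video, instructions, cond_scale=1.0)
```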
Use this if you are developing robotic systems that need to understand and execute complex commands based on both visual perception and natural language instructions.
Not ideal if you need a pre-packaged, ready-to-deploy solution for a specific robot or a simple, direct control interface without multimodal AI capabilities.
Stars: 237
Forks: 24
Language: Python
License: MIT
Category:
Last pushed: Mar 06, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/kyegomez/RT-X"
Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000 requests/day.
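For scripted access, a short Python sketch of calling the same endpoint follows. It assumes the endpoint returns JSON; the response schema is not documented here, so inspect the returned fields before relying on any of them.

```python
import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/kyegomez/RT-X"

# Fetch the quality data; a timeout avoids hanging on a slow endpoint.
resp = requests.get(URL, timeout=10)
resp.raise_for_status()  # surface HTTP errors (e.g., rate limiting)

# Assumed JSON response; the schema is undocumented here.
data = resp.json()
print(data)
```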
Related models
kyegomez/PALI3
Implementation of PALI3 from the paper "PaLI-3 Vision Language Models: Smaller, Faster, Stronger"
chuanyangjin/MMToM-QA
[🏆Outstanding Paper Award at ACL 2024] MMToM-QA: Multimodal Theory of Mind Question Answering
lyuchenyang/Macaw-LLM
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
Muennighoff/vilio
🥶Vilio: State-of-the-art VL models in PyTorch & PaddlePaddle
kyegomez/PALM-E
Implementation of "PaLM-E: An Embodied Multimodal Language Model"