kyegomez/RT-X
PyTorch implementation of the RT-1-X and RT-2-X models from the paper "Open X-Embodiment: Robotic Learning Datasets and RT-X Models".
This project provides tools for controlling robots from combined visual and language input. You feed the system video or image observations of the robot's environment together with a natural-language description of the task, and the model predicts the actions the robot should take, enabling it to perform real-world tasks. It is aimed at robotics researchers and developers building autonomous agents.
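For orientation, here is a minimal usage sketch. The import path `rtx`, the class name `RTX1`, and its `train`/`run` methods are modeled on the repository's README-style examples and should be treated as assumptions, not a verified interface; check the repository for the current API.

```python
import torch
from rtx import RTX1  # assumed import path; see the repo README

model = RTX1()

# A batch of 2 short videos: (batch, channels, frames, height, width).
video = torch.randn(2, 3, 6, 224, 224)

# One natural-language task description per video in the batch.
instructions = [
    "bring me that apple sitting on the table",
    "please pass the butter",
]

# Compute action logits for a training step (assumed method name).
train_logits = model.train(video, instructions)

# Switch the underlying network to eval mode and run inference;
# cond_scale controls classifier-free guidance strength (assumed).
model.model.eval()
eval_logits = model.run(video, instructions, cond_scale=1.0)
```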
Use this if you are developing robotic systems that need to understand and execute complex commands based on both visual perception and natural language instructions.
Not ideal if you need a pre-packaged, ready-to-deploy solution for a specific robot or a simple, direct control interface without multimodal AI capabilities.
Stars: 237
Forks: 24
Language: Python
License: MIT
Category:
Last pushed: Mar 06, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/kyegomez/RT-X"
Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000 requests/day.
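For scripted access, a short Python sketch of calling the same endpoint follows. It assumes the endpoint returns JSON; the response schema is not documented here, so inspect the returned fields before relying on any of them.

```python
import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/kyegomez/RT-X"

# Fetch the quality data; a timeout avoids hanging on a slow endpoint.
resp = requests.get(URL, timeout=10)
resp.raise_for_status()  # surface HTTP errors (e.g., rate limiting)

# Assumed JSON response; the schema is undocumented here.
data = resp.json()
print(data)
```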
Related models
kyegomez/PALI3
Implementation of PALI3 from the paper "PaLI-3 Vision Language Models: Smaller, Faster, Stronger"
chuanyangjin/MMToM-QA
[🏆Outstanding Paper Award at ACL 2024] MMToM-QA: Multimodal Theory of Mind Question Answering
lyuchenyang/Macaw-LLM
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
Muennighoff/vilio
🥶Vilio: State-of-the-art VL models in PyTorch & PaddlePaddle
kyegomez/PALM-E
Implementation of "PaLM-E: An Embodied Multimodal Language Model"