keivalya/mini-vla
a minimal, beginner-friendly VLA to show how robot policies can fuse images, text, and states to generate actions
This project helps robotics students and researchers understand how to build models that let a robot follow instructions. The model takes robot sensor data (camera images and internal state) plus a text command and outputs continuous actions for the robot to execute. It is aimed at anyone learning or prototyping robot control policies, particularly those interested in vision-language-action (VLA) models.
Use this if you are a student or researcher looking for a clear, minimalist example to learn or prototype vision-language-action models for robotics.
Not ideal if you need a production-ready, state-of-the-art robot control system or a robust solution for real-world industrial applications.
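To make the description above concrete, here is a minimal sketch of how such a policy might fuse the three inputs into a continuous action. This is an illustrative PyTorch example, not mini-vla's actual architecture: the class name, encoder choices (a tiny CNN, mean-pooled token embeddings), and the 7-dimensional state and action sizes are assumptions for demonstration only.

# Illustrative sketch only -- not mini-vla's actual architecture.
# Shows how a policy might fuse a camera image, a tokenized text command,
# and a proprioceptive state vector into one continuous action.
import torch
import torch.nn as nn

class TinyVLAPolicy(nn.Module):
    def __init__(self, vocab_size=1000, state_dim=7, action_dim=7, hidden=256):
        super().__init__()
        # Image encoder: a small CNN producing one feature vector per image.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, hidden),
        )
        # Text encoder: embed tokens and mean-pool (stand-in for a language model).
        self.token_embed = nn.Embedding(vocab_size, hidden)
        # State encoder: project joint angles / gripper state into the same space.
        self.state_encoder = nn.Linear(state_dim, hidden)
        # Fusion head: concatenate the three embeddings and regress an action.
        self.head = nn.Sequential(
            nn.Linear(3 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, image, tokens, state):
        img_feat = self.image_encoder(image)             # (B, hidden)
        txt_feat = self.token_embed(tokens).mean(dim=1)  # (B, hidden)
        st_feat = self.state_encoder(state)              # (B, hidden)
        fused = torch.cat([img_feat, txt_feat, st_feat], dim=-1)
        return self.head(fused)                          # (B, action_dim) continuous action

policy = TinyVLAPolicy()
action = policy(
    torch.randn(1, 3, 64, 64),        # camera image
    torch.randint(0, 1000, (1, 8)),   # tokenized instruction, e.g. "pick up the cube"
    torch.randn(1, 7),                # robot state (joint angles, gripper)
)
print(action.shape)  # torch.Size([1, 7])

In real VLA models the image and text encoders are typically pretrained backbones rather than the tiny stand-ins used here, but the overall fuse-then-regress structure is the same idea.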
Stars: 204
Forks: 40
Language: Python
License: MIT
Category: Diffusion
Last pushed: Mar 17, 2026
Commits (last 30 days): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/keivalya/mini-vla"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
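For scripted access, the same record can be fetched from Python. The sketch below mirrors the curl command above using the requests library and assumes the endpoint returns JSON; how an API key would be supplied (header or query parameter) is not documented here, so the example sticks to the unauthenticated call.

# Sketch: fetch the quality record for keivalya/mini-vla from Python.
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/diffusion/keivalya/mini-vla"
resp = requests.get(url, timeout=10)  # unauthenticated: up to 100 requests/day
resp.raise_for_status()
data = resp.json()  # assumes a JSON response, as implied by the API example above
print(data)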
Related models
UCSC-VLAA/story-iter
[ICLR 2026] A Training-free Iterative Framework for Long Story Visualization
PaddlePaddle/PaddleMIX
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks,...
adobe-research/custom-diffusion
Custom Diffusion: Multi-Concept Customization of Text-to-Image Diffusion (CVPR 2023)
byliutao/1Prompt1Story
🔥ICLR 2025 (Spotlight) One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation...
zai-org/ImageReward
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation