VITA
We present VITA (accepted to ICLR 2026), a vision-to-action flow matching policy that evolves latent images, rather than random Gaussian noise, into actions. VITA is the first MLP-only policy to handle complex real-world bimanual manipulation tasks, such as those on ALOHA.
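To make the idea of flowing from a latent image to actions concrete, here is a minimal NumPy sketch of generic flow matching in which the source sample is an image latent instead of Gaussian noise. It is an illustration under stated assumptions, not the VITA implementation: the straight-line interpolation path, the two-layer MLP, the shared latent dimension for images and actions, and all function names (`interpolate`, `mlp_velocity`, `flow_matching_loss`, `euler_sample`) are hypothetical choices made for this example.

```python
import numpy as np


def init_params(rng, latent_dim, hidden_dim):
    """Random weights for a two-layer MLP velocity network (hypothetical sizes)."""
    w1 = rng.normal(scale=0.1, size=(latent_dim + 1, hidden_dim))
    b1 = np.zeros(hidden_dim)
    w2 = rng.normal(scale=0.1, size=(hidden_dim, latent_dim))
    b2 = np.zeros(latent_dim)
    return w1, b1, w2, b2


def interpolate(z_img, z_act, t):
    """Point on the straight path from the image latent (t=0) to the action latent (t=1)."""
    return (1.0 - t) * z_img + t * z_act


def mlp_velocity(params, z, t):
    """MLP-only velocity field: takes the current latent and the time, returns dz/dt."""
    w1, b1, w2, b2 = params
    hidden = np.tanh(np.concatenate([z, t], axis=-1) @ w1 + b1)
    return hidden @ w2 + b2


def flow_matching_loss(params, z_img, z_act, rng):
    """Regress the predicted velocity onto the constant straight-path velocity."""
    t = rng.uniform(size=(z_img.shape[0], 1))  # one random time per sample
    z_t = interpolate(z_img, z_act, t)
    target = z_act - z_img  # velocity of the straight-line path
    pred = mlp_velocity(params, z_t, t)
    return float(np.mean((pred - target) ** 2))


def euler_sample(velocity_fn, z_img, num_steps=10):
    """Integrate dz/dt = v(z, t) from t=0 to t=1, starting at the image latent."""
    z = z_img.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = np.full((z.shape[0], 1), k * dt)
        z = z + dt * velocity_fn(z, t)
    return z


# Toy usage with random stand-ins for encoded images and encoded actions.
rng = np.random.default_rng(0)
z_img = rng.normal(size=(4, 8))  # assumed: image latents, batch of 4, dim 8
z_act = rng.normal(size=(4, 8))  # assumed: action latents with the same dim
params = init_params(rng, latent_dim=8, hidden_dim=32)
loss = flow_matching_loss(params, z_img, z_act, rng)
z_out = euler_sample(lambda z, t: mlp_velocity(params, z, t), z_img)
```

A flow between two endpoints needs them to live in the same space, which is why the sketch gives the image and action latents the same dimension. How the two modalities are actually encoded and matched in VITA is not specified here.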