VITA

We present VITA, a vision-to-action flow matching policy that evolves latent images, rather than random Gaussian noise, into latent actions. VITA is highly efficient because its architecture is conditioning-free and MLP-only.
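To make the idea concrete, the following is a minimal NumPy sketch of flow matching whose source distribution is a latent image rather than Gaussian noise. It is an illustration under stated assumptions, not the VITA implementation: the linear interpolation path, the scalar time input, the tanh MLP, the Euler integrator, and all function names (`init_mlp`, `velocity`, `flow_matching_loss`, `integrate`) are hypothetical choices. It also assumes the image and action latents share the same dimension, since the flow maps one into the other.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random weights for a plain MLP; sizes = [in, hidden..., out]."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Forward pass: tanh hidden layers, linear output layer."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def velocity(params, z, t):
    """Velocity field v(z, t). The only inputs are the current latent and the
    time t (assumed here); there is no separate conditioning input, because
    the visual information is already the starting point of the flow."""
    t_col = np.broadcast_to(np.asarray(t, dtype=float), (z.shape[0], 1))
    return mlp(params, np.concatenate([z, t_col], axis=1))

def flow_matching_loss(params, z_img, z_act, t):
    """Regress the velocity along the straight path from z_img to z_act.
    t has shape (batch, 1) with values in [0, 1]."""
    z_t = (1.0 - t) * z_img + t * z_act   # point on the interpolation path
    target = z_act - z_img                # constant velocity of that path
    return np.mean((velocity(params, z_t, t) - target) ** 2)

def integrate(params, z_img, steps=10):
    """Inference: start at the latent image and take Euler steps to t = 1."""
    z = z_img.copy()
    dt = 1.0 / steps
    for k in range(steps):
        z = z + dt * velocity(params, z, k * dt)
    return z
```

In this sketch, training would minimize `flow_matching_loss` over pairs of image and action latents, and `integrate` would then carry a new latent image to a predicted latent action. The encoders and decoders that produce and consume these latents are outside the scope of the example.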