Physical Intelligence Team Unveils MEM for Robots: A Multiscale Memory System That Gives Gemma 3-4B VLAs 15-Minute Context for Complex Tasks
Current robotics policies, especially Vision-Language-Action (VLA) models, often condition on a single observation or a very short history. This ‘memory deficit’ makes long-horizon tasks, such as cleaning a kitchen or following a complex recipe, impractical or prone to failure. To address this, researchers from Physical Intelligence, Stanford, UC Berkeley, and MIT have presented …