I am an Assistant Professor at MIT EECS, where I lead the Scene Representation Group. Previously, I completed my Ph.D. at Stanford University and a Postdoc at MIT CSAIL. My research interest lies in building AI that perceives and models the world the way humans do. Specifically, I work towards models that learn to reconstruct a rich state description of their environment from vision, such as its 3D structure, materials, and semantics. These models should also be able to model the impact of their own actions on that environment, i.e., learn a "mental simulator" or "world model". I am particularly interested in models that learn these skills in a fully self-supervised way, using only video and self-directed interaction with the world.