1.1 Object-Centric Representation:
(1) Task Roles: EgoPush partitions scene objects into three task roles: active object (currently pushed), anchor object (defines the target relation), and obstacles.
(2) Role-wise Encoding: A shared-weight estimator encodes each role into a latent embedding; the per-role embeddings are then concatenated into a single object-centric latent state.
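The role-wise encoding above can be sketched in a few lines. This is a minimal illustration, not the EgoPush estimator: the linear-plus-tanh encoder, the 4-dimensional input features, the 3-dimensional latent size, and the names `make_encoder` and `encode_scene` are all assumptions made for the example, as is the choice of one feature vector per role.

```python
import math
import random

ROLES = ("active", "anchor", "obstacle")  # the three task roles named in the text


def make_encoder(in_dim, latent_dim, seed=0):
    """Build one tiny linear + tanh encoder (hypothetical stand-in for the estimator)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
               for _ in range(latent_dim)]
    bias = [0.0] * latent_dim

    def encode(features):
        # The same weights and bias are used no matter which role calls this.
        return [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
                for row, b in zip(weights, bias)]

    return encode


def encode_scene(encode, role_features):
    """Encode every role with the shared encoder and concatenate in a fixed role order."""
    state = []
    for role in ROLES:
        state.extend(encode(role_features[role]))
    return state


encode = make_encoder(in_dim=4, latent_dim=3)
scene = {  # made-up per-role feature vectors, for illustration only
    "active": [0.2, 0.1, 0.0, 1.0],
    "anchor": [0.8, -0.3, 0.0, 1.0],
    "obstacle": [0.5, 0.5, 0.0, 1.0],
}
latent_state = encode_scene(encode, scene)  # length = 3 roles * latent_dim
```

Because the role order is fixed, each slice of `latent_state` always corresponds to the same role, which is what lets a downstream policy treat the concatenation as an object-centric state.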
1.2 Relative Spatial Reasoning:
(3) Relation-First Representation: Weight sharing places all roles in a common feature space, enabling the policy to reason over relative spatial relations instead of isolated object states.
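One way to see why a common feature space helps with relative reasoning is a toy comparison. The sketch below assumes a purely linear encoder over 2-D positions, which is a deliberate simplification and not the paper's model; `make_linear`, the seeds, and the sample coordinates are hypothetical. With a shared linear map, the difference between two embeddings depends only on the offset between the two objects, so moving both objects together leaves it unchanged. With separate per-role maps, that property is lost.

```python
import random


def make_linear(in_dim, out_dim, seed):
    """Return a random linear map x -> W x (illustrative encoder, no bias)."""
    rng = random.Random(seed)
    w = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]
    return lambda x: [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def diff(u, v):
    return [a - b for a, b in zip(u, v)]


def shift(p, offset):
    return [a + b for a, b in zip(p, offset)]


def max_abs_change(u, v):
    return max(abs(a - b) for a, b in zip(u, v))


shared = make_linear(2, 3, seed=0)           # one encoder for every role
separate_active = make_linear(2, 3, seed=1)  # role-specific encoders for comparison
separate_anchor = make_linear(2, 3, seed=2)

active, anchor = [0.2, 0.1], [0.8, -0.3]
offset = [1.5, -2.0]  # move both objects together; their relative offset is unchanged

shared_before = diff(shared(active), shared(anchor))
shared_after = diff(shared(shift(active, offset)), shared(shift(anchor, offset)))

separate_before = diff(separate_active(active), separate_anchor(anchor))
separate_after = diff(separate_active(shift(active, offset)),
                      separate_anchor(shift(anchor, offset)))

# Shared weights: W(a + t) - W(b + t) = W(a - b), so the change is float rounding only.
shared_change = max_abs_change(shared_before, shared_after)
# Separate weights: the change equals (W1 - W2) t, which is generally nonzero.
separate_change = max_abs_change(separate_before, separate_after)
```

A learned nonlinear encoder will not satisfy this identity exactly; the toy case only illustrates why embeddings produced by the same weights are directly comparable across roles.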
@article{An2026EgoPush,
  title   = {EgoPush: Learning End-to-End Egocentric Multi-Object Rearrangement for Mobile Robots},
  author  = {An, Boyuan and Wang, Zhexiong and Wang, Yipeng and Li, Jiaqi and Li, Sihang and Zhang, Jing and Feng, Chen},
  journal = {arXiv preprint arXiv:2602.18071},
  year    = {2026}
}