
Wanderland

Geometrically Grounded Simulation
for Open-World Embodied AI

1 New York University 2 Cornell University
⚖️: Equal contribution, random order ✉️: Corresponding author

TL;DR

Visual realism alone is insufficient for embodied AI. Trustworthy benchmarking demands the metric-scale geometric grounding that previous pipelines lack.





Why Do Existing Pipelines Fail?



  • Casual videos tend to have unidirectional capture, whereas our capture provides diverse camera views.
  • Vision-only 3D reconstruction still falls short of multi-sensor fusion SLAM, as the side-by-side comparisons show.

[Side-by-side reconstruction comparisons: Vid2Sim vs. Ours; GaussGym vs. Ours]

Our Framework

[Framework overview figure]
  1. Our pipeline begins with multi-sensor capture using the MetaCam device in real-world urban spaces.
  2. MetaCam Studio processes the raw data via LIV-SLAM to produce a colorized, globally consistent metric point cloud and accurate camera poses.
  3. We initialize 3D Gaussians from the metric point cloud and render per-view depth maps from this initialization.
  4. The 3DGS model is optimized with both photometric and depth losses; a minimal loss sketch follows this list.
  5. In parallel, we extract a reliable collision mesh from the same global point cloud; one plausible meshing route is sketched below.
  6. We integrate the trained 3DGS model and the collision mesh into a single USD scene.
  7. The USD scene can be loaded directly into Isaac Sim for training and evaluating navigation policies; a loading snippet follows.
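
A minimal sketch of the combined objective in step 4, assuming PyTorch tensors produced by a Gaussian rasterizer. gs_loss and the weight lambda_depth are illustrative names, not the authors' code:

import torch
import torch.nn.functional as F

def gs_loss(rendered_rgb, rendered_depth, gt_rgb, slam_depth, lambda_depth=0.2):
    # Photometric term: L1 between the rasterized image and the captured frame.
    l_photo = F.l1_loss(rendered_rgb, gt_rgb)
    # Depth term: supervise only pixels with valid metric depth rendered
    # from the global LIV-SLAM point cloud.
    valid = slam_depth > 0
    l_depth = F.l1_loss(rendered_depth[valid], slam_depth[valid])
    return l_photo + lambda_depth * l_depth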
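
For step 5, this page does not name a meshing algorithm; one plausible route (an assumption, not necessarily the authors' method) is Poisson surface reconstruction in Open3D, trimming the low-density vertices that Poisson tends to hallucinate in empty space. File names are placeholders:

import numpy as np
import open3d as o3d

# Load the globally consistent metric point cloud produced by LIV-SLAM.
pcd = o3d.io.read_point_cloud("scene_metric.ply")
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=10)
# Drop the lowest-density vertices, which lie far from observed points.
d = np.asarray(densities)
mesh.remove_vertices_by_mask(d < np.quantile(d, 0.05))
o3d.io.write_triangle_mesh("collision_mesh.obj", mesh)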
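
For step 7, a minimal sketch of loading the exported scene in Isaac Sim. The module paths follow the omni.isaac.core API of the 2023.x releases (newer releases reorganized these modules), and wanderland_scene.usd is a placeholder path:

from omni.isaac.kit import SimulationApp

# SimulationApp must be created before importing other omni.isaac modules.
simulation_app = SimulationApp({"headless": True})

from omni.isaac.core import World
from omni.isaac.core.utils.stage import add_reference_to_stage

world = World()
# Reference the USD scene (trained 3DGS assets plus collision mesh) on stage.
add_reference_to_stage(usd_path="wanderland_scene.usd", prim_path="/World/Scene")
world.reset()
for _ in range(100):  # step physics and rendering
    world.step(render=True)
simulation_app.close()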

Data Statistics and Comparison

[Dataset statistics and comparison table]

BibTeX



@article{liu2025wanderland,
  title={Wanderland: Geometrically Grounded Simulation for Open-World Embodied AI},
  author={Xinhao Liu* and Jiaqi Li* and Youming Deng and Ruxin Chen and Yingjia Zhang and Yifei Ma and Li Guo and Yiming Li and Jing Zhang and Chen Feng},
  journal={arXiv preprint arXiv:2511.20620},
  year={2025}
}