Extrapolated Urban View Synthesis Benchmark
GARF: Learning Generalizable 3D Reassembly for Real-World Fractures
Co-VisiON: Co-Visibility ReasONing on Sparse Image Sets of Indoor Scenes
VLM See, Robot Do: Human Demo Video to Robot Action Plan via Vision Language Model
NYC-Event-VPR: A Large-Scale High-Resolution Event-Based Visual Place Recognition Dataset in Dense Urban Environments
Self-Supervised Place Recognition by Refining Temporal and Featural Pseudo Labels from Panoramic Data
CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos
Memorize What Matters: Emergent Scene Decomposition from Multitraverse
Multiview Scene Graph
FusionSense: Bridging Common Sense, Vision, and Touch for Robust Sparse-View Reconstruction