Adversarial Exploitation of Data Diversity Improves Visual Localization
Extrapolated Urban View Synthesis Benchmark
GARF: Learning Generalizable 3D Reassembly for Real-World Fractures
Co-VisiON: Co-Visibility ReasONing on Sparse Image Sets of Indoor Scenes
VLM See, Robot Do: Human Demo Video to Robot Action Plan via Vision Language Model
CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos
Multiview Scene Graph
EgoPAT3Dv2: Predicting 3D Action Target from 2D Egocentric Vision for Human-Robot Interaction
Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset
ActFormer: Scalable Collaborative Perception via Active Queries