CoVPR  

Collaborative Visual Place Recognition

Submitted to IROS 2024

1New York University, Brooklyn, NY 11201, USA
2Columbia University, New York, NY 10027, USA
3QUT Centre for Robotics, Queensland University of Technology, Brisbane, QLD 4000, Australia

Video explanation




Illustration of collaborative VPR.

Abstract

Visual place recognition (VPR) enables autonomous robots to navigate complex environments by discovering the environment's topology from visual input. Most research efforts focus on enhancing the accuracy and robustness of single-robot VPR, which nevertheless suffers from issues such as occlusion caused by an individual viewpoint. Despite a number of studies on multi-robot metric-based localization, there is a notable research gap concerning more robust and efficient place-based localization with multi-robot systems. This work proposes collaborative VPR, where multiple robots share abstracted visual features to enhance place recognition capabilities. We also introduce a novel collaborative VPR framework based on similarity-regularized information fusion, reducing irrelevant noise while harnessing valuable data from collaborators. This framework seamlessly integrates with well-established single-robot VPR techniques and supports end-to-end training with a weakly supervised contrastive loss. We conduct experiments in urban, rural, and indoor scenes, achieving a notable improvement over single-agent VPR in urban environments (~12%), along with consistent gains in rural (~3%) and indoor (~1%) scenarios. Our work presents a promising solution to the pressing challenges of VPR and represents a substantial step towards safe and robust autonomous systems.

Method



  • We formulate CoVPR, a meaningful and challenging task for the vision and robotics community.
  • We develop a novel and effective CoVPR method based on similarity-regularized fusion, which is broadly applicable to learning-based single-robot VPR methods (a minimal sketch follows this list).
  • We provide a comprehensive CoVPR benchmark that includes publicly available datasets and a dataset we collected ourselves, encompassing a wide range of scenarios, from outdoor to indoor environments.
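
To make the fusion idea concrete, the sketch below shows one plausible PyTorch implementation of similarity-regularized fusion, inferred from the description above. The exact weighting, normalization, and descriptor dimensionality used in the paper may differ; the function name and shapes are illustrative only.

# Minimal sketch of similarity-regularized fusion (assumed formulation; the
# paper's exact weighting may differ). Collaborator descriptors are weighted
# by their cosine similarity to the ego descriptor, so dissimilar (likely
# irrelevant) collaborators contribute less, and the result is re-normalized.
import torch
import torch.nn.functional as F

def similarity_regularized_fusion(ego_desc: torch.Tensor,
                                  collab_descs: torch.Tensor) -> torch.Tensor:
    """
    ego_desc:     (D,)   L2-normalized descriptor from the ego agent.
    collab_descs: (N, D) L2-normalized descriptors from N collaborators.
    Returns a fused, L2-normalized descriptor of shape (D,).
    """
    # Cosine similarity between ego and each collaborator (unit-norm inputs).
    sims = collab_descs @ ego_desc                       # (N,)
    weights = sims.clamp(min=0.0)                        # suppress dissimilar collaborators
    # Weighted sum of collaborator descriptors added as a residual to the ego descriptor.
    fused = ego_desc + (weights.unsqueeze(1) * collab_descs).sum(dim=0)
    return F.normalize(fused, dim=0)

# Example usage with NetVLAD-style global descriptors (hypothetical dimensions).
ego = F.normalize(torch.randn(4096), dim=0)
collabs = F.normalize(torch.randn(3, 4096), dim=1)
fused = similarity_regularized_fusion(ego, collabs)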

Dataset

     

         

    Example images from (a) the Ford multi-AV data, (b) the Gibson multi-robot data, and (c) the NYC MV data.

    We evaluate our method, using NetVLAD as the single-robot backbone, on three datasets: the Ford multi-AV dataset, the NYC MV dataset, and the Gibson multi-robot dataset.

    Ford multi-AV dataset: This dataset comprises outdoor scenes in rural Michigan. Evaluation includes both front and right-side views. We use 480 queries from residential scenes for training and 430 queries from university and vegetation scenes for testing. The maximum distance between the ego agent and collaborators is 5 meters.

    NYC MV dataset: Captured in New York City, this dataset consists of images from a dense urban environment. The train-test split is based on GPS information to ensure disjoint locations. We use 207 queries for training and 140 for testing. The maximum distance between the ego agent and collaborators is 10 meters.

    Gibson dataset: The Gibson multi-robot dataset contains indoor scenes simulated with Habitat-sim. We use 350 queries for training and 279 queries for testing, all with front views. The training and test sets do not share any rooms, and the maximum distance between the ego agent and collaborators is 1 meter.
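
    For reference, the sketch below shows a common way to score retrieval on such query sets with Recall@N under a position-distance threshold. The threshold value, array names, and the exact evaluation protocol used in the paper are assumptions here.

# Hedged sketch of Recall@N evaluation for place recognition (not the paper's
# exact evaluation code). A query counts as correct if any of its top-N
# retrieved database descriptors lies within `dist_thresh` meters of the
# query's position (e.g., from GPS/UTM, or simulator coordinates for Gibson).
import numpy as np

def recall_at_n(query_descs, db_descs, query_xy, db_xy, n=1, dist_thresh=25.0):
    """
    query_descs: (Q, D) L2-normalized query descriptors (ego or fused).
    db_descs:    (M, D) L2-normalized database descriptors.
    query_xy:    (Q, 2) query positions in meters.
    db_xy:       (M, 2) database positions in meters.
    """
    sims = query_descs @ db_descs.T                      # (Q, M) cosine similarities
    top_n = np.argsort(-sims, axis=1)[:, :n]             # indices of top-N retrievals
    hits = 0
    for q in range(len(query_descs)):
        dists = np.linalg.norm(db_xy[top_n[q]] - query_xy[q], axis=1)
        hits += np.any(dists <= dist_thresh)
    return hits / len(query_descs)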

     

     

    VPR Results

    Video visualization of NYC MV data.

     

    Video visualization of Ford multi-AV data.

     

    Video visualization of Gibson multi-robot data.

    Qualitative Results


    Qualitative visual place recognition results on outdoor scenes. Correct retrievals are in green, and incorrect ones are in red. The first two rows are from Ford multi-AV data and the last two rows are from NYC MV data.


    Qualitative visual place recognition results on the indoor scene (Gibson multi-robot data). Correct retrievals are in green, and incorrect ones are in red.


    Failure cases in CoVPR. CoVPR can fail due to rotations and translations between the ego agent and collaborators, especially under large rotations.

    BibTeX

    @article{li2023collaborative,
      title={Collaborative Visual Place Recognition},
      author={Li, Yiming and Lyu, Zonglin and Lu, Mingxuan and Chen, Chao and Milford, Michael and Feng, Chen},
      journal={arXiv preprint arXiv:2310.05541},
      year={2023}
    }

    Acknowledgements

    Chen Feng is the corresponding author.