Introduction
Our dataset, NYC-Indoor-VPR, consists of images recorded in New York City from April 2022 to April 2023. Footage was captured with hand-held Insta360 One X2 spherical cameras, producing videos at a resolution of 1920x960. We then applied MSeg, a semantic segmentation method, to the raw frames to replace moving objects such as people and cars with white pixels. Fig. 1 compares anonymized and raw images.
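A minimal sketch of this anonymization step is shown below, assuming a per-pixel label map has already been produced by segmentation inference (the inference itself is omitted). The class IDs and the `anonymize` helper are illustrative placeholders, not MSeg's actual taxonomy indices or API.

```python
import numpy as np

# Placeholder IDs for movable classes such as "person" and "car";
# the real MSeg taxonomy uses its own indices.
MOVABLE_CLASS_IDS = [11, 12]

def anonymize(frame: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Return a copy of `frame` with movable-class pixels set to white.

    frame:  (H, W, 3) uint8 image.
    labels: (H, W) integer per-pixel semantic labels (e.g. from MSeg).
    """
    out = frame.copy()
    mask = np.isin(labels, MOVABLE_CLASS_IDS)
    out[mask] = 255  # paint masked pixels white, as in the dataset
    return out

# Toy example: a black 4x4 "frame" with a person (class 11) at one pixel.
frame = np.zeros((4, 4, 3), dtype=np.uint8)
labels = np.zeros((4, 4), dtype=np.int64)
labels[0, 0] = 11
anon = anonymize(frame, labels)
```

Applying the mask to a copy keeps the raw frame intact, so both versions can be stored side by side as in Fig. 1.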
We recorded images of 13 different floors/scenes within six buildings. We chose buildings with varied functions and appearances, including the Oculus, the New York University Silver Center for Arts and Science, Elmer Holmes Bobst Library, Morton Williams Supermarket, and the Metropolitan Museum of Art. These settings span a broad range of indoor spaces: shopping malls, academic buildings, libraries, supermarkets, and museums. Fig. 2 shows the trajectories and example images of selected scenes.
For each building, we selected one or more floors as scenes. For each scene, we fixed the trajectory and captured videos along the same route at different times throughout the year. Fig. 3 shows the time distribution of visits. The videos were recorded from April to July 2022 and from March to April 2023, so the dataset captures substantial variation in illumination and appearance. Fig. 4 shows how images of the same location change over the course of a year.