SO-NeRF

Active View Planning for NeRF using Surrogate Objectives

New York University

*Denotes Equal Contribution


Overview of SO-NeRF compared with a baseline, ActiveNeRF. (a) Given any training mesh, we compute the individual surrogate objective scores C, Q, D and T, and (b) train a model, SOARNet, (c) which is then used for trajectory planning without needing to visit the candidate poses beforehand (unlike ActiveNeRF). (d) We achieve comparable NeRF quality with a significant speed-up over the baseline.


Abstract

Despite the great success of Neural Radiance Fields (NeRF), its data-gathering process remains vague, with only a general rule of thumb of "sampling as densely as possible". The lack of understanding of what actually constitutes good views for NeRF makes it difficult to actively plan a sequence of views that yields the maximal reconstruction quality. We propose Surrogate Objectives for Active Radiance Fields (SOAR), a set of interpretable functions that evaluate the goodness of views using geometric and photometric visual cues: surface coverage, geometric complexity, textural complexity, and ray diversity. Moreover, by learning to infer the SOAR scores with a deep network, SOARNet, we are able to select views in mere seconds instead of hours, without needing to visit all the candidate views beforehand or to train any radiance field during planning. Our experiments show that SOARNet outperforms the baselines with a \( \sim 80\times \) speed-up while achieving better or comparable reconstruction quality. Finally, we show that SOAR is model-agnostic, generalizing from fully neural-implicit to fully explicit approaches.


SOAR Score Generation Pipeline


The overall pipeline of our proposed approach. (a) Given an object, (b) we use pseudo-coverage initialization to obtain a set of starting poses \( \hat{\mathcal{M}} \) and corresponding camera poses \( \hat{\mathcal{C}} \). (c) We then generate a sampling trajectory greedily by computing the objective score of every unseen camera pose \( c_{n+1} \in \mathcal{C}^- \), where \( \mathcal{C}^- \) is the set of all unseen camera poses. (e) To compute these scores quickly, we use our novel model, SOARNet, and simply pick the candidate with the highest score, \( c_z \), as our next pose \( \hat{c}_{n+1} \). We repeat this until we reach our budget \( \mathcal{B} \), and then (d) train the radiance field.
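The greedy loop described in the caption above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the names `greedy_view_selection`, `score_fn`, and `toy_score` are hypothetical, camera poses are reduced to angles on a circle, and `toy_score` is a simple coverage-style stand-in for the scores that SOARNet would predict.

```python
from typing import Callable, List, Sequence


def greedy_view_selection(
    initial: Sequence[float],
    candidates: Sequence[float],
    score_fn: Callable[[List[float], float], float],
    budget: int,
) -> List[float]:
    """Greedily extend `initial` with the highest-scoring unseen pose
    until `budget` poses have been selected (or no candidates remain)."""
    selected = list(initial)
    # Unseen poses: every candidate that has not been selected yet.
    unseen = [c for c in candidates if c not in selected]
    while len(selected) < budget and unseen:
        # Score every unseen pose and keep the best one as the next pose.
        best = max(unseen, key=lambda c: score_fn(selected, c))
        selected.append(best)
        unseen.remove(best)
    return selected


def toy_score(selected: List[float], candidate: float) -> float:
    """Hypothetical stand-in for a learned score: the angular distance
    (in degrees) from the candidate to its nearest selected view."""
    def circular_distance(a: float, b: float) -> float:
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    return min(circular_distance(candidate, s) for s in selected)


# Twelve candidate poses spaced 30 degrees apart, one starting pose, budget of 4.
candidate_angles = [float(a) for a in range(0, 360, 30)]
trajectory = greedy_view_selection([0.0], candidate_angles, toy_score, budget=4)
print(trajectory)
```

With this toy score, each step picks the view farthest from those already chosen, so the selected poses spread out around the object. Swapping in a learned scorer only requires passing a different `score_fn`.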

SO-NeRF vs ActiveNeRF

Training budget \( \mathcal{B} = 30 \) images, trained for 300 epochs (~10 minutes of training time)


[Side-by-side image comparisons: SO-NeRF (Ours) vs. ActiveNeRF]

Citation

If you use this work or find it helpful, please consider citing: (bibtex)

@article{lee2023so,
title={SO-NeRF: Active View Planning for NeRF using Surrogate Objectives},
author={Lee, Keifer and Gupta, Shubham and Kim, Sunglyoung and Makwana, Bhargav and Chen, Chao and Feng, Chen},
journal={arXiv preprint arXiv:2312.03266},
year={2023}
} 

Acknowledgements

The work is supported by NSF Awards 2238968 and 2322242. The website template is inspired by LERF and MERF. Image sliders are based on dics. The carousel is designed using Splide.