Learn-to-Score: Efficient 3D Scene Exploration by Predicting View Utility

  • Benjamin Hepp
  • Debadeepta Dey
  • Sudipta N. Sinha
  • Ashish Kapoor
  • Neel Joshi
  • Otmar Hilliges
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11219)


Camera-equipped drones are now routinely used to explore large scenes and reconstruct detailed 3D maps. When the free space in a scene is approximately known, an offline planner can generate optimal plans for exploring it efficiently. For unknown scenes, however, the planner must predict the usefulness of candidate viewpoints on the fly and choose where to go next. Traditionally, this has been done with handcrafted utility functions. We propose instead to learn a utility function that predicts the usefulness of future viewpoints. Our learned utility function is based on a 3D convolutional neural network. The network takes as input a novel volumetric scene representation that implicitly captures previously visited viewpoints and generalizes to new scenes. We evaluate our method on several large 3D models of urban scenes using simulated depth cameras, and show that it outperforms existing utility measures in terms of reconstruction performance and is robust to sensor noise.


Keywords: 3D reconstruction · Exploration · Active vision · 3D CNN
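The abstract describes a greedy exploration loop in which a learned utility function scores candidate viewpoints over a volumetric (voxel occupancy) scene representation. The sketch below illustrates that loop only; the learned 3D CNN is replaced by a simple handcrafted proxy (the count of unknown voxels a viewpoint would reveal), and all names, grid sizes, and sensor parameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

GRID = 16           # voxels per side of the (assumed) occupancy grid
SENSOR_RANGE = 4.0  # sensor reach, in voxel units (illustrative)

def utility(unknown, viewpoint, reach=SENSOR_RANGE):
    """Proxy for the learned utility: number of still-unknown voxels
    within sensor reach of the candidate viewpoint."""
    idx = np.argwhere(unknown)
    if idx.size == 0:
        return 0
    dist = np.linalg.norm(idx - viewpoint, axis=1)
    return int((dist <= reach).sum())

def observe(unknown, viewpoint, reach=SENSOR_RANGE):
    """Simulated depth measurement: mark voxels within reach as known."""
    idx = np.argwhere(unknown)
    dist = np.linalg.norm(idx - viewpoint, axis=1)
    for i in idx[dist <= reach]:
        unknown[tuple(i)] = False

def explore(steps=10, n_candidates=20, seed=0):
    """Greedy next-best-view loop: at each step, score candidate
    viewpoints and move to the one with the highest predicted utility."""
    rng = np.random.default_rng(seed)
    unknown = np.ones((GRID, GRID, GRID), dtype=bool)  # all voxels unknown
    path = []
    for _ in range(steps):
        candidates = rng.uniform(0, GRID, size=(n_candidates, 3))
        scores = [utility(unknown, c) for c in candidates]
        best = candidates[int(np.argmax(scores))]
        observe(unknown, best)
        path.append(best)
    return unknown, path

unknown, path = explore()
print("unknown voxels remaining:", int(unknown.sum()))
```

In the paper's setting, `utility` would be the 3D CNN evaluated on the volumetric representation around each candidate viewpoint; the surrounding greedy loop stays the same.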

Supplementary material

Supplementary material 1 (PDF, 1349 KB)

Supplementary material 2 (MP4, 41982 KB)



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Benjamin Hepp (1, 2) — corresponding author
  • Debadeepta Dey (2)
  • Sudipta N. Sinha (2)
  • Ashish Kapoor (2)
  • Neel Joshi (2)
  • Otmar Hilliges (1)

  1. ETH Zurich, Zurich, Switzerland
  2. Microsoft Research, Redmond, USA
