Unsupervised Multi-view CNN for Salient View Selection of 3D Objects and Scenes

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12364)


We present an unsupervised 3D deep learning framework built on a simple, universally valid proposition we call view-object consistency: a 3D object and its projected 2D views always belong to the same object class. To validate its effectiveness, we design a multi-view CNN for the salient view selection of 3D objects, a task that is inherently ill-suited to supervised learning because of the difficulty of collecting labelled data. Our unsupervised multi-view CNN branches into two channels that encode the knowledge within each 2D view and within the 3D object, respectively, and exploits both intra-view and inter-view knowledge of the object. It ends with a new loss layer that formulates view-object consistency by driving the two channels to produce consistent classification outcomes. We experimentally demonstrate the superiority of our method over state-of-the-art methods and show that it can also select salient views of 3D scenes containing multiple objects.
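The abstract describes a loss that pushes the per-view channel and the 3D-object channel toward consistent class predictions. The paper's exact formulation is not given here, so the sketch below is only a hypothetical, minimal version of such a consistency term: the mean KL divergence between each view's predicted class distribution and the object's predicted class distribution (all names and the KL choice are assumptions, not the authors' definition).

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def view_object_consistency_loss(view_logits, object_logits, eps=1e-12):
    """Hypothetical consistency term: mean KL divergence between each
    2D view's class distribution (rows of view_logits, shape
    [n_views, n_classes]) and the 3D object's class distribution
    (object_logits, shape [n_classes]). Zero when all views agree
    with the object-level prediction."""
    p_views = softmax(view_logits)          # (n_views, n_classes)
    p_obj = softmax(object_logits)          # (n_classes,)
    kl = (p_views * (np.log(p_views + eps) - np.log(p_obj + eps))).sum(axis=1)
    return kl.mean()
```

Minimizing a term of this shape would penalize any rendered view whose classification disagrees with the whole object's, which is one plausible way to realize the stated view-object consistency without any labels beyond the classifier's own outputs.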


Keywords: Unsupervised 3D deep learning · Multi-view CNN · View-object consistency · View selection



We acknowledge the support of the Young Taishan Scholars Program of Shandong Province (tsqn20190929), the Qilu Young Scholars Program of Shandong University (31400082063101), the National Natural Science Foundation of China under Grants 61991411 and U1913204, the National Key Research and Development Plan of China under Grant 2017YFB1300205, and the Shandong Major Scientific and Technological Innovation Project (2018CXGC1503).

Supplementary material

Supplementary material 1: 504475_1_En_27_MOESM1_ESM.pdf (PDF, 39.6 MB)



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. School of Control Science and Engineering, Shandong University, Jinan, China
  2. Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China
  3. Department of Computer Science, Edge Hill University, Ormskirk, UK
