Abstract
We propose a discriminative Multi-View Attentional Convolutional Neural Network, dubbed MVA-CNN, which takes multiple views of a 3D shape as input and outputs the object category. Previous view-based approaches simply "compile" the view features into a compact 3D descriptor. Our method instead discovers the context among multiple views in both the visual and spatial domains. First, we render multiple images of a 3D object from virtual cameras and use a convolutional neural network (CNN) to extract features from each view. Second, we aggregate the view features in two steps: (1) an element-wise maximum operation across the view features is adopted to discover discriminative features; (2) a soft attention mechanism is used to dynamically adjust the shape descriptor so that it better represents spatial information. The entire network can be trained end-to-end with standard backpropagation. We verify the effectiveness of MVA-CNN on two widely used datasets, ModelNet10 and ModelNet40, by comparing it with state-of-the-art methods.
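The minimal NumPy sketch below illustrates the two-step view aggregation described above. It rests on several assumptions because the abstract does not specify the attention formula:

- The attention is assumed to be additive (Bahdanau-style), with the max-pooled descriptor serving as the query.
- The final shape descriptor is assumed to be the concatenation of the pooled vector and the attention context.
- All names are hypothetical stand-ins, not the authors' implementation. This covers the parameters `W_v`, `W_q`, `w` and `W_cls`, the helper functions, and the 12-view setting.
- Random vectors stand in for real per-view CNN features, and the weights are random rather than trained.

```python
import numpy as np


def max_view_pool(view_feats):
    """Step 1: element-wise maximum across views.

    view_feats: array of shape (n_views, d), one CNN feature vector per view.
    Returns a (d,) vector keeping the strongest response of each dimension.
    """
    return view_feats.max(axis=0)


def soft_attention(view_feats, query, W_v, W_q, w):
    """Step 2 (assumed form): additive soft attention over the views.

    Scores each view as e_i = w . tanh(W_v v_i + W_q q), normalises the
    scores with a softmax and returns the attention-weighted sum of views.
    """
    hidden = np.tanh(view_feats @ W_v.T + query @ W_q.T)  # (n_views, h)
    scores = hidden @ w                                    # (n_views,)
    scores = scores - scores.max()                         # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()          # weights sum to 1
    context = alpha @ view_feats                           # (d,)
    return context, alpha


def shape_descriptor(view_feats, params):
    """Combine both steps; concatenation is an assumption, not the paper's rule."""
    pooled = max_view_pool(view_feats)
    context, alpha = soft_attention(
        view_feats, pooled, params["W_v"], params["W_q"], params["w"]
    )
    return np.concatenate([pooled, context]), alpha


rng = np.random.default_rng(0)
n_views, d, h, n_classes = 12, 8, 4, 10  # hypothetical sizes (10 classes as in ModelNet10)
params = {
    "W_v": rng.standard_normal((h, d)),  # hypothetical attention weights
    "W_q": rng.standard_normal((h, d)),
    "w": rng.standard_normal(h),
}
W_cls = rng.standard_normal((n_classes, 2 * d))  # hypothetical linear classifier

view_feats = rng.standard_normal((n_views, d))  # stand-in for per-view CNN features
desc, alpha = shape_descriptor(view_feats, params)
logits = W_cls @ desc  # one score per object category
print(desc.shape, alpha.round(3), int(logits.argmax()))
```

Both the maximum and the attention-weighted sum are taken over the view axis. The resulting descriptor therefore does not depend on the order in which the views are supplied. In a trained model, the attention and classifier weights would be learned end-to-end by backpropagation together with the CNN.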
Additional information
This work was supported in part by the National Natural Science Foundation of China (61772359, 61572356, 61872267), the grant of the Tianjin New Generation Artificial Intelligence Major Program (18ZXZNGX00150), and the grant of the Elite Scholar Program of Tianjin University (2019XRX-0035).
Cite this article
Liu, AA., Zhou, HY., Li, MJ. et al. 3D model retrieval based on multi-view attentional convolutional neural network. Multimed Tools Appl 79, 4699–4711 (2020). https://doi.org/10.1007/s11042-019-7521-8