
3D model retrieval based on multi-view attentional convolutional neural network

Multimedia Tools and Applications

Abstract

We propose a discriminative Multi-View Attentional Convolutional Neural Network, dubbed MVA-CNN, which takes multiple views of a shape as input and outputs the object category. Unlike previous view-based approaches that simply “compile” the view features into a compact 3D descriptor, our method can discover the context among multiple views in both the visual and the spatial domains. First, we render multiple images of a 3D object with virtual cameras and use a Convolutional Neural Network (CNN) to abstract the information in each view. Second, we aggregate the view features in two steps: (1) an element-wise maximum operation across the view features is adopted to discover discriminative features; (2) a soft attention mechanism is used to dynamically adjust the shape descriptor so that it better represents spatial information. The entire network can be trained end-to-end with standard backpropagation. We verify the effectiveness of MVA-CNN on two widely used datasets, ModelNet10 and ModelNet40, by comparing it with state-of-the-art methods.
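The two-step aggregation described above can be illustrated with a small, self-contained sketch. It assumes each rendered view has already been mapped by the CNN to a fixed-length feature vector. The element-wise maximum follows the abstract directly; the attention form (a dot-product score of each view against a weight vector `w`, a softmax over views, and a weighted sum) and the fusion by concatenation are hypothetical choices for illustration, not the exact MVA-CNN layers, which the abstract does not specify. In a trained network `w` would be learned end-to-end by backpropagation; here it is simply passed in.

```python
import math
from typing import List, Tuple

Vector = List[float]


def max_view_pool(view_feats: List[Vector]) -> Vector:
    """Element-wise maximum across the per-view feature vectors."""
    if not view_feats:
        raise ValueError("at least one view feature is required")
    dim = len(view_feats[0])
    if any(len(v) != dim for v in view_feats):
        raise ValueError("all view features must have the same length")
    return [max(v[d] for v in view_feats) for d in range(dim)]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: shift by the max score before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(view_feats: List[Vector], w: Vector) -> Tuple[Vector, List[float]]:
    """Soft attention over views (hypothetical form, for illustration only).

    Each view i gets a score w . v_i; the scores are normalised with softmax
    into weights alpha_i, and the context vector is sum_i alpha_i * v_i.
    """
    scores = [sum(wd * xd for wd, xd in zip(w, v)) for v in view_feats]
    alphas = softmax(scores)
    dim = len(view_feats[0])
    context = [sum(a * v[d] for a, v in zip(alphas, view_feats)) for d in range(dim)]
    return context, alphas


def shape_descriptor(view_feats: List[Vector], w: Vector) -> Vector:
    """Hypothetical fusion: concatenate the max-pooled and attention-pooled vectors."""
    context, _ = attention_pool(view_feats, w)
    return max_view_pool(view_feats) + context


if __name__ == "__main__":
    # Three toy 3-D "CNN features", one per rendered view.
    views = [[0.1, 0.9, 0.3], [0.7, 0.2, 0.4], [0.5, 0.5, 0.8]]
    print(max_view_pool(views))  # [0.7, 0.9, 0.8]
    _, alphas = attention_pool(views, [0.0, 0.0, 0.0])
    print(alphas)  # equal weights when w is all zeros
    print(len(shape_descriptor(views, [1.0, -1.0, 0.5])))  # 6
```

The maximum keeps, for every feature dimension, the strongest response from any view, which is why it tends to preserve discriminative features; the attention weights, by contrast, let some views contribute more than others to the final descriptor, and because they come from a softmax they are differentiable and can be trained with the rest of the network.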


Fig. 1
Fig. 2
Fig. 3
Fig. 4



Author information

Corresponding authors

Correspondence to An-An Liu or Wei-Zhi Nie.

Additional information

This work was supported in part by the National Natural Science Foundation of China (61772359, 61572356, 61872267), the Tianjin New Generation Artificial Intelligence Major Program (18ZXZNGX00150), and the Elite Scholar Program of Tianjin University (2019XRX-0035).

About this article


Cite this article

Liu, AA., Zhou, HY., Li, MJ. et al. 3D model retrieval based on multi-view attentional convolutional neural network. Multimed Tools Appl 79, 4699–4711 (2020). https://doi.org/10.1007/s11042-019-7521-8


  • DOI: https://doi.org/10.1007/s11042-019-7521-8
