View-relation constrained global representation learning for multi-view-based 3D object recognition

Applied Intelligence

Abstract

Multi-view observations provide complementary cues for 3D object recognition, but they also contain redundant information that appears different across views because of view-dependent projection, light reflection, and self-occlusion. This paper presents a view-relation constrained global representation network (VCGR-Net) for 3D object recognition that mitigates this view interference at every phase, from view-level source feature generation to multi-view feature aggregation. Specifically, we model inter-view relations implicitly with an LSTM. Based on these relations, we construct a two-stage feature selection module that filters the features of each view according to their importance to the global representation and their reliability as observations at specific views. The selected features are then aggregated with reference to intra- and inter-view spatial context to generate a global representation for 3D object recognition. Experiments on the ModelNet40 and ModelNet10 datasets demonstrate that the proposed method suppresses view interference and therefore outperforms state-of-the-art methods in 3D object recognition.
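
As a rough illustration of the idea of two-stage selection followed by aggregation, the sketch below works on plain per-view feature vectors. It is not the VCGR-Net implementation: the learned LSTM relations are replaced here by simple cosine similarity between views, the gating rule is an assumption made for illustration, and the names `cosine`, `softmax`, and `aggregate_views` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_views(view_feats):
    """Fuse V per-view feature vectors (each of length D) into one descriptor.

    Assumption: cosine similarity stands in for the learned inter-view
    relations; the abstract describes an LSTM for this step.
    """
    n_views = len(view_feats)
    dim = len(view_feats[0])

    # Provisional global descriptor: plain mean over views.
    global_mean = [sum(f[d] for f in view_feats) / n_views for d in range(dim)]

    # Stage 1 (importance): gate each channel by how strongly the
    # provisional global descriptor relies on it.
    peak = max(abs(g) for g in global_mean) or 1.0
    gates = [abs(g) / peak for g in global_mean]
    gated = [[f[d] * gates[d] for d in range(dim)] for f in view_feats]

    # Stage 2 (reliability): score each view by its mean agreement
    # with all other views, then normalize the scores to weights.
    reliability = []
    for i in range(n_views):
        others = [cosine(gated[i], gated[j]) for j in range(n_views) if j != i]
        reliability.append(sum(others) / len(others) if others else 0.0)
    weights = softmax(reliability)

    # Aggregation: reliability-weighted sum of the gated view features.
    fused = [sum(weights[v] * gated[v][d] for v in range(n_views))
             for d in range(dim)]
    return fused, weights
```

Calling `aggregate_views([[1.0, 0.0, 0.2], [0.9, 0.1, 0.2], [-1.0, 0.0, 0.1]])` returns a three-dimensional fused descriptor together with one weight per view; the third view, which disagrees with the other two, receives the smallest weight.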

Acknowledgements

This research is partially supported by the National Natural Science Foundation of China (Nos. 62176010, 61771026). It is also supported by the Key Project of Beijing Municipal Education Commission (No. KZ201910005008).

Author information

Corresponding author

Correspondence to Wei Ma.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Xu, R., Mi, Q., Ma, W. et al. View-relation constrained global representation learning for multi-view-based 3D object recognition. Appl Intell 53, 7741–7750 (2023). https://doi.org/10.1007/s10489-022-03949-8

  • DOI: https://doi.org/10.1007/s10489-022-03949-8
