View-relation constrained global representation learning for multi-view-based 3D object recognition

Xu, Ruchang; Mi, Qing; Ma, Wei; Zha, Hongbin

doi:10.1007/s10489-022-03949-8

View-relation constrained global representation learning for multi-view-based 3D object recognition

Published: 21 July 2022

Volume 53, pages 7741–7750, (2023)
Cite this article

Applied Intelligence Aims and scope Submit manuscript

Ruchang Xu¹,
Qing Mi¹,
Wei Ma ORCID: orcid.org/0000-0001-9652-4260¹ &
…
Hongbin Zha²

309 Accesses
1 Citation
Explore all metrics

Abstract

Multi-view observations provide complementary clues for 3D object recognition, but also include redundant information that appears different across views due to view-dependent projection, light reflection and self-occlusions. This paper presents a view-relation constrained global representation network (VCGR-Net) for 3D object recognition that can mitigate the view interference problem at all phases, from view-level source feature generation to multi-view feature aggregation. Specifically, we determine inter-view relations via LSTM implicitly. Based on the relations, we construct a two-stage feature selection module to filter features at each view according to their importance to the global representation and their reliability as observations at specific views. The selected features are then aggregated by referring to intra- and inter-view spatial context to generate global representation for 3D object recognition. Experiments on the ModelNet40 and ModelNet10 datasets demonstrate that the proposed method can suppress view interference and therefore outperform state-of-the-art methods in 3D object recognition.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Unsupervised Multi-view CNN for Salient View Selection of 3D Objects and Scenes

Unsupervised Multi-View CNN for Salient View Selection and 3D Interest Point Detection

Article 16 March 2022

Learning representative viewpoints in 3D shape recognition

Article 02 July 2021

References

Ma C, Guo Y, Yang J, An W (2019) Learning multi-view representation with LSTM for 3D shape recognition and retrieval. IEEE Trans Multimedia 21(5):1169–1182
Article Google Scholar
Chen K, Oldja R, Smolyanskiy N, Birchfield S, Popov A, Wehr D, Eden I, Pehserl J (2020) MVLIdarnet: real-time multi-class scene understanding for autonomous driving using multiple views. In: IEEE international conference on intelligent robots and systems
Su H, Maji S, Kalogerakis E, Learned-Miller E (2015) Multi-view convolutional neural networks for 3D shape recognition. In: International conference on computer vision
Sedaghat N, Zolfaghari M, Amiri E, Brox T (2017) Orientation-boosted voxel nets for 3d object recognition. In: British machine vision conference
Wang C, Cheng M, Sohel F, Bennamoun M, Li J (2018) Normalnet: a voxel-based CNN for 3D object classification and retrieval. Neurocomputing 323:139–147
Article Google Scholar
Qi CR, Su H, Mo K, Guibas LJ (2017) Pointnet: deep learning on point sets for 3D classification and segmentation. In: IEEE conference on computer vision and pattern recognition
Fujiwara K, Hashimoto T (2020) Neural implicit embedding for point cloud analysis. In: IEEE conference on computer vision and pattern recognition
Chen X, Liu L, Zhang L, Zhang H, Meng L, Liu D (2021) Group-pair deep feature learning for multi-view 3D model retrieval. Appl Intell. https://doi.org/10.1007/s10489-021-02471-7
Yu T, Meng J, Yuan J (2018) Multi-view harmonized bilinear network for 3d object recognition. In: IEEE conference on computer vision and pattern recognition
Liang Q, Li Q, Zhang L, Mi H, Nie W, Li X (2021) MHFP: multi-view based hierarchical fusion pooling method for 3D shape recognition. Pattern Recogn Lett 150:214–220
Article Google Scholar
Lee DH, Chen KL, Liou KH, Liu CL, Liu JL (2021) Deep learning and control algorithms of direct perception for autonomous driving. Appl Intell 51(1):237–247
Article Google Scholar
Han Z, Shang M, Liu Z, Vong CM, Liu YS, Zwicker M, Han J, Chen C (2019) Seqviews2seqlabels: learning 3D global features via aggregating sequential views by RNN with attention. IEEE Trans Image Process 28(2):658–672
Article MathSciNet MATH Google Scholar
Ullah A, Muhammad K, Ser JD, Baik SW, Albuquerque V (2019) Activity recognition using temporal optical flow convolutional features and multilayer LSTM. IEEE Trans Ind Electron 66(12):9692–9702
Article Google Scholar
Kazhdan M, Funkhouser T, Rusinkiewicz S (2003) Arotation invariant spherical harmonic representation of 3D shape descriptors. Eurographics Symp Geom Process 6:156–164
Google Scholar
Chen DY, Tian XP, Shen YT, Ouhyoung M (2010) On visual similarity based 3d model retrieval. Comput Graph Forum 22(3):223–232
Article Google Scholar
Wu Z, Song S, Khosla A, Yu F, Zhang L, Tang X, Xiao J (2015) 3D ShapeNets: a deep representation for volumetric shapes. In: IEEE conference on computer vision and pattern recognition
Maturana D, Scherer S (2015) Voxnet: a 3D convolutional neural network for real-time object recognition. In: IEEE international conference on intelligent robots and systems
Wang PS, Liu Y, Guo YX, Sun CY, Tong X (2017) O-CNN: octree-based convolutional neural networks for 3D shape analysis. ACM Trans Graph 36(4):72
Article Google Scholar
Le T, Duan Y (2018) Pointgrid: a deep network for 3D shape understanding. In: IEEE conference on computer vision and pattern recognition
Qi CR, Yi L, Su H.Y, Guibas LJ (2017) Pointnet+ +: deep hierarchical feature learning on point sets in a metric space conference and workshop on neural information processing systems
Yan X, Zheng C, Li Z, Wang S, Cui S (2020) Pointasnl: robust point clouds processing using nonlocal neural networks with adaptive sampling. In: IEEE conference on computer vision and pattern recognition
Yu T, Meng J, Yang M, Yuan J (2021) 3D object representation learning: a set-to-set matching perspective. IEEE Trans Image Process 30:2168–217
Article MathSciNet Google Scholar
Feng Y, Zhang Z, Zhao X, Ji R, Gao Y (2018) GVCNN: group-view convolutional neural networks for 3D shape recognition. In: IEEE conference on computer vision and pattern recognition
Yang Z, Wang L (2019) Learning relationships for multi-view 3D object recognition. In: IEEE international conference on computer vision
Xu J, Zhang X, Li W, Liu X, Han J (2021) Joint multi-view 2D convolutional neural networks for 3D object classification. In: International joint conference on artificial intelligence
Liu A-A, Zhou H, Nie W, Liu Z, Liu W, Xie H, Mao Z, Li X, Song D (2021) Hierarchical multi-view context modelling for 3D object classification and retrieval. Inf Sci 547:984–995
Article Google Scholar
Han Z, Lu H, Liu Z, Vong CM, Liua YS, Zwicker M, Han J, Chen CLP (2019) 3D2SeqViews: aggregating sequential views for 3D global feature learning by CNN with hierarchical attention aggregation. IEEE Trans Image Process 28(8):3986–3999
Article MathSciNet MATH Google Scholar
Jiang J, Bao D, Chen Z, Zhao X, Gao Y (2019) MLVCNN: multi-loop-view convolutional neural network for 3D shape retrieval. In: Proceedings of the AAAI conference on artificial intelligence
Huang J, Yan W, Li TH, Liu S, Li G (2020) Learning the global descriptor for 3D object recognition based on multiple views decomposition. IEEE Trans Multimedia 24:188–201
Article Google Scholar
Shao Z, Li Y, Zhang H (2020) Learning representations from skeletal self-similarities for cross-view action recognition. IEEE Trans Circuits Syst Video Technol 31(1):160–174
Article Google Scholar
Liu M, Li Y, Liu H (2021) Robust 3D gaze estimation via data optimization and saliency aggregation for mobile eye-tracking systems. IEEE Trans Instrum Meas 70:1–10
Article Google Scholar
Liu H, Liu T, Zhang Z, Sangaiah AK, Yang B, Li Y (2022) ARHPE: Asymmetric relation-aware representation learning for head pose estimation in industrial human-machine interaction. IEEE Trans Ind Inf. https://doi.org/10.1109/TII.2022.3143605
Ma W, Xu S, Ma W, Zha H (2020) Multiview feature aggregation for facade parsing. IEEE Geosci Remote Sens Lett 19:1–5
Google Scholar
Ren Z, Sun Q (2021) Simultaneous global and local graph structure preserving for multiple kernel clustering. IEEE Trans Neural Netw Learn Syst 32(5):1839–1851
Article MathSciNet Google Scholar
Ren Z, Yang S, Sun Q, Wang T (2018) Consensus affinity graph learning for multiple kernel clustering. IEEE Trans Cybern 51(6):3273–3284
Article Google Scholar
Woo S, Park J, Lee J, Kweon I (2018) Cbam: convolutional block attention module. In: European conference on computer vision

Download references

Acknowledgements

This research is partially supported by National Natural Science Foundation of China (Nos. 62176010, 61771026). It is also supported by the Key Project of Beijing Municipal Education Commission (No. KZ201910005008).

Author information

Authors and Affiliations

Faculty of Information Technology, Beijing University of Technology, Beijing, 100020, China
Ruchang Xu, Qing Mi & Wei Ma
Key Laboratory of Machine Perception (MOE), School of Electronics Engineering and Computer Science, Peking University, Beijing, 100871, China
Hongbin Zha

Authors

Ruchang Xu
View author publications
You can also search for this author in PubMed Google Scholar
Qing Mi
View author publications
You can also search for this author in PubMed Google Scholar
Wei Ma
View author publications
You can also search for this author in PubMed Google Scholar
Hongbin Zha
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Wei Ma.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Xu, R., Mi, Q., Ma, W. et al. View-relation constrained global representation learning for multi-view-based 3D object recognition. Appl Intell 53, 7741–7750 (2023). https://doi.org/10.1007/s10489-022-03949-8

Download citation

Accepted: 30 June 2022
Published: 21 July 2022
Issue Date: April 2023
DOI: https://doi.org/10.1007/s10489-022-03949-8

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

View-relation constrained global representation learning for multi-view-based 3D object recognition

Abstract

Access this article

Similar content being viewed by others

Unsupervised Multi-view CNN for Salient View Selection of 3D Objects and Scenes

Unsupervised Multi-View CNN for Salient View Selection and 3D Interest Point Detection

Learning representative viewpoints in 3D shape recognition

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

View-relation constrained global representation learning for multi-view-based 3D object recognition

Abstract

Access this article

Similar content being viewed by others

Unsupervised Multi-view CNN for Salient View Selection of 3D Objects and Scenes

Unsupervised Multi-View CNN for Salient View Selection and 3D Interest Point Detection

Learning representative viewpoints in 3D shape recognition

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation