ARM3D: Attention-based relation module for indoor 3D object detection

Lan, Yuqing; Duan, Yao; Liu, Chenyi; Zhu, Chenyang; Xiong, Yueshan; Huang, Hui; Xu, Kai

doi:10.1007/s41095-021-0252-6

Research Article
Open access
Published: 08 March 2022

Volume 8, pages 395–414, (2022)

Computational Visual Media

Yuqing Lan¹, Yao Duan¹, Chenyi Liu¹, Chenyang Zhu¹, Yueshan Xiong¹, Hui Huang² & Kai Xu¹

Abstract

Relation contexts have proven useful for many challenging vision tasks. In 3D object detection, previous methods have used context encoding, graph embedding, or explicit relation reasoning to extract relation contexts. However, noisy or low-quality proposals inevitably introduce redundant relation contexts. Invalid relation contexts usually indicate underlying scene misunderstanding and ambiguity, and can therefore reduce rather than improve performance in complex scenes. Inspired by recent attention mechanisms such as the Transformer, we propose a novel 3D attention-based relation module (ARM3D). It comprises object-aware relation reasoning, which extracts pair-wise relation contexts among qualified proposals, and an attention module, which distributes attention weights across the different relation contexts. In this way, ARM3D can take full advantage of useful relation contexts and filter out less relevant or even confusing ones, which mitigates ambiguity in detection. We evaluate ARM3D by plugging it into several state-of-the-art 3D object detectors and show more accurate and robust detection results. Extensive experiments demonstrate the capability and generalization of ARM3D on 3D object detection. Our source code is available at https://github.com/lanlan96/ARM3D.
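
To make the idea of weighting relation contexts concrete, the following is a minimal sketch only, not the ARM3D architecture. It assumes generic Transformer-style scaled dot-product attention, toy 2D feature vectors, and hypothetical names (`attend_relations`, `proposal`, `relations`) that do not come from the paper or its repository. Each pair-wise relation context is scored against the proposal feature, the scores are normalized with a softmax, and the contexts are summed with those weights so that poorly matching contexts contribute little.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend_relations(proposal, relations):
    """Weight relation contexts by scaled dot-product similarity to a proposal.

    Assumption: generic attention, standing in for the paper's module.
    Returns the attention weights and the weighted sum of the contexts.
    """
    scale = math.sqrt(len(proposal))
    weights = softmax([dot(proposal, r) / scale for r in relations])
    context = [
        sum(w * r[i] for w, r in zip(weights, relations))
        for i in range(len(proposal))
    ]
    return weights, context


# Toy data (hypothetical): one proposal feature and three relation contexts.
proposal = [1.0, 0.0]
relations = [[4.0, 0.0], [0.0, 4.0], [-4.0, 0.0]]
weights, context = attend_relations(proposal, relations)
print(weights)
print(context)
```

In this toy setting the first relation context, which aligns with the proposal feature, receives the largest weight, and the opposing third context receives the smallest.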

Acknowledgements

We thank Jiazhao Zhang for server management. This work was supported in part by the National Natural Science Foundation of China (62132021, 62102435, 62002375, 62002376), the National Key R&D Program of China (2018AAA0102200), and NUDT Research Grants (ZK19-30).

Author information

Authors and Affiliations

College of Computer, National University of Defense Technology, Changsha, 410073, China
Yuqing Lan, Yao Duan, Chenyi Liu, Chenyang Zhu, Yueshan Xiong & Kai Xu
Shenzhen University, Shenzhen, 518061, China
Hui Huang

Corresponding authors

Correspondence to Chenyang Zhu or Kai Xu.

Ethics declarations

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Yuqing Lan received his B.S. degree in network engineering from National University of Defense Technology, China, in 2019. He is now a postgraduate at the School of Computer, National University of Defense Technology, China. His research interests cover 3D object detection and 3D reconstruction.

Yao Duan received her master's degree in computer science from National University of Defense Technology. She is now a Ph.D. student at the School of Computer, National University of Defense Technology, China. Her research interests include 3D object detection.

Chenyi Liu received her B.S. degree in software engineering from Tianjin Normal University, China, in 2020. She is now a master's student at the National University of Defense Technology, China. Her research interests cover 3D point cloud registration.

Chenyang Zhu is an assistant professor at the School of Computer, National University of Defense Technology. Current research interests include data-driven shape analysis and modeling, 3D vision, and robot perception and navigation.

Yueshan Xiong is a professor at the School of Computer, National University of Defense Technology. Current research interests include virtual surgery systems, image and graphics processing, and intelligent computing.

Hui Huang is a Distinguished TFA Professor at Shenzhen University, where she directs the Visual Computing Research Center. Her research interests span computer graphics, 3D vision, and visualization. She is currently a senior member of IEEE/ACM/CSIG and a distinguished member of CCF.

Kai Xu is a professor at the School of Computer, National University of Defense Technology, where he received his Ph.D. degree in 2011. He serves on the editorial boards of ACM Transactions on Graphics, Computer Graphics Forum, Computers & Graphics, and The Visual Computer. His research work can be found on his personal website: https://www.kevinkaixu.net.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.

About this article

Cite this article

Lan, Y., Duan, Y., Liu, C. et al. ARM3D: Attention-based relation module for indoor 3D object detection. Comp. Visual Media 8, 395–414 (2022). https://doi.org/10.1007/s41095-021-0252-6

Received: 26 July 2021
Accepted: 25 August 2021
Published: 08 March 2022
Issue Date: September 2022
DOI: https://doi.org/10.1007/s41095-021-0252-6
