3D target detection using dual domain attention and SIFT operator in indoor scenes

Zhao, Hanshuo; Yang, Dedong; Yu, Jiankang

doi:10.1007/s00371-021-02217-z

Original article
Published: 28 June 2021

Volume 38, pages 3765–3774, (2022)
The Visual Computer

Abstract

3D object detection plays an increasingly important role in many real-life scenes and practical applications. Completing the task requires estimating the position and orientation of each 3D object in a real scene. In this paper, we propose a new network architecture based on VoteNet to detect objects in 3D point clouds. On the one hand, we use a channel and spatial dual-domain attention module to enhance the features of the object to be detected while suppressing irrelevant features. On the other hand, the SIFT operator is scale-invariant and resists occlusion and background interference. The PointSIFT module we use captures information from different directions of the point cloud in space and is robust to shapes of different proportions, which helps it detect partially occluded objects. Our method is evaluated on the SUN RGB-D and ScanNet datasets of indoor scenes. The experimental results show that our method performs better than VoteNet.
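
To make the phrase "information in different directions of point cloud in space" more concrete, the sketch below illustrates the kind of directional neighbor search that PointSIFT-style orientation encoding builds on: the space around a point is split into eight octants, and the nearest neighbor within a search radius is found in each one. This is a minimal illustration under stated assumptions, not the implementation used in the paper. The function names `octant_index` and `nearest_in_octants`, the `radius` parameter, and the choice to return `None` for an empty octant are all hypothetical.

```python
import math


def octant_index(dx, dy, dz):
    """Map an offset to one of 8 octants using the sign of each component."""
    # Each sign contributes one bit, giving an index in the range 0..7.
    return (dx >= 0) * 4 + (dy >= 0) * 2 + (dz >= 0) * 1


def nearest_in_octants(points, center_idx, radius):
    """For each octant around points[center_idx], return the index of the
    nearest point within `radius`, or None if that octant is empty.

    Assumption: returning None for empty octants is a choice made for this
    sketch only.
    """
    cx, cy, cz = points[center_idx]
    best = [None] * 8
    best_dist = [math.inf] * 8
    for i, (x, y, z) in enumerate(points):
        if i == center_idx:
            continue
        dx, dy, dz = x - cx, y - cy, z - cz
        dist = math.sqrt(dx * dx + dy * dy + dz * dz)
        if dist > radius:
            continue  # outside the search ball
        o = octant_index(dx, dy, dz)
        if dist < best_dist[o]:
            best_dist[o] = dist
            best[o] = i
    return best


# Toy point cloud: the query point is index 0 at the origin.
cloud = [
    (0.0, 0.0, 0.0),
    (1.0, 1.0, 1.0),
    (2.0, 2.0, 2.0),     # same octant as index 1, but farther away
    (-1.0, -1.0, -1.0),
    (1.0, -1.0, 1.0),
    (5.0, 5.0, 5.0),     # outside the search radius
]
neighbors = nearest_in_octants(cloud, center_idx=0, radius=4.0)
print(neighbors)
```

In a learned module, the features of the eight selected neighbors would then be combined by trainable layers. Because every direction contributes one neighbor, an object that is occluded on one side still supplies features from the remaining octants.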

Author information

Authors and Affiliations

School of Artificial Intelligence, Hebei University of Technology, Tianjin, 300401, China
Hanshuo Zhao, Dedong Yang & Jiankang Yu

Corresponding author

Correspondence to Dedong Yang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhao, H., Yang, D. & Yu, J. 3D target detection using dual domain attention and SIFT operator in indoor scenes. Vis Comput 38, 3765–3774 (2022). https://doi.org/10.1007/s00371-021-02217-z

Accepted: 12 June 2021
Published: 28 June 2021
Issue Date: November 2022
DOI: https://doi.org/10.1007/s00371-021-02217-z
