3D target detection using dual domain attention and SIFT operator in indoor scenes

  • Original article
  • The Visual Computer

Abstract

3D object detection plays an increasingly important role in many real-world scenes and practical applications. The task requires estimating the position and orientation of each 3D object in a real scene. In this paper, we propose a new network architecture based on VoteNet for detecting objects in 3D point clouds. First, we use a channel and spatial dual-domain attention module to enhance the features of the objects to be detected while suppressing irrelevant features. Second, because the SIFT operator is scale-invariant and resistant to occlusion and background interference, we adopt the PointSIFT module, which captures information from different directions of the point cloud in space and is robust to shapes of different scales, so that partially occluded objects are detected more reliably. We evaluate our method on the SUN-RGBD and ScanNet datasets of indoor scenes. The experimental results show that our method outperforms VoteNet.
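The directional encoding that the abstract attributes to PointSIFT can be illustrated with a small sketch. The abstract gives no implementation details, so the code below is an assumption based on the general idea of an octant-wise neighbor search: the space around a point is split into eight octants, and the nearest neighbor within a search radius is picked in each one. The function names, the radius value, and the fallback to the center point for empty octants are illustrative choices, not the authors' code.

```python
import math

def octant_index(center, point):
    """Map a neighbor to one of 8 octants around `center`.

    One bit per axis: the bit is set when the neighbor's coordinate
    is greater than or equal to the center's coordinate on that axis.
    """
    return sum(1 << axis for axis in range(3) if point[axis] >= center[axis])

def octant_nearest_neighbors(points, center_idx, radius):
    """Return, for each octant, the index of the nearest point within `radius`.

    Hypothetical helper: an octant with no point inside the radius
    falls back to the center point's own index.
    """
    center = points[center_idx]
    best_idx = [center_idx] * 8
    best_dist = [math.inf] * 8
    for i, p in enumerate(points):
        if i == center_idx:
            continue
        d = math.dist(center, p)
        if d > radius:
            continue  # outside the search radius
        o = octant_index(center, p)
        if d < best_dist[o]:
            best_dist[o] = d
            best_idx[o] = i
    return best_idx

# Toy point cloud: index 0 is the query point.
points = [
    (0.0, 0.0, 0.0),     # center
    (1.0, 1.0, 1.0),     # octant 7
    (0.5, 0.5, 0.5),     # octant 7, closer to the center
    (-1.0, -1.0, -1.0),  # octant 0
    (5.0, 5.0, 5.0),     # octant 7, but outside the radius
]
neighbors = octant_nearest_neighbors(points, center_idx=0, radius=2.0)
print(neighbors)  # [3, 0, 0, 0, 0, 0, 0, 2]
```

In a network, the features of these eight neighbors would then be combined by learned convolutions; that step is omitted here because it depends on details the abstract does not state.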

Author information

Corresponding author

Correspondence to Dedong Yang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhao, H., Yang, D. & Yu, J. 3D target detection using dual domain attention and SIFT operator in indoor scenes. Vis Comput 38, 3765–3774 (2022). https://doi.org/10.1007/s00371-021-02217-z

  • DOI: https://doi.org/10.1007/s00371-021-02217-z
