VOID: 3D object recognition based on voxelization in invariant distance space

  • Original article
  • Published in: The Visual Computer

Abstract

Recognizing 3D objects from local feature descriptors in point cloud scenes with occlusion and clutter is a very challenging task. Most existing 3D local feature descriptors rely on normal information to encode local features but ignore the normal-sign-ambiguity issue, which greatly limits their descriptiveness and robustness. This paper proposes a 3D object recognition method called VOxelization in Invariant Distance space (VOID). First, we propose the VOID descriptor, which is invariant to normal-sign-ambiguity and is also rotation-invariant, distinctive, robust, and efficient. Second, we propose a VOID-based 3D object recognition method that considers the self-similarity between local features to enhance recognition performance. Five standard datasets are used to validate the proposed method and to compare it with state-of-the-art methods. The results suggest that (1) the VOID descriptor is invariant to normal-sign-ambiguity, distinctive, and robust; and (2) VOID-based 3D object recognition achieves outstanding recognition performance, namely 99.47%, 93.07%, and 99.18% on the U3OR, Queen’s, and Ca’ Foscari Venezia datasets, respectively.
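
To make the normal-sign-ambiguity issue concrete, the following minimal sketch shows how a feature built from the signed projection onto an estimated normal changes when the normal flips, while a pair of distances does not. This is only an illustration under stated assumptions: the abstract does not specify how the VOID descriptor is constructed, and the function names `signed_height` and `invariant_distances`, as well as the choice of the two distances, are hypothetical and not taken from the paper.

```python
import math

def dot(a, b):
    """Dot product of two 3D tuples."""
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    """Component-wise difference a - b."""
    return tuple(x - y for x, y in zip(a, b))

def signed_height(keypoint, normal, neighbor):
    """Signed distance from neighbor to the tangent plane at keypoint.
    Its sign flips when the (unit) normal is flipped."""
    return dot(sub(neighbor, keypoint), normal)

def invariant_distances(keypoint, normal, neighbor):
    """Hypothetical sign-invariant pair (not the paper's definition):
    absolute height above the tangent plane and radial distance within it."""
    d = sub(neighbor, keypoint)
    h = dot(d, normal)
    radial_sq = max(dot(d, d) - h * h, 0.0)  # clamp guards against rounding error
    return (abs(h), math.sqrt(radial_sq))

keypoint = (0.0, 0.0, 0.0)
normal = (0.0, 0.0, 1.0)    # one possible normal estimate
flipped = (0.0, 0.0, -1.0)  # the equally valid opposite estimate
neighbor = (3.0, 4.0, 2.0)

print(signed_height(keypoint, normal, neighbor))         # 2.0
print(signed_height(keypoint, flipped, neighbor))        # -2.0
print(invariant_distances(keypoint, normal, neighbor))   # (2.0, 5.0)
print(invariant_distances(keypoint, flipped, neighbor))  # (2.0, 5.0)
```

The signed feature disagrees between the two normal estimates, whereas the distance pair is identical, which is the kind of invariance the abstract attributes to the descriptor.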

Acknowledgements

This work is supported in part by the National Natural Science Foundation of China (No. 62002295 and 62006025), the Ningbo Natural Science Foundation (No. 202003N4058), the China Postdoctoral Science Foundation (No. 2020M673319), the Natural Science Basic Research Plan in Shaanxi Province of China (No. 2021JQ-290 and 2020JQ-210), the State Key Laboratory of Rail Transit Engineering Informatization (FSDI) [Contract No. SKLKZ21-02], and the Fundamental Research Funds for the Central Universities (No. 3102019QD1002).

Author information

Corresponding author

Correspondence to Wei Wang.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yang, J., Fan, S., Huang, Z. et al. VOID: 3D object recognition based on voxelization in invariant distance space. Vis Comput 39, 3073–3089 (2023). https://doi.org/10.1007/s00371-022-02514-1

  • DOI: https://doi.org/10.1007/s00371-022-02514-1