
Shape completion using orthogonal views through a multi-input–output network

Industrial and Commercial Application · Pattern Analysis and Applications

Abstract

Knowing the shape of objects is essential for many robotics tasks, but complete shape information is not always available. Recent approaches based on point clouds and voxel cubes have been proposed for shape completion from a single depth view; however, they tend to be computationally expensive and require tuning many weights. This paper presents a novel architecture for shape completion based on six orthogonal views obtained from a point cloud (they can be seen as the six faces of a die). Our network uses one branch per orthogonal view as input–output and mixes them in the middle of the architecture. By using orthogonal views, the number of required parameters is significantly reduced. We also introduce a novel method to filter the output of networks based on orthogonal views, and we describe algorithms to convert an orthogonal view to a voxel cube and to a point cloud. We compared our approach against state-of-the-art approaches on the YCB and ShapeNet datasets using the Chamfer distance and mean square error measures and obtained very competitive performance with less than 5% of their parameters.
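The evaluation relies on two standard measures. As a concrete illustration, below is a minimal Python/NumPy sketch of a symmetric Chamfer distance between point clouds and a mean square error between voxel grids or view images; the exact formulation used in the paper (normalization, squared vs. unsquared distances) is not given on this page, so treat the details as assumptions.

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point clouds p (N, 3) and q (M, 3).

    Illustrative sketch: a common CD variant that averages, for each point,
    the distance to its nearest neighbour in the other cloud.
    """
    # Pairwise Euclidean distances between the two clouds (brute force, O(N*M)).
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
    # Nearest-neighbour distance in each direction, averaged and summed.
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def mean_square_error(a, b):
    """MSE between two same-shaped arrays (e.g. voxel cubes or view images)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.mean((a - b) ** 2)
```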


Data availability

All data generated or analyzed during this study are included in this published article and are available at https://github.com/hideldt/SCBOV.

Notes

  1. PointNet/PointNet++ [8, 9] were proposed to work directly on point clouds. Several works build on their frameworks to solve problems such as semantic segmentation, classification, and point completion [3,4,5,6,7, 17]; in this work, we compare our results against [3], which is based on PointNet.

  2. Note that parts of the shape can touch a face of the voxel cube. Our representation stores the distance to the first occupied voxel (as shown in Algorithm 2); if that distance is 0 (i.e., the shape touches a face of the cube), those voxels would take the same value as the background, so we add an offset. The offset was set empirically to four (see the first sketch after these notes).

  3. For the ShapeNet and YCB datasets, we use the versions proposed by [3] and [1], respectively.

  4. They use PCN as their base network, which has 6.8 M parameters; our network therefore uses only about 4.4% of their weights (roughly 0.3 M parameters).

  5. Note that the comparison could not be made on a lower-capacity GPU, since both networks require at least 8 GB of VRAM to be trained.

  6. This means that only the border points are taken into account, not the inner points (see the second sketch after these notes).
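For illustration, here is a minimal Python/NumPy sketch of the orthogonal-view representation described in Note 2: for each line of sight through a binary voxel cube, record the distance to the first occupied voxel plus an offset of four, leaving empty lines of sight at the background value. The background value of 0, the axis conventions, and the function name are assumptions; this is not the paper's Algorithm 2.

```python
import numpy as np

BACKGROUND = 0  # assumed background value
OFFSET = 4      # empirical offset from Note 2, so a surface voxel at
                # depth 0 does not collide with the background value

def orthogonal_view(voxels, axis=0, flip=False):
    """Depth map of the first occupied voxel along one axis of a binary cube."""
    v = np.flip(voxels, axis=axis) if flip else voxels
    v = np.moveaxis(v, axis, 0)       # scan along the first axis
    occupied = v.any(axis=0)          # lines of sight that hit the shape
    depth = np.argmax(v, axis=0)      # index of the first occupied voxel
    return np.where(occupied, depth + OFFSET, BACKGROUND)

# The six views correspond to scanning along +x, -x, +y, -y, +z, -z,
# i.e. the six faces of a die:
# views = [orthogonal_view(cube, axis=a, flip=f)
#          for a in range(3) for f in (False, True)]
```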
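And a sketch of one way to restrict a voxel grid to its border points, as mentioned in Note 6: keep occupied voxels that have at least one empty face neighbour. This 6-neighbourhood definition of "border" is an assumption, not necessarily the one used in the paper.

```python
import numpy as np

def border_voxels(voxels):
    """Coordinates of occupied voxels with at least one empty 6-neighbour."""
    v = np.pad(voxels.astype(bool), 1, constant_values=False)
    core = v[1:-1, 1:-1, 1:-1]
    # A voxel is interior if all six face neighbours are occupied.
    interior = (v[:-2, 1:-1, 1:-1] & v[2:, 1:-1, 1:-1] &
                v[1:-1, :-2, 1:-1] & v[1:-1, 2:, 1:-1] &
                v[1:-1, 1:-1, :-2] & v[1:-1, 1:-1, 2:])
    return np.argwhere(core & ~interior)
```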

References

  1. Varley J, DeChant C, Richardson A, Ruales J, Allen P (2017) Shape completion enabled robotic grasping. In: 2017 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 2442–2447. https://doi.org/10.1109/iros.2017.8206060

  2. Yang B, Rosa S, Markham A, Trigoni N, Wen H (2019) Dense 3D object reconstruction from a single depth view. IEEE Trans Pattern Anal Mach Intell 41(12):2820–2834. https://doi.org/10.1109/TPAMI.2018.2868195


  3. Yuan W, Khot T, Held D, Mertz C, Hebert M (2018) PCN: point completion network. In: 2018 international conference on 3D vision (3DV), pp 728–737

  4. Liu M, Sheng L, Yang S, Shao J, Hu S-M (2019) Morphing and sampling network for dense point cloud completion. In: The thirty-fourth AAAI conference on artificial intelligence

  5. Peng Y, Chang M, Wang Q, Qian Y, Zhang Y, Wei M, Liao X (2020) Sparse-to-dense multi-encoder shape completion of unstructured point cloud. IEEE Access 8:30969–30978


  6. Yu X, Rao Y, Wang Z, Liu Z, Lu J, Zhou J (2021) PoinTr: diverse point cloud completion with geometry-aware transformers. In: 2021 IEEE/CVF international conference on computer vision (ICCV), pp 12478–12487. https://doi.org/10.1109/ICCV48922.2021.01227

  7. Xiang P, Wen X, Liu Y-S, Cao Y-P, Wan P, Zheng W, Han Z (2021) SnowflakeNet: point cloud completion by snowflake point deconvolution with skip-transformer. In: Proceedings of the IEEE international conference on computer vision (ICCV)

  8. Charles RQ, Su H, Kaichun M, Guibas LJ (2017) PointNet: deep learning on point sets for 3D classification and segmentation. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 77–85. https://doi.org/10.1109/CVPR.2017.16

  9. Qi CR, Yi L, Su H, Guibas LJ (2017) PointNet++: deep hierarchical feature learning on point sets in a metric space. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in neural information processing systems, vol 30. Curran Associates Inc, USA


  10. Hu T, Han Z, Shrivastava A, Zwicker M (2019) Render4completion: synthesizing multi-view depth maps for 3D shape completion. In: 2019 IEEE/CVF international conference on computer vision workshop (ICCVW), pp 4114–4122. https://doi.org/10.1109/ICCVW.2019.00506

  11. Hu T, Han Z, Zwicker M (2020) 3D shape completion with multi-view consistent inference. In: The Thirty-Fourth AAAI conference on artificial intelligence, AAAI 2020, The Thirty-Second innovative applications of artificial intelligence conference, IAAI 2020, The tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, 7–12 Feb 2020, pp 10997–11004

  12. Chang AX, Funkhouser T, Guibas L, Hanrahan P, Huang Q, Li Z, Savarese S, Savva M, Song S, Su H, Xiao J, Yi L, Yu F (2015) ShapeNet: an information-rich 3D model repository. Technical Report arXiv:1512.03012 [cs.GR], Toyota Technological Institute, Chicago

  13. Calli B, Singh A, Walsman A, Srinivasa S, Abbeel P, Dollar AM (2015) The ycb object and model set: towards common benchmarks for manipulation research. In: 2015 international conference on advanced robotics (ICAR), pp 510–517. https://doi.org/10.1109/ICAR.2015.7251504

  14. Kappler D, Bohg J, Schaal S (2015) Leveraging big data for grasp planning. In: 2015 IEEE international conference on robotics and automation (ICRA), pp 4304–4311. https://doi.org/10.1109/ICRA.2015.7139793

  15. Koenig N, Howard A (2004) Design and use paradigms for gazebo, an open-source multi-robot simulator. In: 2004 IEEE/RSJ international conference on intelligent robots and systems (IROS) (IEEE Cat. No.04CH37566), vol 3, pp 2149–2154. https://doi.org/10.1109/IROS.2004.1389727

  16. Min P (2019) Binvox. http://www.patrickmin.com/binvox or https://www.google.com/search?q=binvox. Accessed on 05 Oct 2019

  17. Saha M, Amin SB, Sharma A, Kumar TKS, Kalia RK (2022) AI-driven quantification of ground glass opacities in lungs of COVID-19 patients using 3D computed tomography imaging. PLoS ONE 17:1–14. https://doi.org/10.1371/journal.pone.0263916


  18. Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, inception-resnet and the impact of residual connections on learning. In: Proceedings of the Thirty-First AAAI conference on artificial intelligence. AAAI’17, pp 4278–4284

  19. Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells WM, Frangi AF (eds) Medical image computing and computer-assisted intervention—MICCAI 2015. Springer, Cham, pp 234–241


  20. Riegler G, Ulusoy AO, Geiger A (2017) OctNet: learning deep 3D representations at high resolutions. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 6620–6629. https://doi.org/10.1109/CVPR.2017.701

  21. Choy CB, Xu D, Gwak J, Chen K, Savarese S (2016) 3D-R2N2: a unified approach for single and multi-view 3D object reconstruction. In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer vision—ECCV 2016. Springer, Cham, pp 628–644


  22. Rusu RB, Cousins S (2011) 3D is here: point cloud library (PCL). In: IEEE international conference on robotics and automation (ICRA). Shanghai, China, pp 1–4

  23. Han X, Li Z, Huang H, Kalogerakis E, Yu Y (2017) High-resolution shape completion using deep neural networks for global structure and local geometry inference. In: 2017 IEEE international conference on computer vision (ICCV), pp 85–93. https://doi.org/10.1109/ICCV.2017.19

  24. Oliphant T (2006) A guide to NumPy. Trelgol Publishing, USA


  25. Kingma D, Ba J (2014) Adam: a method for stochastic optimization. In: International conference on learning representations. arxiv:1412.6980

  26. Chollet F (2021) Deep learning with Python, 2nd edn. Manning Publications Co. ISBN 9781617296864

  27. Abadi M, Agarwal A et al (2015) TensorFlow: large-scale machine learning on heterogeneous systems. Software available from https://www.tensorflow.org/

  28. Chollet F (2017) Deep learning with Python, 1st edn. Manning Publications Co., Greenwich


  29. Do T-T, Nguyen A, Reid I (2018) AffordanceNet: an end-to-end deep learning approach for object affordance detection. In: 2018 IEEE international conference on robotics and automation (ICRA), pp 5882–5889. https://doi.org/10.1109/icra.2018.8460902


Acknowledgements

L. Delgado acknowledges CONACYT for the scholarship granted toward his PhD studies (Grant Number 707984).

Author information

Corresponding author

Correspondence to Leonardo Delgado.

Ethics declarations

Conflict of interest

The authors declare they have no financial interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Delgado, L., Morales, E.F. Shape completion using orthogonal views through a multi-input–output network. Pattern Anal Applic 26, 1045–1057 (2023). https://doi.org/10.1007/s10044-023-01154-y

