
Learning 3D Part Assembly from a Single Image

Conference paper in Computer Vision – ECCV 2020 (ECCV 2020). Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12351).

Abstract

Autonomous assembly is a crucial capability for robots in many applications. For this task, several problems such as obstacle avoidance, motion planning, and actuator control have been extensively studied in robotics. However, when it comes to task specification, the space of possibilities remains underexplored. Towards this end, we introduce a novel problem, single-image-guided 3D part assembly, along with a learning-based solution. We study this problem in the setting of furniture assembly from a given complete set of parts and a single image depicting the entire assembled object. Multiple challenges exist in this setting, including handling ambiguity among parts (e.g., slats in a chair back and leg stretchers) and 3D pose prediction for parts and part subassemblies, whether visible or occluded. We address these issues by proposing a two-module pipeline that leverages strong 2D-3D correspondences and assembly-oriented graph message-passing to infer part relationships. In experiments with a PartNet-based synthetic benchmark, we demonstrate the effectiveness of our framework as compared with three baseline approaches (code and data available at https://github.com/AntheaLi/3DPartAssembly).
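The abstract mentions assembly-oriented graph message-passing over part features to reason about part relationships. As a rough, illustrative sketch only (the function, weight shapes, and aggregation scheme below are assumptions for exposition, not the paper's actual architecture), one round of such message passing can be written as: each part's feature vector is updated from the mean of its neighbors' features through a shared learned transform.

```python
import numpy as np

def message_passing(part_feats, adj, weight, rounds=2):
    """Illustrative relation-reasoning step (hypothetical, not the paper's
    exact model): each part aggregates the mean of its neighbors' features,
    concatenates it with its own feature, and applies a shared linear map
    followed by a ReLU nonlinearity."""
    h = part_feats
    for _ in range(rounds):
        deg = adj.sum(axis=1, keepdims=True).clip(min=1)  # avoid divide-by-zero
        msgs = adj @ h / deg                  # mean over neighbor features
        h = np.maximum(0.0, np.concatenate([h, msgs], axis=1) @ weight)
    return h

# Toy example: 4 parts with 8-dim features, fully connected graph (no self-loops).
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
adj = np.ones((4, 4)) - np.eye(4)
W = rng.normal(size=(16, 8))                  # maps [own; message] back to 8 dims
out = message_passing(feats, adj, W)
print(out.shape)                              # (4, 8)
```

In the paper's setting, the updated per-part features would then feed a pose regressor; here the sketch only shows the aggregation pattern itself.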

Y. Li and K. Mo contributed equally.



Acknowledgements

We thank the anonymous reviewers for their comments and suggestions. This work was supported by a Vannevar Bush Faculty Fellowship, the grants from the Samsung GRO program, the SAIL Toyota Research Center, and gifts from Autodesk and Adobe.

Author information

Corresponding author: Yichen Li.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 6372 KB)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Li, Y., Mo, K., Shao, L., Sung, M., Guibas, L. (2020). Learning 3D Part Assembly from a Single Image. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol. 12351. Springer, Cham. https://doi.org/10.1007/978-3-030-58539-6_40


  • DOI: https://doi.org/10.1007/978-3-030-58539-6_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58538-9

  • Online ISBN: 978-3-030-58539-6

  • eBook Packages: Computer Science, Computer Science (R0)
