Learning 3D Part Assembly from a Single Image

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12351)

Abstract

Autonomous assembly is a crucial capability for robots in many applications. For this task, several problems such as obstacle avoidance, motion planning, and actuator control have been extensively studied in robotics. However, when it comes to task specification, the space of possibilities remains underexplored. Towards this end, we introduce a novel problem, single-image-guided 3D part assembly, along with a learning-based solution. We study this problem in the setting of furniture assembly from a given complete set of parts and a single image depicting the entire assembled object. Multiple challenges exist in this setting, including handling ambiguity among parts (e.g., slats in a chair back and leg stretchers) and 3D pose prediction for parts and part subassemblies, whether visible or occluded. We address these issues by proposing a two-module pipeline that leverages strong 2D-3D correspondences and assembly-oriented graph message-passing to infer part relationships. In experiments with a PartNet-based synthetic benchmark, we demonstrate the effectiveness of our framework as compared with three baseline approaches (code and data available at https://github.com/AntheaLi/3DPartAssembly).
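
The full pipeline is specified in the paper and the linked repository; purely as a rough illustration of the assembly-oriented graph message-passing idea named in the abstract, the following is a minimal PyTorch sketch. All class names, dimensions, and design choices here are hypothetical rather than taken from the authors' code: each part node carries a feature vector (in the method, fused from image and part-geometry cues), nodes exchange messages over a fully connected part graph, and a small head regresses a rigid pose (3D translation plus unit quaternion) per part.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartGraphPoseNet(nn.Module):
    """Minimal sketch (hypothetical, not the authors' code): message
    passing over a fully connected part graph, then per-part rigid-pose
    regression."""

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # Message function: receiver feature concatenated with sender feature.
        self.msg = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim))
        # Node update: fuse a node with its aggregated incoming messages.
        self.update = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim), nn.ReLU())
        # Pose head: 3-D translation + 4-D quaternion per part.
        self.pose_head = nn.Linear(feat_dim, 7)

    def forward(self, part_feats: torch.Tensor, n_rounds: int = 3):
        # part_feats: (P, D), one fused image+geometry feature per part.
        P, D = part_feats.shape
        h = part_feats
        for _ in range(n_rounds):
            recv = h.unsqueeze(1).expand(P, P, D)  # recv[i, j] = h[i]
            send = h.unsqueeze(0).expand(P, P, D)  # send[i, j] = h[j]
            m = self.msg(torch.cat([recv, send], dim=-1))  # (P, P, D)
            # Mean over senders j != i (mask out the diagonal).
            mask = 1.0 - torch.eye(P, device=h.device).unsqueeze(-1)
            agg = (m * mask).sum(dim=1) / max(P - 1, 1)
            h = self.update(torch.cat([h, agg], dim=-1))
        out = self.pose_head(h)                    # (P, 7)
        trans = out[:, :3]                         # per-part translation
        quat = F.normalize(out[:, 3:], dim=-1)     # per-part unit quaternion
        return trans, quat

# Example: 11 chair parts with hypothetical 256-D fused features each.
net = PartGraphPoseNet()
trans, quat = net(torch.randn(11, 256))
```

In the authors' method the per-part node features would come from the strong 2D-3D correspondences the abstract mentions (e.g., image features fused with part point-cloud encodings); repeated rounds of message passing are what let geometrically ambiguous parts, such as interchangeable chair-back slats, settle into mutually consistent poses.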

Keywords

Single-image 3D part assembly · Vision for robotic assembly

Notes

Acknowledgements

We thank the anonymous reviewers for their comments and suggestions. This work was supported by a Vannevar Bush Faculty Fellowship, grants from the Samsung GRO program and the SAIL Toyota Research Center, and gifts from Autodesk and Adobe.

Supplementary material

Supplementary material 1: 504443_1_En_40_MOESM1_ESM.pdf (PDF, 6.2 MB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Stanford University, Stanford, USA
  2. Adobe Research, San Jose, USA
