
Object Wake-Up: 3D Object Rigging from a Single Image

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13662)



Abstract

Given a single chair image, could we wake it up by reconstructing its 3D shape and skeleton, and then animating its plausible articulations and motions, as is done in human modeling? This is a new problem that not only goes beyond image-based object reconstruction but also involves articulated animation of generic objects in 3D, which could enable numerous downstream augmented and virtual reality applications. In this paper, we propose an automated approach that tackles the entire process of reconstructing such generic 3D objects, rigging them, and animating them, all from single images. To this end, we propose a two-stage pipeline that contains a multi-head structure utilizing deep implicit functions for skeleton prediction. We have also constructed two in-house 3D datasets of generic objects with high-fidelity rendering and annotated skeletons. Empirically, our approach demonstrates promising results: when evaluated on the related sub-tasks of 3D reconstruction and skeleton prediction, our results surpass those of state-of-the-art methods by a noticeable margin. Our code and datasets are publicly available at the dedicated project website.

J. Yang and X. Zuo—Equal contribution.

Project website: https://kulbear.github.io/object-wakeup/.

Acknowledgements

This research was partly supported by the NSERC Discovery, CFI-JELF and UAHJIC grants. We also thank Priyal Belgamwar for her contribution to the dataset annotation.

Author information

Correspondence to Ji Yang.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 2095 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Yang, J. et al. (2022). Object Wake-Up: 3D Object Rigging from a Single Image. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13662. Springer, Cham. https://doi.org/10.1007/978-3-031-20086-1_18

  • DOI: https://doi.org/10.1007/978-3-031-20086-1_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20085-4

  • Online ISBN: 978-3-031-20086-1

  • eBook Packages: Computer Science, Computer Science (R0)
