
Who Left the Dogs Out? 3D Animal Reconstruction with Expectation Maximization in the Loop

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12356)

Abstract

We introduce an automatic, end-to-end method for recovering the 3D pose and shape of dogs from monocular internet images. The large variation in shape between dog breeds, significant occlusion and the low quality of internet images make this a challenging problem. We learn a richer prior over shapes than previous work, which helps regularize parameter estimation. We demonstrate results on the Stanford Dog Dataset, an ‘in the wild’ dataset of 20,580 dog images for which we have collected 2D joint and silhouette annotations, split into training and evaluation sets. To capture the large shape variety of dogs, we show that the natural variation in the 2D dataset is enough to learn a detailed 3D prior through expectation maximization (EM). As a by-product of training, we generate a new parameterized model, SMBLD (which includes limb scaling), which we release alongside our new annotation dataset StanfordExtra to the research community.
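The EM loop described above can be sketched in miniature. This is not the authors' implementation: it assumes an E-step that fits per-image shape parameters (mocked here with random data standing in for fitted SMBLD-style shape vectors) and shows the M-step that re-estimates a Gaussian shape prior, whose negative log-density then regularizes the next round of fitting.

```python
import numpy as np

rng = np.random.default_rng(0)
n_images, n_shape = 200, 10

# Stand-in for per-image shape estimates produced by an E-step fit;
# in the paper these would come from fitting the model to 2D joints
# and silhouettes under the current prior.
betas = rng.normal(loc=0.5, scale=2.0, size=(n_images, n_shape))

def m_step(betas, eps=1e-6):
    """M-step: re-fit a Gaussian prior N(mu, Sigma) to the shape estimates."""
    mu = betas.mean(axis=0)
    centered = betas - mu
    # Small diagonal jitter keeps the covariance invertible.
    sigma = centered.T @ centered / len(betas) + eps * np.eye(betas.shape[1])
    return mu, sigma

def prior_term(beta, mu, sigma):
    """Mahalanobis prior penalty used to regularize the next E-step fit."""
    diff = beta - mu
    return 0.5 * diff @ np.linalg.solve(sigma, diff)

mu, sigma = m_step(betas)
# A shape at the prior mean incurs zero penalty; an outlier is penalized.
near = prior_term(mu, mu, sigma)
far = prior_term(mu + 10.0, mu, sigma)
```

Iterating fit and re-estimation lets the prior tighten around the shape variation actually present in the 2D dataset.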

Notes

Acknowledgements

The authors would like to thank the GSK AI team for providing access to their GPU cluster, Michael Sutcliffe, Matthew Allen, Thomas Roddick and Peter Fisher for useful technical discussions, and the GSK TDI team for project sponsorship.

Supplementary material

Supplementary material 1 (pdf, 252 KB)

Supplementary material 2 (mp4, 72,013 KB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Department of Engineering, University of Cambridge, Cambridge, UK
  2. Microsoft, Cambridge, UK
