3D Bird Reconstruction: A Dataset, Model, and Shape Recovery from a Single View

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12363)

Abstract

Automated capture of animal pose is transforming how we study neuroscience and social behavior. Movements carry important social cues, but current methods cannot robustly estimate the pose and shape of animals, particularly social animals such as birds, which are often occluded by each other and by objects in the environment. To address this problem, we first introduce a model and a multi-view optimization approach, which we use to capture the unique shape and pose space displayed by live birds. We then introduce a pipeline and experiments for keypoint, mask, pose, and shape regression that recovers accurate avian postures from single views. Finally, we provide extensive multi-view keypoint and mask annotations collected from a group of 15 social birds housed together in an outdoor aviary. The project website with videos, results, code, mesh model, and the Penn Aviary Dataset can be found at https://marcbadger.github.io/avian-mesh.
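The multi-view optimization stage described above fits a mesh model to keypoints observed from the aviary's calibrated cameras. A core sub-step of any such pipeline is lifting 2D keypoint annotations to 3D. The sketch below shows standard linear (DLT) triangulation of a single keypoint from multiple calibrated views; it is a minimal illustration of the geometry involved, not the authors' actual implementation, and the function name and interface are our own.

```python
import numpy as np

def triangulate_point(P_list, uv_list):
    """Linearly triangulate one 3D point from 2D observations.

    P_list  : list of 3x4 camera projection matrices.
    uv_list : list of (u, v) pixel observations, one per camera.
    Returns the 3D point minimizing the algebraic (DLT) error.
    """
    # Each view contributes two linear constraints on the
    # homogeneous point X: u*(P[2]·X) - P[0]·X = 0 and
    # v*(P[2]·X) - P[1]·X = 0.
    A = []
    for P, (u, v) in zip(P_list, uv_list):
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    A = np.asarray(A)

    # The solution is the right singular vector with the
    # smallest singular value (null space of A for exact data).
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize
```

In a full model-fitting loop, triangulated keypoints like these would serve as 3D targets (or the 2D reprojection residuals would be minimized directly) when optimizing the mesh model's pose and shape parameters.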

Keywords

Pose estimation · Shape estimation · Birds · Animals · Dataset

Notes

Acknowledgements

We thank the diligent annotators in the Schmidt Lab, Kenneth Chaney for compute resources, and Stephen Phillips for helpful discussions. We gratefully acknowledge support through the following grants: NSF-IOS-1557499, NSF-IIS-1703319, NSF MRI 1626008, NSF TRIPODS 1934960.

Supplementary material

Supplementary material 1 (pdf, 7,196 KB): 504473_1_En_1_MOESM1_ESM.pdf


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. University of Pennsylvania, Philadelphia, USA
