Advertisement

Filling the Gaps: Predicting Missing Joints of Human Poses Using Denoising Autoencoders

  • Nicolò CarissimiEmail author
  • Paolo Rota
  • Cigdem Beyan
  • Vittorio Murino
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11130)

Abstract

State of the art pose estimators are able to deal with different challenges present in real-world scenarios, such as varying body appearance, lighting conditions and rare body poses. However, when body parts are severely occluded by objects or other people, the resulting poses might be incomplete, negatively affecting applications where estimating a full body pose is important (e.g. gesture and pose-based behavior analysis). In this work, we propose a method for predicting the missing joints from incomplete human poses. In our model we consider missing joints as noise in the input and we use an autoencoder-based solution to enhance the pose prediction. The method can be easily combined with existing pipelines and, by using only 2D coordinates as input data, the resulting model is small and fast to train, yet powerful enough to learn a robust representation of the low dimensional domain. Finally, results show improved predictions over existing pose estimation algorithms.

Keywords

Human pose estimation Generative methods 

References

  1. 1.
    Abadi, M., et al.: Tensorflow: a system for large-scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2016), pp. 265–283 (2016). https://www.usenix.org/system/files/conference/osdi16/osdi16-abadi.pdf
  2. 2.
    Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Parzen, E., Tanabe, K., Kitagawa, G. (eds.) Selected Papers of Hirotugu Akaike. Springer, New York (1998).  https://doi.org/10.1007/978-1-4612-1694-0_15CrossRefzbMATHGoogle Scholar
  3. 3.
    Alameda-Pineda, X., et al.: SALSA: a novel dataset for multimodal group behavior analysis. PAMI 38, 1707–1720 (2016)CrossRefGoogle Scholar
  4. 4.
    Andriluka, M., Pishchulin, L., Gehler, P., Schiele, B.: 2D human pose estimation: new benchmark and state of the art analysis. In: CVPR (2014)Google Scholar
  5. 5.
    Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H.: Greedy layer-wise training of deep networks. In: NIPS (2007)Google Scholar
  6. 6.
    Bourdev, L., Malik, J.: Poselets: Body part detectors trained using 3D human pose annotations. In: ICCV (2009)Google Scholar
  7. 7.
    Cao, Z., Simon, T., Wei, S.E., Sheikh, Y.: Realtime multi-person 2D pose estimation using part affinity fields. In: CVPR (2017)Google Scholar
  8. 8.
    Chen, Y., Wang, Z., Peng, Y., Zhang, Z., Yu, G., Sun, J.: Cascaded pyramid network for multi-person pose estimation. arXiv preprint arXiv:1711.07319 (2017)
  9. 9.
    Chen, Y., Shen, C., Wei, X.S., Liu, L., Yang, J.: Adversarial PoseNet: a structure-aware convolutional network for human pose estimation. In: CVPR (2017)Google Scholar
  10. 10.
    Fang, H.S., Xie, S., Tai, Y.W., Lu, C.: RMPE: regional multi-person pose estimation. In: CVPR (2017)Google Scholar
  11. 11.
    Felzenszwalb, P., McAllester, D., Ramanan, D.: A discriminatively trained, multiscale, deformable part model. In: CVPR (2008)Google Scholar
  12. 12.
    Feng, W., Kannan, A., Gkioxari, G., Zitnick, C.L.: Learn2Smile: learning non-verbal interaction through observation. In: IROS (2017)Google Scholar
  13. 13.
    Hou, X., Shen, L., Sun, K., Qiu, G.: Deep feature consistent variational autoencoder. In: WACV (2017)Google Scholar
  14. 14.
    Insafutdinov, E., Pishchulin, L., Andres, B., Andriluka, M., Schiele, B.: DeeperCut: a deeper, stronger, and faster multi-person pose estimation model. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9910, pp. 34–50. Springer, Cham (2016).  https://doi.org/10.1007/978-3-319-46466-4_3CrossRefGoogle Scholar
  15. 15.
    Iqbal, U., Gall, J.: Multi-person pose estimation with local joint-to-person associations. In: ECCV Workshop (2016). http://arxiv.org/abs/1608.08526
  16. 16.
    Jaderberg, M., Simonyan, K., Zisserman, A., et al.: Spatial transformer networks. In: NIPS (2015)Google Scholar
  17. 17.
    Jain, V., Seung, S.: Natural image denoising with convolutional networks. In: Advances in Neural Information Processing Systems, pp. 769–776 (2009)Google Scholar
  18. 18.
    Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
  19. 19.
    Kingma, D.P., Welling, M.: Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013)
  20. 20.
    Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014).  https://doi.org/10.1007/978-3-319-10602-1_48CrossRefGoogle Scholar
  21. 21.
    Newell, A., Huang, Z., Deng, J.: Associative embedding: end-to-end learning for joint detection and grouping. In: NIPS (2017)Google Scholar
  22. 22.
    Newell, A., Yang, K., Deng, J.: Stacked hourglass networks for human pose estimation. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 483–499. Springer, Cham (2016).  https://doi.org/10.1007/978-3-319-46484-8_29CrossRefGoogle Scholar
  23. 23.
    Pishchulin, L., Andriluka, M., Gehler, P., Schiele, B.: Strong appearance and expressive spatial models for human pose estimation. In: ICCV (2013)Google Scholar
  24. 24.
    Pishchulin, L., et al.: DeepCut: Joint subset partition and labeling for multi person pose estimation. In: CVPR (2016)Google Scholar
  25. 25.
    Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS (2015)Google Scholar
  26. 26.
    Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323(6088), 533 (1986)CrossRefGoogle Scholar
  27. 27.
    Tome, D., Russell, C., Agapito, L.: Lifting from the deep: convolutional 3D pose estimation from a single image. In: CVPR (2017)Google Scholar
  28. 28.
    Tompson, J.J., Jain, A., LeCun, Y., Bregler, C.: Joint training of a convolutional network and a graphical model for human pose estimation. In: NIPS (2014)Google Scholar
  29. 29.
    Toshev, A., Szegedy, C.: DeepPose: human pose estimation via deep neural networks. In: CVPR (2014)Google Scholar
  30. 30.
    Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders. In: ICML (2008)Google Scholar
  31. 31.
    Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.A.: Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11(Dec), 3371–3408 (2010)MathSciNetzbMATHGoogle Scholar
  32. 32.
    Wei, S.E., Ramakrishna, V., Kanade, T., Sheikh, Y.: Convolutional pose machines. In: CVPR (2016)Google Scholar
  33. 33.
    Xie, J., Xu, L., Chen, E.: Image denoising and inpainting with deep neural networks. In: Advances in Neural Information Processing Systems, pp. 341–349 (2012)Google Scholar
  34. 34.
    Xu, L., Ren, J.S., Liu, C., Jia, J.: Deep convolutional neural network for image deconvolution. In: Advances in Neural Information Processing Systems, pp. 1790–1798 (2014)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. 1.Istituto Italiano di TecnologiaGenoaItaly
  2. 2.Università degli Studi di GenovaGenoaItaly
  3. 3.Università degli Studi di TrentoTrentoItaly
  4. 4.Università degli Studi di VeronaVeronaItaly

Personalised recommendations