On the Estimation of Children’s Poses

  • Giuseppa Sciortino
  • Giovanni Maria Farinella
  • Sebastiano Battiato
  • Marco Leo
  • Cosimo Distante
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10485)


Deep learning architectures have achieved significant results for human pose estimation in recent years. State-of-the-art studies usually focus on estimating the pose of adults depicted in images. The estimation of children's poses (infants, toddlers, children) is sparsely studied, even though it can be very useful in different application domains, such as Assistive Computer Vision (e.g., for the early detection of autism spectrum disorder). Monitoring the pose of a child over time could reveal important information, especially during clinical trials. Human pose estimation methods have been benchmarked under a variety of challenging conditions, but studies that highlight performance specifically on children's poses are still missing. Infants, toddlers and children are not only smaller than adults, but also significantly different in anatomical proportions. Moreover, in assistive contexts, the unusual poses assumed by children can be very challenging to infer. The objective of this paper is to compare different state-of-the-art approaches for human pose estimation on a benchmark dataset, in order to understand their performance when the subjects are children. Results reveal that the accuracy of state-of-the-art methods drops significantly, opening new challenges for the research community.


Keywords: Human pose estimation · Deep learning methods
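A common metric for the kind of benchmarking the abstract describes is the Percentage of Correct Keypoints (PCK, or its head-normalised variant PCKh): a predicted joint counts as correct when its distance to the ground truth is below a fraction of a per-image reference length. The sketch below is illustrative only, under assumed array shapes; it is not the paper's actual evaluation code, and the function name and arguments are hypothetical.

```python
import numpy as np

def pck(pred, gt, ref_dists, alpha=0.5):
    """Percentage of Correct Keypoints (illustrative sketch).

    pred, gt:   (N, K, 2) predicted / ground-truth joint coordinates
                for N images with K joints each.
    ref_dists:  (N,) per-image reference length (e.g. head-segment
                length for PCKh, or torso diagonal for PCK).
    alpha:      a joint is correct if its error <= alpha * reference.
    """
    err = np.linalg.norm(pred - gt, axis=-1)      # (N, K) pixel errors
    correct = err <= alpha * ref_dists[:, None]   # per-image threshold
    return correct.mean()                         # fraction of correct joints

# Tiny synthetic check: two images, two joints, reference length 10,
# so the threshold is alpha * 10 = 5 pixels.
gt = np.zeros((2, 2, 2))
pred = np.array([[[1.0, 0.0], [9.0, 0.0]],    # errors 1 (ok) and 9 (miss)
                 [[2.0, 0.0], [3.0, 0.0]]])   # errors 2 (ok) and 3 (ok)
print(pck(pred, gt, np.array([10.0, 10.0])))  # → 0.75
```

Normalising by a per-subject reference length matters particularly here: children's smaller bodies and different limb proportions mean a fixed pixel threshold would not compare fairly against results on adults.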



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Giuseppa Sciortino (2)
  • Giovanni Maria Farinella (1)
  • Sebastiano Battiato (1)
  • Marco Leo (2)
  • Cosimo Distante (2)

  1. IPLAB, Department of Mathematics and Computer Science, University of Catania, Catania, Italy
  2. ISASI, Institute of Applied Sciences and Intelligent Systems, C.N.R. National Research Council, Lecce, Italy
