
Machine Vision and Applications, Volume 27, Issue 2, pp 157–174

Stereo Pictorial Structure for 2D articulated human pose estimation

  • Manuel I. López-Quintero
  • Manuel J. Marín-Jiménez
  • Rafael Muñoz-Salinas
  • Francisco J. Madrid-Cuevas
  • Rafael Medina-Carnicer
Original Paper

Abstract

In this paper, we consider the problem of 2D human pose estimation on stereo image pairs. In particular, we aim to estimate the location, orientation and scale of the upper-body parts of people detected in stereo image pairs taken from realistic stereo videos available on the Internet. To address this task, we propose a novel pictorial structure model that exploits the stereo information contained in such image pairs: the Stereo Pictorial Structure (SPS). To validate the proposed model, we contribute a new annotated dataset of stereo image pairs, the Stereo Human Pose Estimation Dataset (SHPED), obtained from YouTube stereoscopic video sequences and depicting people in challenging poses in diverse indoor and outdoor scenarios. The experimental results on SHPED indicate that SPS improves on state-of-the-art monocular models thanks to the appropriate use of the stereo information.
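For readers unfamiliar with pictorial structures: a classic monocular pictorial structure (in the style of Felzenszwalb and Huttenlocher) scores a body configuration as the sum of per-part appearance (unary) costs plus pairwise deformation costs between connected parts of a kinematic tree, and the lowest-cost configuration can be found exactly by dynamic programming from the leaves to the root. The sketch below illustrates only that generic monocular inference under stated assumptions; it is not the paper's Stereo Pictorial Structure. The part names, candidate locations, unary costs, expected offsets and the weight `W` are all hypothetical toy values, and the inner minimisation is done by brute force over discrete states rather than with distance transforms.

```python
from typing import Callable, Dict, List, Tuple


def pictorial_structure_map(
    unary: Dict[str, List[float]],
    parent: Dict[str, str],
    pairwise: Callable[[str, int, int], float],
) -> Tuple[Dict[str, int], float]:
    """Exact min-cost configuration of a tree-structured pictorial structure.

    unary[p][s]          -- appearance cost of placing part p in discrete state s
    parent[c]            -- parent of part c in the kinematic tree (root has no entry)
    pairwise(c, sc, sp)  -- deformation cost of child c in state sc given parent state sp
    """
    parts = list(unary)
    root = next(p for p in parts if p not in parent)
    kids: Dict[str, List[str]] = {p: [] for p in parts}
    for c, p in parent.items():
        kids[p].append(c)

    # Pre-order traversal: every part appears after its parent.
    order, stack = [], [root]
    while stack:
        p = stack.pop()
        order.append(p)
        stack.extend(kids[p])

    msg: Dict[str, List[float]] = {}  # msg[c][sp]: best cost of c's subtree given parent state sp
    arg: Dict[str, List[int]] = {}    # arg[c][sp]: child state that achieves msg[c][sp]

    def subtree(p: str, s: int) -> float:
        # Unary cost of p in state s plus the best costs of all its child subtrees.
        return unary[p][s] + sum(msg[c][s] for c in kids[p])

    # Leaves-to-root pass: reverse pre-order visits children before their parents.
    for c in reversed(order[1:]):
        msg[c], arg[c] = [], []
        for sp in range(len(unary[parent[c]])):
            costs = [subtree(c, sc) + pairwise(c, sc, sp) for sc in range(len(unary[c]))]
            best = min(range(len(costs)), key=costs.__getitem__)
            msg[c].append(costs[best])
            arg[c].append(best)

    # Pick the best root state, then backtrack root-to-leaves.
    root_costs = [subtree(root, s) for s in range(len(unary[root]))]
    states = {root: min(range(len(root_costs)), key=root_costs.__getitem__)}
    for c in order[1:]:
        states[c] = arg[c][states[parent[c]]]
    return states, root_costs[states[root]]


# Hypothetical toy upper-body model (made-up numbers, for illustration only).
PARENT = {"head": "torso", "l_upper_arm": "torso", "l_lower_arm": "l_upper_arm"}
CANDIDATES = {
    "torso": [(50, 60), (80, 40)],
    "head": [(50, 30), (75, 15), (52, 35)],
    "l_upper_arm": [(30, 60), (60, 90)],
    "l_lower_arm": [(20, 80), (70, 110), (25, 90)],
}
UNARY = {  # lower = stronger (hypothetical) detector response
    "torso": [1.0, 0.8],
    "head": [0.5, 0.4, 0.9],
    "l_upper_arm": [0.7, 0.6],
    "l_lower_arm": [0.3, 0.2, 0.5],
}
OFFSET = {"head": (0, -30), "l_upper_arm": (-20, 0), "l_lower_arm": (-10, 20)}
W = 0.01  # hypothetical deformation weight


def deformation(c: str, sc: int, sp: int) -> float:
    # Weighted squared deviation of the child's position from its expected offset to the parent.
    (xc, yc), (xp, yp) = CANDIDATES[c][sc], CANDIDATES[PARENT[c]][sp]
    dx, dy = OFFSET[c]
    return W * ((xc - xp - dx) ** 2 + (yc - yp - dy) ** 2)


states, cost = pictorial_structure_map(UNARY, PARENT, deformation)
print(states, round(cost, 3))
```

Because the model is a tree, this pass costs O(n·k²) for n parts with k states each instead of the kᴺ of exhaustive search; in the toy example the slightly better unary scores of the second torso candidate are outweighed by the deformation costs they would impose on the attached parts.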

Keywords

Body Part; Appearance Model; Detection Window; Stereo Pair; Unary Potential
These keywords were added by machine and not by the authors.

Notes

Acknowledgments

This work was partially supported by the Research Projects TIN2012-32952 and BROCA, both financed by the Spanish Ministry of Science and Technology and FEDER. We also thank Marcin Eichner for his invaluable help during the implementation of his Direct-PCE method.

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. Department of Computing and Numerical Analysis, Maimonides Institute for Biomedical Research (IMIBIC), University of Córdoba, Córdoba, Spain
