International Journal of Computer Vision, Volume 118, Issue 1, pp 49–64

Poselet-Based Contextual Rescoring for Human Pose Estimation via Pictorial Structures

  • Antonio Hernández-Vela
  • Stan Sclaroff
  • Sergio Escalera

Abstract

In this paper we propose a contextual rescoring method for predicting the position of body parts in a human pose estimation framework. A set of poselets is incorporated in the model, and their detections are used to extract spatial and score-related features relative to other body part hypotheses. A method is proposed for the automatic discovery of a compact subset of poselets that covers the different poses in a set of validation images while maximizing precision. The rescoring mechanism is defined as a set-based boosting classifier that computes a new score for each body joint detection, given its relationship to detections of other body joints and mid-level parts in the image. This new score is incorporated in the pictorial structure model as an additional unary potential, following the recent work of Pishchulin et al. Experiments on two benchmarks show results comparable to those of Pishchulin et al. while reducing the size of the mid-level representation by an order of magnitude and, accordingly, the execution time by 68%.
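The abstract does not spell out how the compact poselet subset is discovered. As a rough illustration only, the idea of covering the validation poses while favoring precise poselets can be sketched as a greedy, precision-weighted coverage selection. This is not the authors' algorithm: the function `select_poselets`, the `coverage` and `precision` inputs, and the scoring rule (newly covered images multiplied by precision) are all assumptions made for this sketch.

```python
from typing import Dict, List, Set


def select_poselets(
    coverage: Dict[str, Set[int]],
    precision: Dict[str, float],
    n_images: int,
) -> List[str]:
    """Greedily pick poselets until no further validation image can be covered.

    Hypothetical inputs (not from the paper):
      coverage[p]  -- indices of validation images on which poselet p fires correctly
      precision[p] -- precision of poselet p, in [0, 1]
    """
    uncovered = set(range(n_images))
    remaining = set(coverage)
    selected: List[str] = []
    while uncovered and remaining:
        # Assumed trade-off: newly covered images weighted by precision.
        # Sorting makes tie-breaking deterministic.
        best = max(
            sorted(remaining),
            key=lambda p: len(coverage[p] & uncovered) * precision[p],
        )
        gain = coverage[best] & uncovered
        if not gain:
            break  # the best-scoring poselet adds no new image
        selected.append(best)
        uncovered -= gain
        remaining.remove(best)
    return selected


# Toy data: four candidate poselets over five validation images.
coverage = {
    "a": {0, 1, 2},
    "b": {2, 3},
    "c": {0, 1, 2, 3},
    "d": {4},
}
precision = {"a": 0.9, "b": 0.8, "c": 0.5, "d": 0.7}
chosen = select_poselets(coverage, precision, n_images=5)
print(chosen)
```

In this toy case the broad but imprecise poselet `c` is skipped in favor of the more precise `a` and `b`, which together cover the same images. A real system would need a principled definition of "covering a pose" and of precision, which the abstract leaves to the full paper.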

Keywords

  • Contextual rescoring
  • Poselets
  • Human pose estimation

References

  1. Andriluka, M., Roth, S., & Schiele, B. (2009). Pictorial structures revisited: People detection and articulated pose estimation. In: IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1014–1021).
  2. Bourdev, L., & Malik, J. (2009). Poselets: Body part detectors trained using 3D human pose annotations. In: IEEE 12th international conference on computer vision (pp. 1365–1372).
  3. Bourdev, L., Maji, S., Brox, T., & Malik, J. (2010). Detecting people using mutually consistent poselet activations. In K. Daniilidis, P. Maragos, & N. Paragios (Eds.), ECCV 2010, Lecture notes in computer science (vol. 6316, pp. 168–181). Berlin: Springer.
  4. Chen, X., & Yuille, A. (2014). Articulated pose estimation with image-dependent preference on pairwise relations. NIPS.
  5. Cinbis, R., & Sclaroff, S. (2012). Contextual object detection using set-based classification. In A. Fitzgibbon, S. Lazebnik, P. Perona, Y. Sato, & C. Schmid (Eds.), ECCV 2012, Lecture notes in computer science (vol. 7577, pp. 43–57). Berlin: Springer.
  6. Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In: IEEE computer society conference on computer vision and pattern recognition (CVPR) (vol. 1, pp. 886–893).
  7. Duan, K., Batra, D., & Crandall, D. (2012). A multi-layer composite model for human pose estimation. In: Proceedings of the British machine vision conference. BMVA press (pp. 116.1–116.11).
  8. Eichner, M., & Ferrari, V. (2012). Appearance sharing for collective human pose estimation. In K. Lee, Y. Matsushita, J. Rehg, & Z. Hu (Eds.), ACCV 2012, Lecture notes in computer science (vol. 7724, pp. 138–151). Berlin: Springer.
  9. Felzenszwalb, P. F., & Huttenlocher, D. P. (2005). Pictorial structures for object recognition. International Journal of Computer Vision, 61(1), 55–79.
  10. Ferrari, V., Marin-Jimenez, M., & Zisserman, A. (2008a). Progressive search space reduction for human pose estimation. In: IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1–8).
  11. Ferrari, V., Marin-Jimenez, M., & Zisserman, A. (2008b). Progressive search space reduction for human pose estimation. In: IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1–8).
  12. Hernández-Vela, A., Sclaroff, S., & Escalera, S. (2014). Contextual rescoring for human pose estimation. In: Proceedings of the British machine vision conference (to be published).
  13. Johnson, S., & Everingham, M. (2010). Clustered pose and nonlinear appearance models for human pose estimation. In: Proceedings of the British machine vision conference. BMVA press (pp. 12.1–12.11).
  14. Johnson, S., & Everingham, M. (2011). Learning effective human pose estimation from inaccurate annotation. In: IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1465–1472).
  15. Pishchulin, L., Andriluka, M., Gehler, P., & Schiele, B. (2013a). Poselet conditioned pictorial structures. In: IEEE conference on computer vision and pattern recognition (CVPR) (pp. 588–595).
  16. Pishchulin, L., Andriluka, M., Gehler, P., & Schiele, B. (2013b). Strong appearance and expressive spatial models for human pose estimation. In: IEEE international conference on computer vision (ICCV) (pp. 3487–3494).
  17. Puertas, E., Bautista, M. A., Sanchez, D., Escalera, S., & Pujol, O. (2014). Learning to segment humans by stacking their body parts. In: ECCV 2014 workshops (In press).
  18. Ramakrishna, V., Munoz, D., Hebert, M., Bagnell, J. A., & Sheikh, Y. (2014). Pose machines: Articulated pose estimation via inference machines. In: Computer vision—ECCV 2014. Springer (pp. 33–47).
  19. Ramanan, D. (2007). Learning to parse images of articulated bodies. In B. Schölkopf, J. Platt, & T. Hoffman (Eds.), Advances in neural information processing systems 19, MIT press (pp. 1129–1136).
  20. Sapp, B., Jordan, C., & Taskar, B. (2010). Adaptive pose priors for pictorial structures. In: IEEE conference on computer vision and pattern recognition (CVPR) (pp. 422–429).
  21. Sun, M., Telaprolu, M., Lee, H., & Savarese, S. (2012). An efficient branch-and-bound algorithm for optimal human pose estimation. In: IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1616–1623).
  22. Tian, T. P., & Sclaroff, S. (2010). Fast globally optimal 2D human detection with loopy graph models. In: IEEE conference on computer vision and pattern recognition (CVPR) (pp. 81–88).
  23. Tian, Y., Zitnick, C., & Narasimhan, S. (2012). Exploring the spatial hierarchy of mixture models for human pose estimation. In A. Fitzgibbon, S. Lazebnik, P. Perona, Y. Sato, & C. Schmid (Eds.), ECCV 2012, Lecture notes in computer science (vol. 7576, pp. 256–269). Berlin: Springer.
  24. Tompson, J. J., Jain, A., LeCun, Y., & Bregler, C. (2014). Joint training of a convolutional network and a graphical model for human pose estimation. In: Advances in neural information processing systems (pp. 1799–1807).
  25. Toshev, A., & Szegedy, C. (2014). DeepPose: Human pose estimation via deep neural networks. In: IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1653–1660).
  26. Wang, F., & Li, Y. (2013). Beyond physical connections: Tree models in human pose estimation. In: IEEE conference on computer vision and pattern recognition (CVPR) (pp. 596–603).
  27. Wang, Y., Tran, D., & Liao, Z. (2011). Learning hierarchical poselets for human parsing. In: IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1705–1712).
  28. Yang, Y., & Ramanan, D. (2013). Articulated human detection with flexible mixtures of parts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(12), 2878–2890.
  29. Yao, B., & Fei-Fei, L. (2010). Modeling mutual context of object and human pose in human-object interaction activities. In: IEEE conference on computer vision and pattern recognition (CVPR) (pp. 17–24).

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Antonio Hernández-Vela (1, 3)
  • Stan Sclaroff (2)
  • Sergio Escalera (1, 3)
  1. Department of Applied Mathematics and Analysis, Universitat de Barcelona, Barcelona, Spain
  2. Department of Computer Science, Boston University, Boston, USA
  3. Computer Vision Center, Barcelona, Spain
