Workshop at the European Conference on Computer Vision

ECCV 2014: Computer Vision - ECCV 2014 Workshops, pp. 685-697

Learning to Segment Humans by Stacking Their Body Parts

  • E. Puertas
  • M. A. Bautista
  • D. Sanchez
  • S. Escalera
  • O. Pujol
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8925)


Human segmentation in still images is a complex task due to the wide range of body poses and drastic changes in environmental conditions. Human body segmentation is usually treated in two stages: first, body parts are detected; then, the part detections serve as prior knowledge that a segmentation strategy optimizes. In this paper, we present a two-stage scheme based on Multi-Scale Stacked Sequential Learning (MSSL). We define an extended feature set by stacking a multi-scale decomposition of body part likelihood maps. These likelihood maps are obtained in the first stage by means of an Error-Correcting Output Codes (ECOC) ensemble of soft body part detectors. In the second stage, a binary classifier learns the contextual relations among part predictions, yielding an accurate body confidence map. This confidence map is fed to a graph cut optimization procedure to obtain the final segmentation. Results show improved segmentation when MSSL is included in the human segmentation pipeline.
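
The extended feature set can be pictured with a small sketch. This is not the authors' implementation: it assumes a plain box blur as a stand-in for the multi-scale decomposition, and the names `box_blur`, `stack_multiscale`, `radii`, `head`, and `torso` are hypothetical, chosen only for illustration. Each pixel ends up with one feature per body part map and per scale, which is the kind of stacked vector a second-stage binary classifier could consume.

```python
def box_blur(image, radius):
    """Mean filter with a (2*radius+1)^2 window, clipped at the borders."""
    height, width = len(image), len(image[0])
    blurred = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            rows = range(max(0, y - radius), min(height, y + radius + 1))
            cols = range(max(0, x - radius), min(width, x + radius + 1))
            window = [image[r][c] for r in rows for c in cols]
            blurred[y][x] = sum(window) / len(window)
    return blurred


def stack_multiscale(likelihood_maps, radii=(0, 1, 2)):
    """Return per-pixel feature vectors: every part map at every scale.

    Assumption: a box blur replaces the decomposition used in the paper.
    """
    height, width = len(likelihood_maps[0]), len(likelihood_maps[0][0])
    layers = [box_blur(m, r) for m in likelihood_maps for r in radii]
    return [[[layer[y][x] for layer in layers] for x in range(width)]
            for y in range(height)]


# Two toy 3x3 body part likelihood maps (made-up values).
head = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
torso = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.9, 0.0]]

# features[y][x] holds [head@r0, head@r1, torso@r0, torso@r1].
features = stack_multiscale([head, torso], radii=(0, 1))
```

In a full pipeline, the stacked vectors would be used to train the second-stage classifier, and its per-pixel confidence would then seed the graph cut step; both of those stages are outside this sketch.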


Keywords: Human body segmentation; Stacked Sequential Learning





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • E. Puertas (1, 2)
  • M. A. Bautista (1, 2)
  • D. Sanchez (1, 2)
  • S. Escalera (1, 2)
  • O. Pujol (1, 2)

  1. Departament Matemàtica Aplicada i Anàlisi, Universitat de Barcelona, Barcelona, Spain
  2. Computer Vision Center, Campus UAB, Bellaterra, Spain
