Learning to Look at Humans

  • Thomas Walther
  • Rolf P. Würtz
Part of the Autonomic Systems book series (ASYS, volume 1)


Learning a generalisable model of the visual appearance of humans from video data is of major importance for computing systems that interact naturally with their users and with other humans in their environment. We propose a step towards automatic behaviour understanding by integrating principles of Organic Computing into the posture estimation cycle, thereby reducing the need for human intervention while raising the level of system autonomy. The system extracts coherent motion from moving upper bodies and autonomously determines the limbs and their possible spatial relationships. The models learned from many videos are integrated into meta-models, which generalise well to different individuals, backgrounds, and attire. These models even allow robust interpretation of single video frames, in which no temporal continuity is available.
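The abstract does not spell out how coherent motion is turned into limbs and their spatial relationships. The following is only a minimal sketch of one simple way to do this, under stated assumptions, and not the authors' method. It assumes 2D point trajectories tracked over the same frames. Two points are treated as lying on the same rigid part when their mutual distance stays nearly constant, and parts are the connected components of that "rigid link" graph, using thresholding plus union-find rather than any more elaborate clustering. The parts are then joined into a tree with Prim's minimum spanning tree. The edge cost is the smallest distance spread over any point pair across two parts, on the heuristic assumption that points near a shared joint keep a roughly constant distance. All names (`distance_spread`, `group_rigid_parts`, `link_parts`, `tol`) are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical representation: a trajectory is a list of (x, y) positions,
# one per frame; all trajectories are assumed to cover the same frames.
Trajectory = list[tuple[float, float]]


def distance_spread(a: Trajectory, b: Trajectory) -> float:
    """Standard deviation over frames of the distance between two tracked points."""
    d = [math.dist(p, q) for p, q in zip(a, b)]
    mean = sum(d) / len(d)
    return math.sqrt(sum((x - mean) ** 2 for x in d) / len(d))


def group_rigid_parts(trajs: list[Trajectory], tol: float) -> list[list[int]]:
    """Link points whose mutual distance is nearly constant (spread <= tol)
    and return the connected components as candidate rigid parts."""
    parent = list(range(len(trajs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(trajs)), 2):
        if distance_spread(trajs[i], trajs[j]) <= tol:
            parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(trajs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def link_parts(trajs: list[Trajectory], parts: list[list[int]]) -> list[tuple[int, int]]:
    """Connect parts into a tree with Prim's minimum spanning tree.
    Heuristic edge cost: smallest distance spread over any point pair
    across the two parts (low when the parts appear to share a joint)."""
    def cost(a: int, b: int) -> float:
        return min(distance_spread(trajs[i], trajs[j])
                   for i in parts[a] for j in parts[b])

    in_tree, edges = {0}, []
    while len(in_tree) < len(parts):
        a, b = min(((a, b) for a in in_tree
                    for b in range(len(parts)) if b not in in_tree),
                   key=lambda e: cost(*e))
        in_tree.add(b)
        edges.append((a, b))
    return edges


# Synthetic example: three static "torso" points and two points on an
# "arm" rotating about a joint at (2, 0).
angles = [0.5 * t for t in range(5)]
torso = [[p] * len(angles) for p in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]]
arm = [[(2 + r * math.cos(a), r * math.sin(a)) for a in angles] for r in (1.0, 2.0)]
trajs = torso + arm

parts = group_rigid_parts(trajs, tol=1e-6)
print(parts)                     # [[0, 1, 2], [3, 4]]
print(link_parts(trajs, parts))  # [(0, 1)]
```

With real tracker output the tolerance would have to absorb tracking noise, and a hard threshold is brittle. That is one reason softer affinity-based clustering is often preferred in practice.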


Keywords: Image understanding · Autonomous learning · Organic computing · Pose estimation · Articulated model





Copyright information

© Springer Basel AG 2011

Authors and Affiliations

  1. Institut für Neuroinformatik, Ruhr-Universität Bochum, Bochum, Germany
