uulmMAD – A Human Action Recognition Dataset for Ground-Truth Evaluation and Investigation of View Invariances

  • Michael Glodek
  • Georg Layher
  • Felix Heilemann
  • Florian Gawrilowicz
  • Günther Palm
  • Friedhelm Schwenker
  • Heiko Neumann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8869)


In recent years, human action recognition has gained increasing attention in pattern recognition. However, many datasets in the literature focus on a limited number of target-oriented properties. Within this work, we present a novel dataset, named uulmMAD, which has been created to benchmark state-of-the-art action recognition architectures addressing multiple properties, e.g. high-resolution cameras, perspective changes, realistic cluttered background and noise, overlap of action classes, different execution speeds, variability in subjects and their clothing, and the availability of a pose ground-truth. The uulmMAD was recorded using three synchronized high-resolution cameras and an inertial motion capturing system. Each subject performed fourteen actions at least three times in front of a green screen. Selected actions were recorded in four variants, i.e. normal, pausing, fast and decelerating. The data has been post-processed in order to separate the subject from the background. Furthermore, the camera and the motion capturing data have been mapped onto each other, and 3D avatars have been generated to further extend the dataset. The avatars have also been used to emulate the self-occlusion in pose recognition that arises when using a time-of-flight camera. In this work, we analyze the uulmMAD using a state-of-the-art action recognition architecture to provide first baseline results. The results emphasize the unique characteristics of the dataset. The dataset will be made publicly available upon publication of the paper.
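The abstract mentions that the recordings were post-processed to separate each subject from the green-screen background. The paper does not specify the exact matting procedure here, but the basic idea of chroma keying can be sketched as follows; the `green_thresh` dominance factor and the function name are illustrative assumptions, not values from the paper.

```python
import numpy as np

def chroma_key_mask(rgb, green_thresh=1.3):
    """Boolean foreground mask for an RGB frame (H, W, 3) shot in
    front of a green screen.

    A pixel is treated as background when its green channel clearly
    dominates both the red and the blue channel. The threshold is an
    illustrative choice, not a value taken from the paper.
    """
    img = rgb.astype(np.float32)
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    background = (g > green_thresh * r) & (g > green_thresh * b)
    return ~background  # True = subject pixel
```

Applied per frame, such a mask lets the isolated subject be composited over arbitrary cluttered backgrounds, which is how a green-screen recording can yield the "realistic cluttered background and noise" variants described above.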


Action Recognition · Human Action Recognition · Camera Perspective · Human Avatar · Skeleton Data



This paper is based on work done within the Transregional Collaborative Research Centre SFB/TRR 62 Companion-Technology for Cognitive Technical Systems funded by the German Research Foundation (DFG).



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Michael Glodek (1)
  • Georg Layher (1)
  • Felix Heilemann (1)
  • Florian Gawrilowicz (1)
  • Günther Palm (1)
  • Friedhelm Schwenker (1)
  • Heiko Neumann (1)

  1. Institute of Neural Information Processing, University of Ulm, Ulm, Germany
