A Deep Model Combining Structural Features and Context Cues for Action Recognition in Static Images

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10639)


In this paper, we present a deep model for the task of action recognition in static images, which combines body structural information and context cues to build a more accurate classifier. Moreover, to construct more semantic and robust body structural features, we propose a new body descriptor, named the limb angle descriptor (LAD), which uses the relative angles between the limbs in the 2D skeleton. We evaluate our method on the PASCAL VOC 2012 Action dataset and compare it with published results. Our method achieves 90.6% mean AP, outperforming the previous state-of-the-art approaches in the field.


Action recognition · Deep model · Body descriptor · Context cue



The work is partly supported by the Beijing Natural Science Foundation (4172054).



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Beijing Institute of Technology, Beijing, China
