
DDGCN: A Dynamic Directed Graph Convolutional Network for Action Recognition

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12365)

Abstract

We propose a Dynamic Directed Graph Convolutional Network (DDGCN) to model the spatial and temporal features of human actions from their skeletal representations. The DDGCN consists of three new feature modeling modules: (1) Dynamic Convolutional Sampling (DCS), (2) Dynamic Convolutional Weight (DCW) assignment, and (3) Directed Graph Spatial-Temporal (DGST) feature extraction. Comprehensive experiments show that the DDGCN outperforms existing state-of-the-art action recognition approaches on various test datasets.
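To make "directed graph spatial-temporal feature extraction" more concrete, here is a minimal sketch of a generic, static directed-graph convolution over a skeleton sequence. It is not the DDGCN's DGST module and leaves out dynamic sampling (DCS) and dynamic weighting (DCW) entirely. The toy five-joint skeleton (`EDGES`, `NUM_JOINTS`), the parent-to-child edge direction, the in-degree normalization, and all function names are hypothetical choices made only for illustration. Each frame is aggregated along directed edges (A·X·W), passed through a ReLU, and then convolved along the time axis.

```python
from typing import List

Frame = List[List[float]]  # V joints x C channels

# Hypothetical toy skeleton (assumption, not from the paper):
# 0 = root/torso, 1 = head, 2 = left hand, 3 = right elbow, 4 = right hand.
# Each directed edge (src, dst) points from a parent joint to a child joint.
EDGES = [(0, 1), (0, 2), (0, 3), (3, 4)]
NUM_JOINTS = 5


def directed_adjacency(num_joints: int, edges) -> List[List[float]]:
    """Row i lists the joints that send information to joint i (incoming
    edges plus a self-loop), normalized so that each row sums to 1."""
    a = [[0.0] * num_joints for _ in range(num_joints)]
    for i in range(num_joints):
        a[i][i] = 1.0  # self-loop keeps the joint's own feature
    for src, dst in edges:
        a[dst][src] = 1.0  # directed: child receives from parent, not vice versa
    for i in range(num_joints):
        deg = sum(a[i])  # in-degree including the self-loop
        a[i] = [x / deg for x in a[i]]
    return a


def spatial_conv(frame: Frame, adj, weight) -> Frame:
    """Computes A @ X @ W for one frame: X is V x C_in, W is C_in x C_out."""
    v, c_in, c_out = len(frame), len(frame[0]), len(weight[0])
    aggregated = [[sum(adj[i][j] * frame[j][c] for j in range(v))
                   for c in range(c_in)] for i in range(v)]
    return [[sum(aggregated[i][c] * weight[c][k] for c in range(c_in))
             for k in range(c_out)] for i in range(v)]


def temporal_conv(seq: List[Frame], kernel: List[float]) -> List[Frame]:
    """1D convolution along time for every joint and channel, with zero
    padding so the output keeps T frames (assumes an odd-length kernel)."""
    t_len, half = len(seq), len(kernel) // 2
    out = []
    for t in range(t_len):
        frame = []
        for v in range(len(seq[0])):
            row = []
            for c in range(len(seq[0][0])):
                acc = 0.0
                for k, w in enumerate(kernel):
                    tt = t + k - half
                    if 0 <= tt < t_len:  # frames outside the clip count as zero
                        acc += w * seq[tt][v][c]
                row.append(acc)
            frame.append(row)
        out.append(frame)
    return out


def st_block(seq: List[Frame], adj, weight, kernel) -> List[Frame]:
    """One spatial-temporal block: directed spatial conv, ReLU, temporal conv."""
    spatial = [spatial_conv(f, adj, weight) for f in seq]
    activated = [[[max(0.0, x) for x in row] for row in f] for f in spatial]
    return temporal_conv(activated, kernel)


if __name__ == "__main__":
    adj = directed_adjacency(NUM_JOINTS, EDGES)
    # Toy input: 3 identical frames, 2 channels per joint (e.g. x, y).
    frame = [[float(j), 0.0] for j in range(NUM_JOINTS)]
    out = st_block([frame] * 3, adj, [[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0, 1.0])
    print(len(out), len(out[0]), len(out[0][0]))  # frames, joints, channels
```

Because the adjacency is asymmetric, the root joint only ever sees its own feature while each child mixes in its parent's. An undirected graph convolution would instead let information flow both ways along every bone.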

Keywords

Action modeling and recognition; Graph Convolutional Network; Dynamic Spatiotemporal Graph


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Louisiana State University, Baton Rouge, USA
