Deep Neural Networks for Human Behavior Understanding

  • Rajiv Singh
  • Swati Nigam


Human behavior understanding techniques have been proposed for several applications, such as object recognition, face detection, emotion detection, action detection, fingerprint identification, gait recognition, and voice recognition. Among these, emotion and action recognition are the most popular. This chapter analyzes recently developed techniques for emotion and activity recognition that use deep learning as their core component. Experimental results are reported on benchmark datasets: CK+ and SFEW for emotion recognition, and Skoda and UCF 101 for activity recognition. The experiments show that deep learning methods outperform other existing techniques in the literature and perform strongly on these benchmarks.


Behavior recognition; Facial expression recognition; Activity recognition; Deep learning



This study is sponsored by the Science and Engineering Research Board, Department of Science and Technology, Government of India, via grant no. PDF/2016/003644.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Rajiv Singh (1)
  • Swati Nigam (2)
  1. Department of Computer Science, Banasthali Vidyapith, Banasthali, India
  2. Computer Science and Engineering Department, S. P. Memorial Institute of Technology, Kaushambi, India
