
A spatiotemporal multi-feature extraction framework with space and channel based squeeze-and-excitation blocks for human activity recognition

Journal of Ambient Intelligence and Humanized Computing


Human activity recognition (HAR) is an active field in ubiquitous computing and body area networks (BANs), with wide applications in medical care, sports and smart homes. In recent years, many deep-learning-based methods have shown strong performance on HAR. However, given the temporal and spatial dependencies of time series, the features extracted by traditional methods are not comprehensive. In this paper, we propose a new activity recognition framework based on spatiotemporal multi-feature extraction with space and channel based squeeze-and-excitation blocks (SCbSE-SMFE). The framework consists of a temporal feature extraction layer composed of gated recurrent unit (GRU) blocks, a spatial feature extraction layer composed of convolutional neural network (CNN) blocks with SCbSE blocks, a statistical feature extraction layer and an output layer. In addition, to meet the practical need to recognize aggressive activities, we simulate a prison environment and collect an aggressive activity dataset (AAD). Moreover, targeting the characteristics of aggressive activities, we propose a threshold-based aggressive activity detection method to reduce computational complexity. The proposed framework is evaluated on the public WISDM dataset and on the collected AAD, and the results show that SCbSE-SMFE effectively improves accuracy and better distinguishes similar activities. The threshold-based aggressive activity detection method simplifies the model and increases recognition speed while maintaining recognition accuracy.
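The abstract does not define the SCbSE block, the statistical features or the threshold rule, so the pure-Python sketch below shows one plausible reading under stated assumptions. The channel branch follows the squeeze-and-excitation design of Hu et al. (2018). The spatial branch and the element-wise sum that merges the two branches borrow the concurrent spatial-and-channel SE idea from the image-segmentation literature, which may differ from the authors' SCbSE. The per-axis mean, standard deviation, minimum and maximum are an assumed choice of statistical features, and the peak-magnitude test in `is_candidate_aggressive` (with its `threshold` parameter) is a hypothetical stand-in for the paper's threshold-based detection. All function names are illustrative.

```python
import math
import statistics


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_se(fmap, w1, w2):
    """Channel squeeze-and-excitation (after Hu et al. 2018) on a C x T feature map.

    fmap: list of C channels, each a list of T values.
    w1: (C // r) x C reduction weights; w2: C x (C // r) expansion weights.
    """
    # Squeeze: global average pooling over time, one descriptor per channel.
    z = [sum(ch) / len(ch) for ch in fmap]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one weight in (0, 1) per channel.
    h = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    s = [sigmoid(sum(w * hi for w, hi in zip(row, h))) for row in w2]
    # Scale: multiply every time step of channel c by s[c].
    return [[s[c] * v for v in ch] for c, ch in enumerate(fmap)]


def spatial_se(fmap, w):
    """Spatial SE (assumed form): a 1x1 projection across channels gives one weight per time step."""
    n_steps = len(fmap[0])
    q = [sigmoid(sum(w[c] * fmap[c][t] for c in range(len(fmap)))) for t in range(n_steps)]
    return [[q[t] * ch[t] for t in range(n_steps)] for ch in fmap]


def space_channel_se(fmap, w1, w2, ws):
    """Assumed combination: element-wise sum of the channel and spatial recalibrations."""
    a = channel_se(fmap, w1, w2)
    b = spatial_se(fmap, ws)
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def statistical_features(window):
    """Assumed feature set: per-axis mean, population std, min and max of (x, y, z) samples."""
    feats = []
    for axis in zip(*window):
        feats += [statistics.mean(axis), statistics.pstdev(axis), min(axis), max(axis)]
    return feats


def is_candidate_aggressive(window, threshold):
    """Hypothetical pre-filter: send a window to the full classifier only if its
    peak acceleration magnitude exceeds `threshold`."""
    return max(math.sqrt(x * x + y * y + z * z) for x, y, z in window) > threshold


if __name__ == "__main__":
    window = [(0.1, 9.8, 0.2), (3.0, 12.0, 4.0), (0.0, 9.7, 0.1)]
    print(len(statistical_features(window)))                # 4 features x 3 axes
    print(is_candidate_aggressive(window, threshold=12.0))  # peak magnitude is 13.0
```

In a trained network the weights `w1`, `w2` and `ws` would be learned and the recalibration would presumably follow each CNN block; plain lists are used here only so the arithmetic stays visible. The pre-filter illustrates why a threshold can cut computation: windows whose peak magnitude stays below it never reach the heavier model.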


Figures 1 to 12 (images not reproduced in this preview)




This work was financially supported by the National Key Research and Development Program of China (2017YFC0803403, 2018YFC0831001), the National Natural Science Foundation of China (61771292, 61401253), the Natural Science Foundation of Shandong Province of China (ZR2016FM29), and the Key Research and Development Program of Shandong Province of China (2017GGX201003).

Author information


Corresponding authors

Correspondence to Hongji Xu or Hailiang Xiong.


About this article


Cite this article

Zhang, B., Xu, H., Xiong, H. et al. A spatiotemporal multi-feature extraction framework with space and channel based squeeze-and-excitation blocks for human activity recognition. J Ambient Intell Human Comput 12, 7983–7995 (2021).
