
Indoor human activity recognition using high-dimensional sensors and deep neural networks

  • Engineering Applications of Neural Networks 2018
  • Published in: Neural Computing and Applications

Abstract

Many smart home applications rely on indoor human activity recognition. This challenge is currently tackled primarily by employing video camera sensors. However, such sensors suffer from fundamental technical deficiencies in an indoor environment and often also result in a breach of privacy. In contrast, a radar sensor resolves most of these flaws and, in particular, preserves privacy. In this paper, we investigate a novel approach toward automatic indoor human activity recognition, feeding high-dimensional radar and video camera sensor data into several deep neural networks. Furthermore, we explore the efficacy of sensor fusion to provide a solution in less than ideal circumstances. We validate our approach on two newly constructed and published data sets that consist of 2347 and 1505 samples distributed over six different types of gestures and events, respectively. From our analysis, we conclude that, for the radar sensor, it is optimal to use a three-dimensional convolutional neural network that takes sequential range-Doppler maps as input. This model achieves error rates of 12.22% and 2.97% on the gestures and events data sets, respectively. A pretrained residual network is employed to deal with the video camera sensor data and obtains error rates of 1.67% and 3.00% on the same data sets. We show that there is a clear benefit in combining both sensors to enable activity recognition under less than ideal circumstances.
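To make the radar branch of the approach more concrete, the following is a minimal PyTorch sketch (PyTorch is the framework referred to in note 5) of a three-dimensional convolutional network operating on a sequence of range-Doppler maps. The layer configuration, input resolution, and clip length are illustrative assumptions, not the exact architecture evaluated in the paper.

```python
# Minimal sketch of a 3D CNN over sequential range-Doppler maps.
# All layer sizes and the 64x64 map resolution are assumptions for
# illustration; only the idea (3D convolutions over time, range, and
# Doppler) follows the paper.
import torch
import torch.nn as nn

class RadarConv3D(nn.Module):
    def __init__(self, num_classes: int = 6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1),  # (time, range, Doppler)
            nn.ELU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ELU(),
            nn.MaxPool3d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),
            nn.Flatten(),
            nn.Linear(32, num_classes),
        )

    def forward(self, x):
        # x: (batch, 1, frames, range_bins, doppler_bins)
        return self.classifier(self.features(x))

# A 2-second clip of 30 range-Doppler maps, assumed here to be 64x64 bins.
logits = RadarConv3D()(torch.randn(2, 1, 30, 64, 64))  # shape (2, 6)
```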


Notes

  1. http://crcv.ucf.edu/data/UCF101.php.

  2. https://20bn.com/datasets/jester.

  3. Strictly speaking, we are dealing with a cross-correlation, as the kernel is not flipped.

  4. The data sets are publicly available at: https://www.imec-int.com/en/harrad.

  5. https://pytorch.org.


Acknowledgements

The research activities described in this paper were funded by Ghent University-imec, the Fund for Scientific Research-Flanders (FWO-Flanders), and the European Union.

Author information

Corresponding author

Correspondence to Baptist Vandersmissen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Indoor human activity recognition on combined data set

In this study, we developed a deep learning approach toward automatic indoor human activity recognition and validated it on two separate data sets, each applicable to a different domain. For the sake of completeness, we explore the efficacy of an integrated system capable of predicting the correct activity on a combined data set of gestures and events. To that end, both data sets are merged, and the 3d-CNN and ResCNN networks are employed for the radar and camera sensors, respectively. The combined data set consists of 3852 samples distributed over 12 different activities. Table 4 lists the total number of samples per activity. As in the experiments of Sects. 7.2 and 7.3, the sample length is set to 2 s, or 30 frames.
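As a rough illustration of this merging step, the sketch below combines the gesture and event samples into a single 12-class problem and fixes every sample at 30 frames by clipping or zero-padding. The helper names and the label offset of six are assumptions for illustration, not the authors' actual preprocessing code.

```python
# Illustrative sketch of building the combined 12-class data set with
# fixed-length samples; names and the offset-based relabelling are assumed.
import numpy as np

def fix_length(frames: np.ndarray, target: int = 30) -> np.ndarray:
    """Clip or zero-pad a (frames, H, W) sample to exactly `target` frames."""
    if len(frames) >= target:
        return frames[:target]
    pad = np.zeros((target - len(frames), *frames.shape[1:]), dtype=frames.dtype)
    return np.concatenate([frames, pad], axis=0)

def merge_datasets(gesture_samples, gesture_labels, event_samples, event_labels):
    """Combine both sets; event labels are shifted by 6 to yield 12 classes."""
    samples = [fix_length(s) for s in list(gesture_samples) + list(event_samples)]
    labels = list(gesture_labels) + [6 + l for l in event_labels]
    return np.stack(samples), np.asarray(labels)
```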

Table 8 shows the results obtained by both the radar- and video-based models. The results suggest that our approach remains valid for the combined data set. The radar-based 3d-CNN achieves error rates of 14.40% and 6.67% under the cross-validation and random split evaluation approaches, respectively. These results are similar to those obtained on the gestures data set (cf. Sect. 7.2). Similarly, the video-based ResCNN network obtains error rates of 3.52% and 2.70% for \({\overline{S}}\) and RS, respectively.

Furthermore, an experiment is conducted to show the benefit of fusing both sensors. More precisely, artificially darkened frames (denoted by the \(*\) operator) are used as input for the video-based model. This input has a clear negative effect on the error rate of the ResCNN network, which degrades by nearly 20% and 13% for \({\overline{S}}\) and RS, respectively. However, through the combined use of both sensor-specific networks, this effect is largely mitigated in the late fusion approach (Fused*): its performance degrades by only 2% in comparison with the use of clean RGB data. Moreover, the fused approach that uses artificially darkened video data still outperforms the radar-only approach by a margin of 2%.

Table 8 Results for leave-one-subject \(S_i\)-out cross-validation (\({\overline{S}}\)), with \(i \in \{1\dots 9\}\), and stratified random split (RS) for the combined data set
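The fusion rule itself is not detailed in this appendix; as a minimal sketch, assuming a simple late fusion that averages the per-class probabilities of the radar- and video-based networks, the combination step could look as follows.

```python
# Hedged sketch of a late-fusion step: class probabilities of the radar
# and video networks are averaged before taking the argmax. The exact
# fusion rule used in the paper is assumed here, not quoted.
import torch
import torch.nn.functional as F

def late_fusion(radar_logits: torch.Tensor, video_logits: torch.Tensor) -> torch.Tensor:
    """Average the per-class probabilities of both sensor-specific networks."""
    p_radar = F.softmax(radar_logits, dim=-1)
    p_video = F.softmax(video_logits, dim=-1)
    return ((p_radar + p_video) / 2).argmax(dim=-1)

# Even with artificially darkened video input, unreliable video probabilities
# are tempered by the radar branch, keeping the fused prediction close to
# the clean-RGB case.
```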

About this article

Cite this article

Vandersmissen, B., Knudde, N., Jalalvand, A. et al. Indoor human activity recognition using high-dimensional sensors and deep neural networks. Neural Comput & Applic 32, 12295–12309 (2020). https://doi.org/10.1007/s00521-019-04408-1

