A convolutional neural network approach for fall detection based on adaptive channel selection of UWB radar signals

According to the World Health Organization and other authorities, falls are one of the main causes of accidental injuries among the elderly population. It is therefore essential to detect and predict the fall activities of older persons in indoor environments such as homes, nursing homes, senior residential centers, and care facilities. Thanks to its non-contact operation and signal confidentiality, radar equipment is widely used in indoor care, detection, and rescue. This paper proposes an adaptive channel selection algorithm to separate activity signals from the background using an ultra-wideband radar, and to generate fused frequency- and time-domain image features that are fed to a lightweight convolutional neural network to detect and recognize fall activities. The experimental results show that the method is able to distinguish three types of fall activities (i.e., stand to fall, bow to fall, and squat to fall) and achieves a recognition accuracy of up to 95.7%.


Introduction
In today's world, the population is growing and aging, resulting in an unbalanced population structure. The United Nations Department of Economic and Social Affairs reported that by 2050 the world's population will reach 10 billion from the current 7.7 billion [1]. The population over the age of 65 currently accounts for about 9.1% (i.e., 701 million) and will reach 16% in 2050. By then, elderly care will be one of the most prominent issues in the world.
Moreover, according to the World Health Organization and other authorities, falls account for 50.96% of accidental injuries [2], and even deaths, among older people. Timely fall detection and treatment are therefore essential to protect the health of the elderly.
Researchers have adopted different solutions to detect and identify falls in elderly care [3][4][5]. Wearable sensors such as accelerometers, gyroscopes [6,7], and electrocardiogram (ECG) monitors were the first to be used for monitoring seniors' physical and physiological status for fall detection and recognition. However, wearable devices are easy to forget or lose [8]. Therefore, non-contact methods have been introduced to capture fall actions. One such technique uses computer vision-based methods [9,10], but its drawback is that it raises privacy issues.
Other methods are also used, such as ambient sensors [11], which mainly include pressure sensors, infrared sensors, and ultrasonic sensors. Ambient sensors provide diverse data; however, depending on the size and complexity of the environment, several devices may need to be deployed [12].
In this work, we adopted an ultra-wideband (UWB) radar to obtain raw data and used an adaptive channel selection method [13] to separate the background from the useful signal. Then, a fused feature set of frequency- and time-domain images is used to train the model for fall recognition.
Our approach can capture human activities even through an obstacle such as a wall and track human movements. The main contributions of this work are the following:
- continuous monitoring and recognition of the most common acute events (e.g., falls) in the home life of the elderly;
- an adaptive channel selection algorithm for distinguishing the background from fall activity;
- a feature fusion method based on frequency- and time-domain images for increasing recognition accuracy.
The rest of the article is organized as follows: Sect. 2 discusses the related work on sensing methods of fall detection. Section 3 describes the proposed radar-based system and introduces the experiment setup. Section 4 describes in detail the algorithm in our paper. Section 5 explains the experimental results and discussion. Finally, Sect. 6 concludes the paper and outlines planned future work.

Related work
As introduced in the previous section, there are several solutions to automatically detect human falls. We can identify three main approaches:
1. computer vision-based methods (based on cameras);
2. wearable technologies such as smartwatches, smartphones, and smart belts;
3. non-contact sensors such as passive infrared sensors, magnetic contact sensors, pressure mats, or radio-frequency sensors.

Computer vision-based fall detection
Computer vision-based fall detection methods generally use a camera, usually fixed at a specific position, that captures continuous data frames for detecting and recognizing activities. In [14], the authors proposed a 3-dimensional convolutional neural network (CNN)-based method for fall detection, which uses only video kinematic data to train an automatic feature extractor and can circumvent the requirement of a large sample size.
In [15], the authors proposed a method to detect falls by analyzing human shape deformation during a video sequence. The experiments were conducted on an actual data set of daily activities and simulated falls, and gave promising results compared with other standard image processing methods. An interesting study was conducted in [16], where a lightweight neural network, namely You Only Look Once third edition (YOLOv3), was proposed to improve the accuracy and responsiveness of fall detection. However, privacy concerns cannot be ignored in camera-based approaches. Therefore, in [17], Kinect depth images are used to capture shadow-like images of patients and their rooms to resolve concerns about patients' privacy. As a result of that research, a fall detection system was developed and installed in hospital rooms, and alarms are generated upon the detection of fall events. Nurses then review the stored depth videos to investigate possible injuries as well as the factors that may have led to the patient's fall, to prevent future occurrences.
The data from computer vision-based sensors is intuitive and easy to analyze but offers poor privacy protection. Even though depth cameras raise few privacy concerns, the cost of such devices is typically high.

Wearable sensor-based fall detection
Wearable-based fall detection is currently the most popular approach due to the thriving development of sensor technologies and pervasive computing. Indeed, this approach mainly relies on motion data from sensors such as accelerometers and gyroscopes. These sensors are directly integrated into devices with microcontrollers, such as smartphones or smartwatches, and can be worn by the residents [18,19]. Ballı et al. used a smartwatch-based machine learning approach to recognize fall actions [20]. Zhao et al. proposed a method based on a tri-axial gyroscope for fall event recognition: a tri-axial gyroscope placed at the user's waist collects tri-axial angular velocity information [21]. Also, De Araújo et al. [22] presented a smartwatch-based accelerometer approach to detect falls. Although wearable sensors are usually small, lightweight, and easy to deploy, wireless communication is sometimes unstable; meanwhile, the device is easily forgotten or lost and requires frequent charging.

Ambient sensor-based fall detection
Ambient devices that monitor falls mainly include pressure sensors, infrared sensors, radar systems, and radio-frequency equipment. In [23], a novel system based on two pressure sensors was proposed; random forest yielded the best fall detection model, with 100% accuracy. Ogawa et al. [24] proposed a fall detection method using an infrared sensor array. Miawarni et al. [25] used a 2-dimensional lidar as the main sensor in a fall detection system, achieving high recognition accuracy.
Ambient-sensor approaches to fall detection usually rely on infrared or pressure sensors. These sensors place strict requirements on the environment: for example, infrared sensors require an environment free of obstructing objects, and pressure sensors may need to cover a large area in some scenarios.

UWB radar-based fall detection
Compared with the previous methods, the radio-frequency approach is a better choice for home care monitoring and for detecting accidents such as falls, thanks to its non-contact data collection and intrinsic privacy preservation. Li et al. [26] used a UWB radar and three inertial sensors on the wrist, waist, and ankle, relying on a bidirectional Long Short-Term Memory (bi-LSTM) network with multi-information fusion, and achieved high accuracy in detecting falls. However, using multiple heterogeneous sensors leads to complex operations and increases the complexity of the algorithm. Julien et al. [27] extracted fall features based on weighted joint distance time-frequency transformation and used bagged decision trees and k-Nearest Neighbors (kNN) to obtain accuracies of 91.5% and 88.6%, respectively. Sadreazami et al. [28] presented a radar-based fall detection method using compressed features of the radar signals, obtained by deterministic row and column sensing. Time-frequency analysis is first performed on the radar time series, and the resulting spectrogram is projected onto a binary image representation. The binary images are then compressed using a 2D deterministic sensing technique that preserves the aspect ratio of the images in the compressed domain. The performance of the method was evaluated using several classifiers, and the compressive sensing-based method was shown to improve the recognition of fall versus non-fall activities. Khawaja et al. [29] used multiple UWB radar transceivers to introduce a fall detection, localization, and tracking technique for people in need of assistance. The proposed method allows precise monitoring of people with special needs without any tags or wearables. To enhance ranging precision, the authors introduced a novel fall detection method based on the residual covariance of an extended Kalman filter.
Computer simulations demonstrated the effectiveness of the proposed technique for fall detection applications.
As shown in Table 1, we can clearly see the advantages and disadvantages of the various solutions. In this paper, we select UWB radar for its non-contact monitoring, easy deployment, high resolution, and low privacy concerns in detecting falls of the elderly. Meanwhile, to improve the accuracy and stability of fall monitoring, we analyze the signal to accurately localize the fall activity and extract its detailed action, and we use a fusion method with a deep convolutional neural network to achieve high recognition accuracy.

System prototype
As shown in Fig. 1a, the UWB radar chip used for data acquisition is the NVA-R631 on the NVA-R6X1 Novelda series development board produced by Novelda; two patch antennas for transmission and reception are placed in parallel. A Universal Serial Bus-Serial Peripheral Interface (USB-SPI) bus conversion interface supports data transmission at a rate of 480 Mb/s. In a radar system, the sampled data has two time dimensions: slow time, which refers to actual elapsed time, and fast time, which corresponds to the distance of an object from the radar. Figure 1b shows the data transmitted via a transmitting antenna to a receiving antenna and then passed through an ADC (analog-to-digital converter) to generate the raw data, in which the horizontal axis is fast time and the vertical axis is slow time.

System model for detecting human motion
The UWB radar transmits a first-order Gaussian pulse signal p(t), which can be roughly expressed by Formula 1, where A(t) represents the pulse waveform and T_p represents the pulse interval.
The transmitted signal p_tr(t) of the system can be expressed by Formula 2, where M represents the number of transmitted pulses and T_pr represents the pulse period.
After high-speed sampling, the radar system receives the signal frames (see Formula 3), where m denotes the m-th frame of the received radar signal r_m, and n ∈ N denotes the channel index. All continuous frames form the radar signal R (see Formula 4), where T_s and T_f represent the fast-time and slow-time sampling intervals, respectively.
The received signal s(t, τ) can be expressed by Formula 5, where a_j is the impulse signal amplitude of a stationary object in the surrounding environment; a_v is the signal amplitude of the human body; c is the speed of electromagnetic waves; τ and t are fast time and slow time at a given moment, respectively; r_0 and Δr(t) represent the average distance of the human body from the radar and its variation, respectively; f_j and D_j represent the frequency and amplitude of each channel, respectively; and s_noise represents all noise signals of the radar.
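The numbered formulas themselves are not reproduced in this text. As a hedged reconstruction from the symbol definitions above (the exact terms in the original paper may differ), Formulas 1, 2, and 5 plausibly take the form:

```latex
% Formula 1: first-order Gaussian pulse over one pulse interval T_p
p(t) \approx A(t), \quad 0 \le t \le T_p

% Formula 2: train of M pulses with pulse period T_{pr}
p_{tr}(t) = \sum_{m=0}^{M-1} p\!\left(t - m T_{pr}\right)

% Formula 5: received signal = static clutter + moving body + noise
s(t, \tau) = \sum_{j} a_j \,\delta\!\left(\tau - \tau_j\right)
  + a_v \,\delta\!\left(\tau - \frac{2\left(r_0 + \Delta r(t)\right)}{c}\right)
  + s_{noise}(t, \tau)
```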

Experiment setup
The radar system is assembled in an indoor environment and deployed on a desk at a height of 1 m from the floor. The data collection area is 1.2-1.5 m away from the radar in the longitudinal direction, as shown in Fig. 2a and b.
The collected data samples came from nine male volunteers aged 24-40. The average height and weight of the volunteers were 172.5 ± 4.6 cm and 69.6 ± 7.9 kg, respectively. The volunteers simulated three fall actions that frequently occur in the senior population. We collected a total of 400 sets containing the three types of fall data, using Matlab scripts for data acquisition. In our experiments, the number of fast-time channels and the slow-time sampling frequency of the radar were set to 512 and 10 Hz, respectively.
The protocol of the experiments is summarized in the following:
- Stand to Fall: the subject was asked to stand in front of the mat and, after holding for two seconds, to fall down onto the mat;
- Bow to Fall: the subject was asked to bend over in front of the mat and, after holding for two seconds, to fall down onto the mat;
- Squat to Fall: the subject was asked to squat down in front of the mat and, after holding for two seconds, to fall down onto the mat.
Each participant was asked to repeat all the above simulated falls 15 times, and each activity lasted 4 seconds on average.

Data preprocessing
The collected data contains various noise sources, including low-frequency noise reflected by the surrounding environment and high-frequency noise from inside and outside the radar, which seriously affect the detection of falls. It is therefore necessary to remove low-frequency and high-frequency noise to obtain good detection. Wang et al. [30] used a single-stage canceller to filter out low-frequency noise, but high-frequency noise and clutter still remain and would affect the detection accuracy. In [31], stationary and non-stationary clutter was removed by employing the singular value decomposition (SVD) algorithm when the signal-to-noise ratio (SNR) is low. Considering the computational time cost and complexity, we decided to send the raw data in parallel to both Fast Fourier transform (FFT) and SVD filters. The FFT filter removes the direct-current (DC) component and part of the low-frequency content of the signal to obtain FFT image features (i.e., frequency-domain features), and the SVD filter removes high-frequency clutter and low-frequency background noise to generate SVD image features (i.e., time-domain features).

FFT filter and frequency-domain feature image extraction
The FFT decomposes a function of time (a signal) into its constituent frequencies.
As we can observe in Fig. 3, the raw data from the radar sensor is a discrete signal. The FFT is therefore used to preprocess the signal (see Formula 6), filtering out the DC component and part of the low-frequency noise.
After filtering out the noise, the activity performed by the subject can be observed in the blue dashed boxes (see Fig. 3).
The frequency-domain feature image is the center-shifted FFT image.
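As an illustrative sketch (not the authors' code), the slow-time FFT filtering step can be approximated with NumPy; the number of suppressed low-frequency bins (`cutoff_bins`) is an assumed parameter:

```python
import numpy as np

def fft_filter(frames, cutoff_bins=2):
    """Remove the DC component and a few low-frequency bins along
    slow time (axis 0) of a radar data matrix (slow x fast time)."""
    spec = np.fft.fft(frames, axis=0)
    spec[:cutoff_bins] = 0                 # DC and low positive frequencies
    if cutoff_bins > 1:
        spec[-(cutoff_bins - 1):] = 0      # mirrored negative frequencies
    filtered = np.fft.ifft(spec, axis=0).real
    feature_image = np.abs(np.fft.fftshift(spec, axes=0))  # center-shifted FFT image
    return filtered, feature_image

# a constant (pure-DC) input is removed entirely by the filter
frames = np.full((10, 4), 5.0)
filtered, image = fft_filter(frames)
```

The center-shifted magnitude spectrum plays the role of the frequency-domain feature image described above.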

SVD filter and time-domain feature image extraction
The SVD algorithm is widely used for dimension reduction and noise filtering in signal processing. In our method, the image can be seen as a matrix A (of order m × n) whose elements belong to the field K, i.e., the real or complex numbers. According to the SVD, A can be decomposed as in Formula 7, where U is a unitary matrix of order m × m; Σ is a positive semidefinite m × n diagonal matrix; and V^T, the conjugate transpose of V, is a unitary matrix of order n × n. U and V are both orthogonal matrices, such that UU^T = I and VV^T = I.
The left singular vectors are the eigenvectors of AA^T, and the right singular vectors are the eigenvectors of A^T A, as shown in Formulas 8 and 9, where λ_i and μ_i are the eigenvalues corresponding to the eigenvectors u_i and v_i, respectively. Obviously, we can obtain U and V from the u_i and v_i.
Finally, Σ can be calculated by Formula 10, where σ_i is the i-th singular value used to compose Σ.
After the above calculation, we obtain A = Σ_{i=1}^{r} σ_i u_i v_i^T. Generally, the larger σ_i is, the more significant its contribution to the matrix A. We therefore filter according to this principle, removing components with smaller σ_i. After SVD filtering and reconstruction, the time-domain feature image is obtained as shown in Fig. 4, and the activity performed by the subject is again clearly observed in the blue dashed boxes.
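A minimal NumPy sketch of this truncation (the number of retained singular values, `keep`, is an assumed parameter, not taken from the paper):

```python
import numpy as np

def svd_filter(A, keep=1):
    """Reconstruct A from its `keep` largest singular values,
    discarding the small-sigma components treated as noise."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s[keep:] = 0.0
    return (U * s) @ Vt                # A ~ sum_i sigma_i u_i v_i^T

# synthetic test: a rank-1 "activity" pattern plus weak noise
rng = np.random.default_rng(0)
clean = np.outer(np.sin(np.linspace(0.0, 3.0, 50)), np.ones(20))
noisy = clean + 0.01 * rng.normal(size=clean.shape)
denoised = svd_filter(noisy, keep=1)
```

Because the simulated signal is rank-1, keeping only the largest singular value suppresses most of the added noise.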

Adaptive channels selection algorithm
After filtering, we can observe the activity occurring in specific channels (see Fig. 5); the energy of these channels is higher than that of the channels corresponding to the background. The channel energy is calculated by Formula 11.
Therefore, in this work, we propose an adaptive channel selection Algorithm 1 to distinguish the background from the activity performed by the subject.
In the algorithm, channels are automatically selected by an energy threshold E(th). The candidate parameters are listed in Table 2, and their performance evaluation is discussed in Sect. 5.1.
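A sketch of the idea behind Algorithm 1 (the exact selection rule and threshold E(th) in the paper may differ): channels whose slow-time energy exceeds the mean channel energy are kept.

```python
import numpy as np

def select_channels(frames, k=1.0):
    """Keep fast-time channels whose energy (Formula 11: the sum of
    squared samples over slow time) exceeds k times the mean energy."""
    energy = np.sum(frames ** 2, axis=0)   # one energy value per channel
    threshold = k * energy.mean()          # adaptive threshold E(th)
    return np.nonzero(energy > threshold)[0]

# synthetic example: "activity" concentrated in channels 10-14
frames = 0.1 * np.ones((100, 32))
frames[:, 10:15] += 1.0
selected = select_channels(frames)
```

On this toy matrix the selector recovers exactly the five high-energy channels, mirroring how the activity region stands out from the background in Fig. 5.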

Data normalization
The fall-action data collected by the radar are preprocessed to generate a standardized data set, which is then divided into a training set and a test set.
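The paper does not specify the standardization scheme; one common choice is a z-score over the whole data set, sketched here:

```python
import numpy as np

def standardize(images):
    """Zero-mean, unit-variance standardization over the data set
    (an assumed scheme; the paper's exact normalization is unspecified)."""
    mu, sigma = images.mean(), images.std()
    return (images - mu) / (sigma + 1e-8)

rng = np.random.default_rng(0)
data = rng.normal(5.0, 2.0, size=(40, 16, 16))   # placeholder feature images
normalized = standardize(data)
```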

Deep convolutional neural network
In recent years, convolutional neural networks (CNNs) have excelled in image processing and image recognition thanks to their outstanding performance. The internal convolution operations of a CNN automatically extract features from the data, so that feature selection no longer requires extensive time and effort, and recognition accuracy improves dramatically.
The main proposal of this work is to fuse the frequency- and time-domain images as inputs for fall detection and recognition from the radar signal. Table 3 depicts the network architecture setup.
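The paper does not detail the fusion operation; a common realization is to stack the two feature images as input channels of the network, e.g.:

```python
import numpy as np

# hypothetical 64x64 feature images; the sizes are assumptions
fft_image = np.random.rand(64, 64)   # frequency-domain feature
svd_image = np.random.rand(64, 64)   # time-domain feature

# fuse by stacking along a channel axis -> one (64, 64, 2) CNN input
fused = np.stack([fft_image, svd_image], axis=-1)
```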
Since our sample size is small, we design a relatively shallow network and reduce the number of parameters in each layer. To prevent over-fitting, our proposed method introduces two measures:
- L2 regularization term: in deep learning, small samples fed to deep networks easily cause over-fitting, and adding regularization to the network is one way to address it. We therefore add an L2 regularization term to the network to prevent over-fitting of the model.
- Dropout layer: the more complex the model, the more parameters it needs to learn. A dropout layer is therefore introduced to randomly drop 20% of the units, discarding less important information. This allows the model to obtain good results on the training set while generalizing more easily and improving robustness.
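The two measures can be illustrated in NumPy (the weight shape and the rate-0.2 inverted-dropout formulation are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))            # a layer's weights

# L2 regularization: lambda * ||W||^2 is added to the training loss,
# penalizing large weights and discouraging over-fitting
lam = 0.01
l2_penalty = lam * np.sum(W ** 2)

# inverted dropout with rate 0.2: ~20% of activations are zeroed
# during training and the survivors are rescaled by 1/(1 - rate)
rate = 0.2
mask = (rng.random(W.shape) >= rate) / (1.0 - rate)
W_dropped = W * mask
```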
During deep neural network training, each batch typically has a different distribution, and the data distribution also shifts as training proceeds, which makes learning harder for subsequent layers. Batch normalization therefore forces the data back to a normal distribution with mean 0 and variance 1, keeping the data distribution consistent and avoiding vanishing gradients.
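The normalization described above can be sketched as follows (the learnable scale and shift parameters of a full batch-normalization layer are omitted for brevity):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each feature of a batch to mean 0, variance 1."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(1)
batch = rng.normal(3.0, 4.0, size=(32, 10))   # 32 samples, 10 features
normed = batch_norm(batch)
```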

Experiment result and discussion
In our process, the proposed model automatically learns a large number of parameters by extracting characteristics of the fall signal. These parameters are then evaluated on the test set to verify the learning effect of the model. To ensure independence of the data distribution, the test set and the training set are divided randomly at a ratio of 1:1, and the two parts are independent of each other. The experiment is repeated five times, and each run re-divides the training and test sets.
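The evaluation protocol (five runs, with a random 1:1 re-division on each run) can be sketched as:

```python
import numpy as np

def split_half(n_samples, seed):
    """Randomly divide sample indices into equal train/test halves."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    half = n_samples // 2
    return idx[:half], idx[half:]

# five independent repetitions over the 400 collected sets
splits = [split_half(400, seed) for seed in range(5)]
```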

Threshold parameter selection of the adaptive algorithm
The adaptive algorithm requires a threshold to separate the background from the activity. As shown in Table 2, we considered different parameter values to evaluate its performance in recognizing the various fall activities. The channel-energy values of one of the input images fall into the range 0.0648 to 1.827. We selected the mean as a measure of central tendency and evaluated thresholds of (k/4) × mean for k ∈ {1, 2, ..., 7}. Figure 6 shows that while the threshold is below the mean, the recognition accuracy, recall, and precision increase slowly; when the threshold reaches the mean, they peak. Above the mean, the recognition accuracy drops quickly, so parameter values greater than 0.411 should be avoided. On the left of Fig. 6, the values are very close: the accuracies for (1/2) mean, (3/4) mean, Std, and mean are 94.7%, 95.2%, 94.3%, and 94.92%, respectively. From Table 4, it can be observed that choosing (3/4) mean as the channel-filtering threshold produces the best effect; however, the actual difference between (3/4) mean and the adjacent parameters is very small. As the threshold grows further, the classification performance drops sharply. In other words, using the mean value as the threshold for screening channels is already effective enough to eliminate interference and filter out worthless channels. To simplify calculations, we therefore selected the mean as the threshold.

Performance evaluation of the training and loss
The loss function reflects the degree of convergence of the model in a deep learning network. When the loss function converges to a small value and no longer changes, the model has converged. In this paper, cross-entropy is used as the loss function. The model accuracy reaches 100%, and the loss drops to about 0.1513 on the training set. Since the loss function fluctuates only slightly around a small value, the model has converged. The training process and loss curve are shown in Fig. 7. We use the loss behavior to choose the number of training epochs, which we set to 10.

Tables 5 and 6 show the frequency-domain and time-domain features used separately to recognize the fall activities. As we can observe, the maximum accuracy obtained with the single FFT-image feature over five test runs is 91.3%, with an average accuracy of 90.64%. The case of the single SVD-image feature is similar: the maximum accuracy is 91.4%, with an average of 90.46%. The FFT and SVD image features thus provide almost the same classification performance in our experiments. When we adopted FFT and SVD images as a fused feature, as shown in Table 7, the maximum accuracy rose to 95.7% with an average of 94.92%. Consequently, we conclude that each feature may contain information the other does not possess; combining the two features is therefore a better choice than using either alone.

Comparison with other algorithms
It can be seen from Table 8 that, with the same inputs, the other algorithms do not perform as well as the proposed method. With traditional machine learning techniques (i.e., kNN, SVM, Naive Bayes, AdaBoost, and Random Forest), the maximum accuracy we could achieve was 92.6%, with the SVM classifier.
The above results show that the proposed adaptive channel selection algorithm with the deep neural network is more suitable for fall detection using the UWB radar sensor.

Discussion
The purpose of our study is to detect and identify fall events so as to provide medical services as soon as possible. However, falling is dangerous for people of any age; therefore, the data used in the research phase all come from fall simulations performed by young people in the laboratory. The experiment took place in a large office with a relatively complex environment, unlike the usual laboratory with absorbing walls, because the actual application of this monitoring method will be in environments containing objects such as furniture and plants. To simplify the experiment, the radar was simply set on a desk at a distance of 1.5 m from the subject.
During the experiment, we found that the characteristics of the obtained radar feedback images differ considerably with the type of action. We therefore detect and recognize only three common falls in this article. The three falling behaviors correspond to three situations, namely: 1. tripping over obstacles; 2. bending over to pick something up; 3. squatting down to lace up a shoe.
As we can observe from Fig. 3, although the radar feedback charts of the three falls look different, they are hard to distinguish with the naked eye. At the same time, we found that the power changes in the channels are closely related to human activities; we therefore used an adaptive channel selection method to obtain the useful channels. Experiments proved that inputting the complete set of channels related to the activity yields higher recognition accuracy than a single-channel input. Since the human body moves as a whole, the change in distance between each body part and the radar depends on the action, and channels correspond to distances from the radar to the subject. Therefore, the relationship between the channels covering the area where the subject is present reflects the relationship between the motion states of different parts of the body. This is the reason for using an adaptive selector to find the edge channels of the activity.

Conclusion
This paper proposed an adaptive channel selection algorithm to reduce the data dimensions and used fused FFT and SVD feature images with a deep neural network. By calculating the energy change of the radar signals, a threshold is used to adaptively select the region of the signal most likely to contain the fall activity. Through the miniaturization of the network and the optimal configuration of parameters, the network can be adapted to small-sample data to detect and identify three types of falls, i.e., stand to fall, bow to fall, and squat to fall. Results showed that, compared to single-input frequency- or time-domain data, using selected channels with fused features significantly increases the recognition accuracy from 90.64% and 90.46%, respectively, to 95.7%. In future work, we plan to conduct more experiments with an expanded sample size, and we will define an algorithm that can detect and recognize activities in multi-resident environments.