1 Introduction

The world’s population is growing and aging, resulting in an unbalanced population structure. The United Nations Department of Economic and Social Affairs projects that the world’s population will grow from the current 7.7 billion to about 10 billion by 2050 [1]. People over the age of 65 currently account for about 9.1% of the population (i.e., 701 million) and will account for 16% by 2050. At that point, elderly care will be one of the most prominent issues in the world.

Moreover, according to the World Health Organization and other authorities, falls account for 50.96% of accidental injuries among older people [2] and can even lead to death. Therefore, timely detection and treatment of falls are essential to protect the health of the elderly.

Researchers have adopted different solutions to detect and identify falls in elderly care [3,4,5]. Wearable sensors such as accelerometers, gyroscopes [6, 7], and electrocardiogram (ECG) sensors were the first to be used to monitor seniors’ physical and physiological status for fall detection and recognition. However, wearable devices are easily forgotten or lost [8]. Therefore, non-contact methods have been introduced to capture fall actions. One such technique relies on computer vision [9, 10], but its drawback is that it raises privacy issues.

Other methods are also used, such as ambient sensors [11], which mainly include pressure sensors, infrared sensors, and ultrasonic sensors. Ambient sensors provide diverse data; however, depending on the size and complexity of the environment, several devices may need to be deployed [12].

In this work, we adopted an ultra-wideband (UWB) radar to obtain raw data and used an adaptive channel selection method [13] to separate the background from the useful signal. Then, a fused feature set of frequency- and time-domain images is used to train the model for fall recognition.

Our approach can capture human activities even through an obstacle such as a wall and can track human movements. The main contributions of this work are the following:

  • continuous monitoring and recognition of the most common acute events (e.g., fall) in the home life of the elderly;

  • an adaptive channel selection algorithm for distinguishing the background from the fall activity;

  • a feature fusion method based on frequency- and time-domain images for increasing recognition accuracy.

The rest of the article is organized as follows: Sect. 2 discusses the related work on sensing methods of fall detection. Section 3 describes the proposed radar-based system and introduces the experiment setup. Section 4 describes in detail the algorithm in our paper. Section 5 explains the experimental results and discussion. Finally, Sect. 6 concludes the paper and outlines planned future work.

2 Related work

As introduced in the previous section, there are several solutions to automatically detect human falls. We can identify three main approaches:

  1. computer vision-based methods (based on cameras);

  2. wearable technologies such as smartwatches, smartphones, and smart belts;

  3. non-contact sensors such as passive infrared sensors, magnetic contact sensors, pressure mats, or radio-frequency sensors.

2.1 Computer vision-based fall detection

Computer vision-based fall detection methods generally use a camera, usually fixed at a specific position, that produces continuous data frames for detecting and recognizing activities. In [14], the authors proposed a 3-dimensional convolutional neural network (CNN)-based method for fall detection, which uses only video kinematic data to train an automatic feature extractor and can circumvent the need for a large sample size. In [15], the authors proposed a method to detect falls by analyzing human shape deformation during a video sequence. The experiments were conducted on an actual data set of daily activities and simulated falls and gave promising results compared with other standard image processing methods. An interesting study was conducted in [16], where a lightweight neural network, namely You Only Look Once version 3 (YOLOv3), was proposed to improve the accuracy and responsiveness of fall detection.

However, privacy concerns cannot be ignored for camera-based approaches. To address patients’ privacy concerns, Kinect depth images have been used in [17] to capture shadow-like images of the patient and their room. Based on this research, a fall detection system has been developed and installed in hospital rooms, generating alarms upon the detection of fall events. Nurses then review the stored depth videos to investigate possible injuries as well as the causes of the patient’s fall, in order to prevent future occurrences.

The data from computer vision-based sensors is intuitive and easy to analyze but offers little privacy protection. Even though depth cameras raise fewer privacy concerns, the cost of such devices is typically high.

2.2 Wearable sensor-based fall detection

Wearable-based fall detection is currently the most popular approach due to the thriving development of sensor technologies and pervasive computing. Indeed, this approach mainly relies on motion data from sensors such as accelerometers and gyroscopes. These sensors are directly integrated into devices with microcontrollers, such as smartphones or smartwatches, and can be worn by the residents [18, 19]. Ballı et al. used a machine learning approach with smartwatch data to recognize fall actions [20]. Zhao et al. proposed a method based on a tri-axial gyroscope for fall event recognition, where the gyroscope is placed at the user’s waist to collect tri-axial angular velocity information [21]. Also, De Araújo et al. [22] presented a smartwatch-based accelerometer approach to detect falls. Although wearable sensors are usually small, lightweight, and easy to deploy, their wireless communication is sometimes unstable; moreover, the devices are easily forgotten or lost and require frequent charging.

2.3 Ambient sensor-based fall detection

Ambient devices that monitor falls mainly include pressure sensors, infrared sensors, radar systems, and radio-frequency equipment. In [23], a novel system based on double pressure sensors was proposed; its random forest classifier yielded the best fall detection model, with 100% accuracy. Ogawa et al. [24] proposed a fall detection method using an infrared sensor array. Miawarni et al. [25] proposed a 2-dimensional lidar as the main sensor of a fall detection system, with high recognition accuracy.

Environmental sensors for fall detection, such as infrared and pressure sensors, place higher demands on the environment: infrared sensors require an environment without obstructing objects, and pressure sensors need to be deployed over a large area in some scenarios.

2.3.1 UWB radar-based fall detection

Compared with the previous methods, the radio-frequency approach is a better choice for home care monitoring and for detecting accidents such as falls, thanks to its non-contact data collection and intrinsic privacy preservation. Li et al. [26] combined UWB radar with three inertial sensors on the wrist, waist, and ankle, relying on a bidirectional Long Short-Term Memory (bi-LSTM) network with multi-information fusion, and achieved high accuracy in detecting falls. However, using multiple heterogeneous sensors leads to complex operations and increases the complexity of the algorithm. Julien et al. [27] extracted fall features based on weighted joint distance time-frequency transformation and used a bagged decision tree and k-Nearest Neighbor (kNN) to obtain accuracies of 91.5% and 88.6%, respectively. Sadreazami et al. [28] presented a radar-based fall detection method using compressed features of the radar signals, obtained by deterministic row and column sensing. Time-frequency analysis is first performed on the radar time series, and the resulting spectrogram is projected onto a binary image representation. The binary images are then compressed using a 2D deterministic sensing technique that preserves the aspect ratio of the images in the compressed domain. The performance of the method, evaluated with several classifiers, shows that the compressive sensing-based approach improves the recognition of fall versus non-fall activities. Khawaja et al. [29] used multiple UWB radar transceivers to introduce a fall detection, localization, and tracking technique for people in need of assistance. The proposed method allows precise monitoring of people with special needs without any tags or wearables. To enhance ranging precision, the authors introduced a novel fall detection method based on the residual covariance from an extended Kalman filter. Computer simulations demonstrated the effectiveness of the technique for fall detection applications.

Table 1 summarizes the advantages and disadvantages of the various solutions. In this paper, we select UWB radar for its non-contact monitoring, easy deployment, high resolution, and low privacy concerns when detecting falls of the elderly. Meanwhile, to improve the accuracy and stability of fall monitoring, we analyze the signal to accurately identify the location of the fall activity and extract its detailed action, and we use a fusion method with a deep convolutional neural network to achieve high recognition accuracy.

Table 1 Comparison of the Advantages and Disadvantages of common fall detection approaches

3 Proposed system and experimental setup

3.1 System prototype

As shown in Fig. 1a, the UWB radar chip used for data acquisition is the NVA-R631 on the NVA-R6X1 Novelda series development board produced by Novelda; two patch antennas for transmission and reception are placed in parallel. A Universal Serial Bus-Serial Peripheral Interface (USB-SPI) bus conversion interface supports data transmission at a rate of 480 Mb/s.

In a radar system, the sampled data has two time dimensions: slow time, which refers to the actual observation time, and fast time, which corresponds to the distance of an object from the radar. Figure 1b shows the data transmitted via the transmitting antenna to the receiving antenna and then passed through an ADC (analog-to-digital converter) to generate the raw data, in which the horizontal axis is fast time and the vertical axis is slow time.

Fig. 1

Diagram of the radar system

3.2 System model for detecting human motion

UWB radar transmits a first-order Gaussian pulse signal p(t), which can be roughly expressed by Formula 1, where A(t) represents the pulse waveform, and \(T_p\) represents the pulse interval.

$$\begin{aligned} \textit{p(t)}=\left\{ \begin{aligned} \textit{A(t)},&\quad {0 \le t < T_p} \\ 0,&\quad \text{otherwise} \\ \end{aligned} \right. \end{aligned}$$
(1)
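As a concrete illustration, the pulse of Formula 1 can be sketched in Python, taking A(t) to be a first-order Gaussian (the derivative of a Gaussian envelope); the pulse width and interval below are illustrative values, not the radar’s actual parameters:

```python
import numpy as np

def first_order_gaussian(x, tau=0.2e-9):
    """Derivative of a Gaussian envelope (first-order Gaussian pulse)."""
    return -(x / tau) * np.exp(-(x / tau) ** 2)

T_p = 1e-9                          # pulse interval (illustrative)
t = np.linspace(0.0, T_p, 512)
# Formula 1: p(t) = A(t) for 0 <= t < T_p, and 0 otherwise
p = np.where((t >= 0) & (t < T_p), first_order_gaussian(t - T_p / 2), 0.0)
```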

The transmitted signal \(p_{tr}(t)\) of the system can be expressed by Formula 2, where M represents the number of transmitted pulses, and \(T_{pr}\) represents the pulse period.

$$\begin{aligned} \textit{p}_{tr}(t)=\displaystyle {\sum _{m=1}^{M}{} \textit{p}(t-(m-1)T_{pr})} \end{aligned}$$
(2)

After high-speed sampling, the radar system receives the signal frames, see Formula 3, where m denotes the m-th frame of the received radar signal \(r_m\), and \(n \in \{1,\dots ,N\}\) denotes the channel index. All continuous frames form the radar signal matrix \(\mathbf{R}\) (see Formula 4), where \(T_s\) and \(T_f\) represent the slow-time and fast-time sampling intervals, respectively.

$$\begin{aligned} \mathbf{r} _m&={ \left[ \begin{array}{ccccc} \mathbf{r} _{m,1}&{}\ldots &{}\mathbf{r} _{m,n}&{}\ldots &{}\mathbf{r} _{m,N}\\ \end{array} \right] }^\mathrm {T} \end{aligned}$$
(3)
$$\begin{aligned} \mathbf{R} [m, n]&=s(t=mT_s, \tau =nT_f) \end{aligned}$$
(4)

The received signal \({s}(t,\tau )\) can be expressed by Formula 5, where \({a}_j\) is the echo amplitude of a stationary object in the surrounding environment; \(a_v\) is the signal amplitude of the human body; c is the electromagnetic wave speed; \(\tau\) and t are the fast time and slow time at a given moment, respectively; \(r_0\) and \(\varDelta r(t)\) represent the average distance of the human body from the radar and its variation, respectively; \({f}_j\) and \(\varDelta _j\) represent the frequency and amplitude of each body-motion component, respectively; and \(s_{noise}\) represents all the noise signals of the radar.

$$\begin{aligned} \left\{ \begin{aligned} \textit{s}(t,\tau )= {\sum _{j}{} \textit{a}_jp(\tau -\tau _j)+a_v p(\tau -\tau _v (t))}+s_{noise}\\ \tau _v (t)=\frac{2(r_0+\varDelta r(t))}{c}=\frac{2(r_0+ {\sum _{j}\varDelta _j\sin (2\pi f_jt))}}{c}\\ \end{aligned} \right. \end{aligned}$$
(5)
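Formula 5 can be illustrated with a small numerical simulation; all parameter values below (ranges, motion amplitude, noise level) are hypothetical and chosen only to make the structure of the model visible:

```python
import numpy as np

c = 3e8                        # electromagnetic wave speed (m/s)
M, N = 40, 512                 # slow-time frames x fast-time channels
T_s, T_f = 0.1, 1e-10          # slow/fast time sampling intervals (illustrative)

def pulse(x, tau=2e-10):       # first-order Gaussian pulse p(.)
    return -(x / tau) * np.exp(-(x / tau) ** 2)

r0, d1, f1 = 1.35, 0.05, 1.0   # mean range, motion amplitude, motion frequency
t = np.arange(M)[:, None] * T_s          # slow time (column vector)
tau = np.arange(N)[None, :] * T_f        # fast time (row vector)

tau_v = 2 * (r0 + d1 * np.sin(2 * np.pi * f1 * t)) / c   # moving-body delay
R = pulse(tau - tau_v)                                   # human-body echo
R += 0.5 * pulse(tau - 2 * 2.0 / c)                      # static object at 2 m
R += 0.01 * np.random.default_rng(0).normal(size=(M, N)) # s_noise term
```

Broadcasting the column of slow-time instants against the row of fast-time delays produces the full R[m, n] matrix of Formula 4 in one step.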

3.3 Experiment setup

The radar system is assembled in an indoor environment and deployed on a desk at a height of 1 m above the floor. The data collection area is 1.2–1.5 m away from the radar in the longitudinal direction, as shown in Fig. 2a and b.

Fig. 2

Deployment of the device with the experiment environment

The collected data samples came from nine male volunteers aged 24–40. The average height and weight of the volunteers are \(172.5\pm 4.6\ \text{cm}\) and \(69.6\pm 7.9\ \text {kg}\), respectively. The volunteers simulated three fall actions that frequently occur in the senior population, and we collected a total of 400 sets containing the three types of fall data. The radar data were collected using Matlab scripts. In our experiments, the number of fast-time channels and the slow-time sampling frequency of the radar were set to 512 and 10 Hz, respectively.

The protocol of the experiments is summarized in the following:

  • Stand to Fall—The subject was asked to stand in front of the mat and, after holding for two seconds, to fall down on the mat;

  • Bow to Fall—The subject was asked to bend over in front of the mat and, after holding for two seconds, to fall down on the mat;

  • Squat to Fall—The subject was asked to squat down in front of the mat and, after holding for two seconds, to fall down on the mat.

Each participant was asked to repeat all the above simulated falls 15 times, and each activity lasted 4 seconds on average.

4 Method

4.1 Data preprocessing

The collected data contain various noise sources, including low-frequency noise reflected by the surrounding environment and high-frequency noise from inside and outside the radar, which seriously affect the detection of falls. Therefore, it is necessary to remove both low-frequency and high-frequency noise to obtain good detection. Wang et al. [30] used a single-stage canceller to filter out low-frequency noise, but high-frequency noise and clutter still remain and affect detection accuracy. In [31], stationary and non-stationary clutter was removed by employing the singular value decomposition (SVD) algorithm when the signal-to-noise ratio (SNR) is low.

Considering the computational time cost and complexity, we decided to send the raw data in parallel to both a Fast Fourier transform (FFT) filter and an SVD filter. The FFT filter removes the direct-current (DC) component and part of the low-frequency content of the signal to obtain the FFT image features (i.e., the frequency-domain features); the SVD filter removes high-frequency clutter and low-frequency background noise to generate the SVD image features (i.e., the time-domain features).

4.1.1 FFT filter and frequency-domain feature image extraction

The FFT decomposes a function of time (a signal) into its constituent frequencies. As we can observe in Fig. 3, the raw data from the radar sensor is a discrete signal. The FFT is therefore used to preprocess the signal (see Formula 6) in order to filter out the DC component and part of the low-frequency noise.

$$\begin{aligned} {X(k)}=\displaystyle {\sum _{n=0}^{N-1}{} \textit{x}(n)e^{-j\frac{2\pi kn}{N}}}\qquad (k=0,1,2,\dots ,N-1) \end{aligned}$$
(6)

After filtering out the noise, the activity performed by the subject can be observed in the blue dashed boxes (see Fig. 3). The frequency-domain feature image is the center-shifted FFT image.
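This filtering step can be sketched as follows in NumPy; the toy data, the 10 Hz slow-time rate, and the choice of which low-frequency bins to zero are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy radar matrix (slow time x fast time) with a strong static background
R = 5.0 + rng.normal(scale=0.1, size=(100, 512))
# Simulated 2 Hz motion in fast-time channel 200 (10 Hz slow-time sampling)
R[:, 200] += np.sin(2 * np.pi * 2.0 * np.arange(100) / 10.0)

X = np.fft.fft(R, axis=0)          # Formula 6, applied along slow time
X[0] = 0                           # remove the DC component
X[1] = X[-1] = 0                   # and part of the low-frequency noise
# Centre-shifted magnitude: the frequency-domain feature image
feature = np.abs(np.fft.fftshift(X, axes=0))
```

After filtering, the channel containing the motion dominates the image, while the static background is suppressed.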

Fig. 3

Raw data processed with FFT filter

4.1.2 SVD filter and time-domain feature image extraction

The SVD algorithm is widely used for dimensionality reduction and noise filtering in signal processing. In our method, the image can be seen as an \(m\times n\) matrix \(\mathbf{A}\), all elements of which belong to the field K, i.e., the real or the complex numbers. According to the SVD, \(\mathbf{A}\) can be decomposed as in Formula 7, where U is a unitary matrix of order \(m\times m\); \(\varSigma\) is a positive semi-definite \(m\times n\) diagonal matrix; and \(V^{T}\), the conjugate transpose of V, is a unitary matrix of order \(n\times n\). In the real case, U and V are orthogonal matrices, such that \(UU^{T}=I\) and \(VV^{T}=I\).

$$\begin{aligned} \textit{A}_{m\times n}=U_{m\times m}\varSigma _{m\times n}V^{T}_{n\times n} \end{aligned}$$
(7)

The left singular vectors are the eigenvectors of \(AA^{T}\), and the right singular vectors are the eigenvectors of \(A^{T}A\), as shown in Formulas 8 and 9, where \(\lambda _i\) and \(\zeta _i\) are the eigenvalues corresponding to the eigenvectors \(u_i\) and \(v_i\), respectively. Obviously, we can obtain U and V from the \(u_i\) and \(v_i\).

$$\begin{aligned}&(A\textit{A}^{T})u_i=\lambda _i u_i \end{aligned}$$
(8)
$$\begin{aligned}&(\textit{A}^{T}A)v_i=\zeta _i v_i \end{aligned}$$
(9)

Finally, \(\varSigma\) can be calculated by Formula 10, where \(\sigma _i\) is the i-th singular value composing \(\varSigma\).

$$\begin{aligned} \begin{aligned} \textit{A}_{m\times n}=U_{m\times m}\varSigma _{m\times n}V^{T}_{n\times n} \\ \Rightarrow AV=U\varSigma V^TV \\ \Rightarrow AV=U\varSigma \\ \Rightarrow Av_i=\sigma _i u_i\\ \Rightarrow \sigma _i= u_i^T Av_i \end{aligned} \end{aligned}$$
(10)

After the above calculation, we obtain \({A} = \sum _{i=1}^{r} \sigma _i u_i v_i^T\). Generally, the larger \(\sigma _i\) is, the more significant its contribution to the matrix A. Therefore, we filter according to this principle, removing the components with smaller \(\sigma _i\). After SVD filtering and reconstruction, the time-domain feature image is obtained as shown in Fig. 4, where the activity performed by the subject is also clearly observed in the blue dashed boxes.
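The SVD filtering and reconstruction step can be sketched as follows; the toy image and the number of retained singular values are illustrative choices, not the values used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "time-domain image": a low-rank activity pattern plus full-rank noise
activity = np.outer(np.hanning(100), np.hanning(512))
A = activity + 0.05 * rng.normal(size=(100, 512))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Formula 10 in code: each singular value satisfies sigma_i = u_i^T A v_i
assert np.allclose(s, [U[:, i] @ A @ Vt[i] for i in range(len(s))])

r = 3                                   # keep only the r largest singular values
A_filtered = (U[:, :r] * s[:r]) @ Vt[:r]
```

Dropping the small singular values removes most of the noise while preserving the dominant activity pattern.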

Fig. 4

Raw data processed with SVD filter

4.2 Adaptive channels selection algorithm

After filtering, we can observe that the activity occurs in specific channels (see Fig. 5), whose energy is higher than that of the channels related to the background; the channel energy is calculated by Formula 11.

$$\begin{aligned} \textit{E(k)}=\left| X(k)\right| ^2 \end{aligned}$$
(11)

Therefore, in this work, we propose an adaptive channel selection Algorithm 1 to distinguish the background from the activity performed by the subject.

Fig. 5

Energy spectrum with energy threshold selection


In the algorithm, channels are automatically selected by an energy threshold \(E_{th}\). As shown in Table 2, we have listed the candidate parameters; their performance evaluation is discussed in Sect. 5.1.
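A compact sketch of the selection idea follows (per-channel energy as in Formula 11, thresholded at the mean channel energy, which Sect. 5.1 identifies as an effective default); the toy data is an assumption for illustration:

```python
import numpy as np

def select_channels(F, threshold=None):
    """Keep fast-time channels whose energy exceeds a threshold.

    F is a filtered image (slow-time bins x fast-time channels);
    per-channel energy follows Formula 11: E = |X|^2 summed over bins.
    """
    energy = np.sum(np.abs(F) ** 2, axis=0)
    if threshold is None:              # adaptive default: mean channel energy
        threshold = energy.mean()
    return np.flatnonzero(energy > threshold)

# Toy example: activity concentrated in channels 120-139
rng = np.random.default_rng(0)
F = rng.normal(scale=0.1, size=(100, 512))
F[:, 120:140] += 1.0
selected = select_channels(F)
```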

Table 2 Parameters for channel selection

4.3 Data normalization

The fall-activity data collected by the radar are preprocessed to generate a standardized data set, which is then divided into a training set and a test set.
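The paper does not detail the standardization scheme, so the sketch below assumes a common choice: min-max scaling of each feature image to [0, 1] before splitting into training and test sets:

```python
import numpy as np

def normalize_image(img):
    """Min-max scale one feature image to [0, 1] (assumed scheme)."""
    lo, hi = img.min(), img.max()
    if hi == lo:                       # constant image: return zeros
        return np.zeros_like(img)
    return (img - lo) / (hi - lo)

rng = np.random.default_rng(0)
sample = rng.normal(size=(100, 512))   # one preprocessed feature image
normalized = normalize_image(sample)
```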

4.4 Deep convolutional neural network

In recent years, convolutional neural networks (CNNs) have excelled in image processing and image recognition thanks to their outstanding performance. The convolution operations inside a CNN automatically extract features from the data, so that feature selection no longer requires substantial time and effort, and recognition accuracy is dramatically improved.

The main proposal of this work is to fuse the frequency- and time-domain images as inputs for fall detection and recognition from the radar signal. Table 3 depicts the network architecture setup.

Table 3 Parameter setup of our deep CNN

Since our sample size is small, we design a relatively shallow network and reduce the number of parameters in each layer. To prevent over-fitting, our proposed method introduces two measures:

  • Adding an L2 regularization term—In deep learning, small samples easily cause deep networks to over-fit; adding regularization to the network is one way to counter this. Therefore, we add an L2 regularization term to the network to prevent over-fitting of the model.

  • Introducing a dropout layer—The more complex the model, the more parameters it must learn. Therefore, a dropout layer is introduced to randomly drop 20% of the units, discarding less important information. This allows the model to obtain good results on the training set while generalizing more easily and improving robustness.

During deep neural network training, each mini-batch fed to the network may follow a different distribution, and the data distribution also shifts as training progresses, which makes learning harder for subsequent layers. Batch normalization therefore forces the data back to a normal distribution with a mean of 0 and a variance of 1, keeping the data distribution consistent and avoiding vanishing gradients.
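The three countermeasures above can be illustrated in NumPy (the 20% dropout rate follows the text; the tensor shapes and the L2 coefficient are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))     # weights of one layer
x = rng.normal(size=(16, 64))     # one mini-batch of activations

# L2 regularization: a penalty added to the loss that shrinks large weights
l2_coeff = 1e-3                   # illustrative coefficient
l2_penalty = l2_coeff * np.sum(W ** 2)

# Dropout (training time): zero 20% of activations at random, rescale the rest
keep = rng.random(x.shape) >= 0.2
x_dropped = np.where(keep, x / 0.8, 0.0)

# Batch normalization: force the batch back to mean 0 and variance 1
x_bn = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + 1e-5)
```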

5 Experiment result and discussion

In our process, the proposed model automatically learns a large number of parameters by extracting the characteristics of the fall signal. These parameters are then evaluated on the test set to verify the learning effect of the model.

To ensure the independence of the data distributions, the test set and the training set are divided randomly at a ratio of 1:1, and the two parts are independent of each other. Five experiments are repeated, and each experiment re-divides the training and test sets.
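The splitting procedure can be sketched as follows (400 samples, as collected in Sect. 3.3; the seeding scheme is an illustrative assumption):

```python
import numpy as np

def split_half(n_samples, seed):
    """Randomly divide sample indices 1:1 into training and test sets."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    half = n_samples // 2
    return idx[:half], idx[half:]

# Five repeated experiments, re-dividing the sets each time
splits = [split_half(400, seed) for seed in range(5)]
```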

5.1 Threshold parameter selection of the adaptive algorithm

The adaptive algorithm requires a threshold to divide the background from the activity. As shown in Table 2, we considered different parameters and evaluated their performance in recognizing the various fall activities. For one of the input images, the channel energies fall into the range 0.0648 to 1.827.

We selected the mean as a measure of central tendency and evaluated thresholds of \(\frac{k}{4}\ mean,\ k \in \{1,2,\dots ,7\}\). Figure 6 shows that while the threshold is below the mean, the recognition accuracy, recall, and precision increase slowly; when the threshold reaches the mean value, they peak. Above the mean value, the recognition accuracy decreases quickly; therefore, we avoid selecting parameter values greater than 0.411. On the left of Fig. 6, the values are very close, i.e., the accuracies for \(\frac{1}{2}\ mean\), \(\frac{3}{4}\ mean\), Std, and mean are 94.7, 95.2, 94.3, and 94.92, respectively. From Table 4, it can be observed that choosing \(\frac{3}{4}\ mean\) as the channel-filtering threshold produces the best effect; however, the difference between \(\frac{3}{4}\ mean\) and the adjacent parameters is very small. As the threshold grows further, the classification performance drops sharply; in other words, using the mean value as the threshold for screening channels is already effective enough to eliminate interference and filter out worthless channels. To facilitate calculations, we therefore selected the mean as the threshold.

Fig. 6

Performance with different thresholds

Table 4 Parameter selection with different values

5.2 Performance evaluation of the training and loss

The loss function reflects the degree of convergence of the model in a deep learning network. When the loss function converges to a small value and no longer changes, the model has converged. In this paper, cross-entropy is used as the loss function. The model accuracy reaches \(100\%\), and the loss drops to about 0.1513 on the training set. Since the loss function fluctuates only slightly around a small value, the model has converged. The training process and loss function changes are shown in Fig. 7. We therefore use the behaviour of the loss function to choose the number of training epochs, which we set to 10.

Fig. 7

Model training process and loss function changes

5.3 Performance evaluation of fusion features

Tables 5 and 6 show the recognition performance for the fall activities using the frequency-domain and the time-domain features, respectively. As we can observe, the maximum accuracy obtained with the single feature of FFT images over five test runs is \(91.3\%\), and the average accuracy is \(90.64\%\). The case of the single feature of SVD images is very similar: the maximum accuracy is \(91.4\%\), with an average accuracy of \(90.46\%\). The FFT and SVD image features thus provide almost the same performance for classifying the fall activities in our experiments.

When we adopted the FFT and SVD images as a fused feature, as shown in Table 7, the maximum accuracy rose to \(95.7\%\), with an average accuracy of \(94.92\%\). As a consequence, we can conclude that each feature may contain information that the other does not possess; therefore, combining the two features is a better choice than using a single feature.

Table 5 Performance of the proposed algorithm using FFT image feature
Table 6 Performance of the proposed algorithm using SVD image feature
Table 7 Performance of the proposed algorithm using fused features

5.4 Comparison with other algorithms

As can be seen from Table 8, the performance of the other algorithms on the same inputs is not as good as that of the proposed method. With traditional machine learning techniques (i.e., kNN, SVM, Naive Bayes, AdaBoost, and Random Forest), the maximum accuracy we could achieve was \(92.6\%\), obtained with the SVM classifier.

Table 8 Comparison with other machine learning algorithms

The above results show that the proposed adaptive channel selection algorithm with the deep neural network is more suitable for fall detection using the UWB radar sensor.

5.5 Discussion

The purpose of our study is to detect and identify fall events so that medical services can be provided as soon as possible. However, falling is dangerous for people of any age; therefore, the data used in this research phase all come from falls simulated by young people in the laboratory.

The experiment took place in a large office with a relatively complex environment, unlike the usual laboratory with absorbing walls, because the actual application of this monitoring method will be in environments containing objects such as furniture and plants. To simplify the experiment, the radar was simply set on a desk at a distance of 1.5 m from the subject.

During the experiment, we found that the characteristics of the obtained radar feedback images differ considerably across action types. Therefore, we only detect and recognize three common falls in this article. The three falling behaviors correspond to three situations, namely:

  1. tripping over obstacles;

  2. bending over to pick up things;

  3. squatting down to lace up a shoe.

As we can observe from Fig. 3, although the radar feedback charts of these actions look different, they are hard to distinguish with the naked eye. At the same time, we also found that the power changes in the channels are closely related to human activities. Therefore, we used an adaptive channel selection method to obtain the useful channels. Experiments showed that the recognition accuracy obtained by inputting the complete set of channels related to the activity is higher than that obtained from a single channel. Since the human body moves as a whole, the change in distance between each part of the body and the radar is related to the action being performed; at the same time, channels correspond to distances from the radar to the subject. Therefore, the relationships between the channels covering the subject’s area reflect the relationships between the motion of different parts of the human body. This is why we use an adaptive selector to find the edge channels of the activity.

6 Conclusion

This paper proposed an adaptive channel selection algorithm to reduce the data dimensions and used fused FFT and SVD feature images with a deep neural network. By calculating the energy change of the radar signals, a threshold is used to adaptively select the area of the signal that is most likely to contain the fall activity. Through the miniaturization of the network and the optimal configuration of parameters, the network can be adapted to small-sample data to detect and identify three types of falls, i.e., stand to fall, bow to fall, and squat to fall. Results showed that using the selected channels with fused features significantly increases the average recognition accuracy, from 90.64% (frequency-domain only) and 90.46% (time-domain only) to 94.92%.

In future work, we plan to conduct more experiments with an expanded sample size; we will also define an algorithm that can detect and recognize activities in multi-resident environments.