1 Introduction

Fall Detection (FD) is a very active research area, with many applications in healthcare, work safety, etc. Even though there are plenty of commercial products, the best-rated products only reach an 80% success rate [20]. There are basically two types of FD systems: context-aware systems and wearable devices [14]. FD has been widely studied using context-aware systems, e.g., video systems [28]; nevertheless, wearable devices are crucial because of the high percentage of elderly people and their desire to live autonomously in their own homes [19].

Wearable-based solutions may combine different sensors, such as a barometer and inertial sensors [22], a triaxial accelerometer (3DACC) and a gyroscope [23], or a 3DACC and intelligent tiles [7]; a 3DACC combined with a barometer in a necklace was also reported in [3]. However, the 3DACC is by far the most common choice [4, 12, 13, 27, 29], with a variable number of sensors and locations; some studies even proposed using the smartphone's built-in sensors. Different approaches have been proposed to detect the fall event: for instance, a feature extraction stage followed by Support Vector Machines (SVM) was applied in [27, 29], while [4, 13, 16] used some transformations and thresholds with very simple rules to classify an event as a fall. A comparison of classifiers (Decision Tree, SVM, Nearest Neighbor and Discriminant Analysis) was presented in [12]. Several threshold-based fall detection algorithms were presented in [4, 9, 10]; the latter two employed three threshold-based algorithms applied to the acceleration magnitude. Igual et al. compared several public datasets for fall detection using SVM and nearest neighbor classifiers and analyzed the results [15]. The common characteristic of all these solutions is that the wearable devices are placed on the waist or on the chest. This research restricts itself to a single sensor (a commercial smartwatch) placed on the wrist in order to promote usability.

Interestingly, the previous studies do not focus on the specific dynamics of a fall event: although some of the proposals report good performance, they simply apply machine learning to the problem at hand. Other studies are concerned with the dynamics of a fall event [2, 8], establishing a taxonomy and the time periods of each phase. Additionally, Abbate et al. proposed using these dynamics as the basis of an FD algorithm [1]. A very interesting point of this approach is that its computational requirements are moderate, although it includes a large number of thresholds to tune. In [17], this solution was analyzed with data gathered from sensors placed on the wrist, using the Abbate et al. solution plus a SMOTE balancing stage and a feed-forward Neural Network. In this research, an alternative based on C5.0 rule-based systems is proposed.

2 Adapting Fall Detection to a Wrist-Based Solution

Abbate et al. [1] proposed the following scheme to decide whether a candidate event is a fall event (refer to Fig. 1). A time t corresponds to a peak time (point 1) if the magnitude of the acceleration a is higher than \(th_1=3\times g\), with \(g=9.8\ \text{m/s}^2\). After a peak time there must be a relatively calm period of 2500 ms (no other a value higher than \(th_1\)). The impact end (point 2) denotes the end of the fall event; it is the last time at which the a value is higher than \(th_2=1.5\times g\). Finally, the impact start (point 3) denotes the starting time of the fall event, computed as the time of the first occurrence of a value \(a \le th_3\) (\(th_3 = 0.8\times g\)) followed by a value \(a \ge th_2\). The impact start must belong to the interval \([impact\ end - 1200\ \text{ms},\ peak\ time]\). If no impact end is found, it is set to peak time plus 1000 ms. If no impact start is found, it is set to peak time.
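The candidate detection step above can be sketched in Python as follows. This is a minimal illustration, not the code used in the study (which was written in R). The names `find_candidates`, `Candidate`, `ms_to_samples` and the sampling rate `fs` are hypothetical, and several details the text leaves open are assumptions: the impact end is searched only within the 1000 ms following the peak, the impact start is read as a sample \(a \le th_3\) immediately followed by a sample \(a \ge th_2\), and peaks whose 2500 ms calm period extends past the end of the series are skipped.

```python
from dataclasses import dataclass
from typing import List, Sequence

G = 9.8  # gravity in m/s^2, as in the text


@dataclass
class Candidate:
    peak: int          # sample index of the peak time (point 1)
    impact_start: int  # sample index of the impact start (point 3)
    impact_end: int    # sample index of the impact end (point 2)


def ms_to_samples(ms: float, fs: float) -> int:
    """Convert a duration in milliseconds to a number of samples at fs Hz."""
    return int(round(ms * fs / 1000.0))


def find_candidates(a: Sequence[float], fs: float,
                    th1: float = 3 * G, th2: float = 1.5 * G,
                    th3: float = 0.8 * G) -> List[Candidate]:
    """Scan an acceleration-magnitude series for fall candidates."""
    n = len(a)
    calm = ms_to_samples(2500, fs)
    end_win = ms_to_samples(1000, fs)
    start_win = ms_to_samples(1200, fs)
    found: List[Candidate] = []
    for t in range(n):
        # Peak time: a > th1 and a full 2500 ms calm period follows it
        # (assumption: peaks whose calm period runs past the end are skipped).
        if a[t] <= th1 or t + calm >= n:
            continue
        if any(x > th1 for x in a[t + 1:t + 1 + calm]):
            continue
        # Impact end: last sample above th2 within 1000 ms after the peak
        # (assumed search window); fallback is peak time + 1000 ms.
        after = [i for i in range(t + 1, t + 1 + end_win) if a[i] > th2]
        ie = after[-1] if after else t + end_win
        # Impact start: first sample <= th3 immediately followed by a sample
        # >= th2, searched in [ie - 1200 ms, peak]; fallback is the peak time.
        i_s = t
        for s in range(max(0, ie - start_win), t):
            if a[s] <= th3 and a[s + 1] >= th2:
                i_s = s
                break
        found.append(Candidate(peak=t, impact_start=i_s, impact_end=ie))
    return found
```

For example, at a hypothetical 10 Hz rate, a series at 1 g with a dip to 0.5 g followed by a spike to 4 g yields a single candidate whose impact start is the dip sample.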

Whenever a peak time is found, the following transformations should be computed:

  • Average Absolute Acceleration Magnitude Variation, \(AAMV= \sum _{t=is}^{ie}\frac{|a_{t+1}-a_t|}{N}\), where \(is\) denotes the impact start, \(ie\) the impact end, and \(N\) the number of samples in the interval.

  • Impact Duration Index, \(IDI = impact\ end - impact\ start\). Alternatively, it could be computed as the number of samples.

  • Maximum Peak Index, \(MPI=max_{t\in [is, ie]}(a_t)\).

  • Minimum Valley Index, \(MVI=\min_{t\in [is-500\,\text{ms},\ ie]}(a_t)\).

  • Peak Duration Index, \(PDI = peak\ end - peak\ start\), where peak start is the time of the last magnitude sample below \(th_{PDI}=1.8\times g\) occurring before peak time, and peak end is the time of the first magnitude sample below \(th_{PDI}\) occurring after peak time.

  • Activity Ratio Index, ARI, measuring the activity level in an interval of 700 ms centered at the midpoint between impact start and impact end. The activity level is calculated as the ratio between the number of samples outside \([th_{ARIlow}=0.85\times g,\ th_{ARIhigh}=1.3\times g]\) and the total number of samples in the 700 ms interval.

  • Free Fall Index, FFI, computed as follows. Firstly, search for an acceleration sample below \(th_{FFI}=0.8\times g\) occurring up to 200 ms before peak time; if found, the sample time represents the end of the interval, otherwise the end of the interval is set 200 ms before peak time. Secondly, the start of the interval is simply set to 200 ms before its end. FFI is defined as the average acceleration magnitude evaluated within the interval.

  • Step Count Index, SCI, the number of steps in the interval \([peak\ time - 2200\ \text{ms},\ peak\ time]\), i.e., the step count evaluated in the 2200 ms before peak time. Steps are detected by counting valleys, a valley being a region with acceleration magnitude below \(th_{SCIlow}=1\times g\) for at least 80 ms, followed by a magnitude higher than \(th_{SCIhigh}=1.6\times g\) within the next 200 ms. Some ideas on computing the time between peaks [26] were used when implementing this feature.
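Most of the transformations listed above can be computed from the peak, impact start and impact end indices, as in the following Python sketch. It is an illustration only: `extract_features`, `ms_to_samples` and the sampling rate `fs` are hypothetical names, SCI is omitted for brevity, and some choices are assumptions rather than details from the text: AAMV follows the written formula literally (with \(t\) clipped so that \(a_{t+1}\) exists), the ARI window is approximated by the samples within 350 ms on each side of the midpoint, and for FFI the last sample below \(0.8\times g\) in the 200 ms before the peak is used as the interval end.

```python
from statistics import mean
from typing import Dict, Sequence

G = 9.8  # gravity in m/s^2, as in the text


def ms_to_samples(ms: float, fs: float) -> int:
    """Convert a duration in milliseconds to a number of samples at fs Hz."""
    return int(round(ms * fs / 1000.0))


def extract_features(a: Sequence[float], fs: float, peak: int,
                     i_s: int, i_e: int) -> Dict[str, float]:
    """Compute AAMV, IDI, MPI, MVI, PDI, ARI and FFI for one candidate.

    Indices are sample positions; IDI and PDI are returned in ms.
    """
    n = len(a)
    dt = 1000.0 / fs  # ms per sample
    # AAMV: sum |a[t+1] - a[t]| for t = is..ie, divided by N = ie - is + 1;
    # t is clipped so that a[t + 1] stays inside the series.
    last = min(i_e, n - 2)
    aamv = sum(abs(a[t + 1] - a[t]) for t in range(i_s, last + 1)) / (i_e - i_s + 1)
    idi = (i_e - i_s) * dt
    mpi = max(a[i_s:i_e + 1])
    mvi = min(a[max(0, i_s - ms_to_samples(500, fs)):i_e + 1])
    # PDI: last sample below 1.8 g before the peak, first one after it.
    th_pdi = 1.8 * G
    p_start = next((t for t in range(peak - 1, -1, -1) if a[t] < th_pdi), 0)
    p_end = next((t for t in range(peak + 1, n) if a[t] < th_pdi), n - 1)
    pdi = (p_end - p_start) * dt
    # ARI: share of samples outside [0.85 g, 1.3 g] around the impact midpoint.
    mid, half = (i_s + i_e) // 2, ms_to_samples(350, fs)
    win = a[max(0, mid - half):min(n, mid + half + 1)]
    ari = sum(1 for x in win if not 0.85 * G <= x <= 1.3 * G) / len(win)
    # FFI: mean magnitude over the 200 ms ending at the last sample below 0.8 g
    # found in the 200 ms before the peak (otherwise ending at peak - 200 ms).
    w = ms_to_samples(200, fs)
    below = [t for t in range(max(0, peak - w), peak) if a[t] < 0.8 * G]
    f_end = below[-1] if below else max(0, peak - w)
    ffi = mean(a[max(0, f_end - w):f_end + 1])
    return {"AAMV": aamv, "IDI": idi, "MPI": mpi, "MVI": mvi,
            "PDI": pdi, "ARI": ari, "FFI": ffi}
```

Returning a dictionary keeps one named value per transformation, which maps directly onto one row of the training table.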

Fig. 1. Evolution of the magnitude of the acceleration (y-axis), extracted from [1].

This approach is evaluated as follows. The time series of acceleration magnitude values is scanned for peaks, which mark where a fall event candidate appears. Whenever a peak is found, the impact end and the impact start are determined, and then the remaining features are computed. Because such candidate events are also detected during activities such as walking or running, a Neural Network (NN) model is trained to classify the set of extracted features.

To train the NN, the authors made use of an Activities of Daily Living (ADL) and FD dataset, where each file contains a time series of 3DACC values corresponding to an activity or to a fall event. Therefore, each file including a fall event or a similar activity (for instance, running can produce a pattern similar to falling) generates a set of transformation values: whenever something similar to a fall is detected within a file, a row with the transformations computed for that event is produced, so a single file can yield several rows. If nothing is detected within the file, no row is produced. With this strategy, Abbate et al. obtained the training and testing datasets for the NN.
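The row-generation strategy described above can be sketched as a small Python function. The names `build_rows`, `detect` and `featurize` are hypothetical; the detector and the feature extractor are passed in as functions so that the sketch does not depend on any particular implementation of either step.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, object]


def build_rows(files: Iterable[Tuple[str, str, Sequence[float]]],
               detect: Callable[[Sequence[float]], List[int]],
               featurize: Callable[[Sequence[float], int], Dict[str, float]]
               ) -> List[Row]:
    """Turn (file name, label, magnitude series) triples into feature rows.

    Every detected candidate in a file yields one row labelled with the
    file's class; files without candidates contribute no rows at all.
    """
    rows: List[Row] = []
    for name, label, series in files:
        for peak in detect(series):
            row: Row = {"file": name, "label": label}
            row.update(featurize(series, peak))
            rows.append(row)
    return rows
```

For instance, with a toy detector that flags every sample above \(3\times g\), a walking file without peaks produces nothing, while a running file with two large peaks produces two ADL rows. This is how ADL files that resemble falls end up in the training table.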

2.1 The Modifications on the Algorithm

As stated in [11, 25], solutions to this type of problem must be ergonomic: users must feel comfortable using them. We consider that a device placed on the waist is not comfortable; for instance, it is impractical for women wearing dresses. When working with elderly people, this issue is of major relevance. Therefore, in this study, we placed the wearable device on the wrist. This is not a simple change: the vast majority of the literature reports FD solutions based on waist-mounted devices. Moreover, according to [24], the calculations should be performed on the smartwatch itself to extend battery life by reducing communications. Therefore, these calculations should be kept as simple as possible.

A second modification concerns the training of the NN. The original strategy for generating the training and testing dataset produced a highly imbalanced dataset: up to 81% of the obtained samples belong to the FALL class, while the rest correspond to the different ADLs that resemble a fall event.

To solve this problem, a normalization stage is applied to the generated imbalanced dataset, followed by a SMOTE balancing stage [6]. This balancing stage produces a 60% (FALL) / 40% (no FALL) dataset, which helps to avoid over-fitting of the NN models. As usual, there is a trade-off between the balance of the dataset and the number of synthetic samples introduced into it.
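The normalization and balancing stages can be illustrated with a minimal pure-Python sketch of the SMOTE interpolation idea [6]. It is not the R code used in the study: `min_max_normalize` and `smote` are hypothetical names, min-max scaling is an assumed choice of normalization (the text does not state which one was used), and neighbours are found with squared Euclidean distance by brute force.

```python
import random
from typing import List, Sequence

Vector = List[float]


def min_max_normalize(rows: Sequence[Sequence[float]]) -> List[Vector]:
    """Scale every feature (column) to [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(x - l) / (h - l) if h > l else 0.0
             for x, l, h in zip(row, lo, hi)] for row in rows]


def smote(minority: Sequence[Sequence[float]], per_sample: int,
          k: int = 5, seed: int = 0) -> List[Vector]:
    """Create per_sample synthetic points for each minority sample.

    Each synthetic point lies on the segment between the sample and one of
    its k nearest minority-class neighbours (squared Euclidean distance).
    """
    rng = random.Random(seed)
    synthetic: List[Vector] = []
    for i, x in enumerate(minority):
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(
            others, key=lambda m: sum((p - q) ** 2 for p, q in zip(x, m)))[:k]
        for _ in range(per_sample):
            nb = rng.choice(neighbours)
            gap = rng.random()  # uniform in [0, 1)
            synthetic.append([p + gap * (q - p) for p, q in zip(x, nb)])
    return synthetic
```

Normalizing before SMOTE matters because the neighbour search relies on distances: without it, features with large ranges (such as durations in ms) would dominate the choice of neighbours.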

The above-mentioned changes have already been studied in [17]. In this research, we propose to analyze the performance of rule-based systems in this context, as they are simpler models that can be easily deployed in wearable devices with a very low computational complexity. Therefore, they could represent a very interesting improvement, whether they outperform the NN or merely perform similarly to it.

3 Experiments and Results

An ADL and FD dataset, containing time series samples from both ADLs and falls, is needed to evaluate the adaptation. Among the publicly available datasets, this research made use of the UMA-FALL dataset [5]. This dataset includes data from several participants carrying out different activities and performing forward, backward and lateral falls. These falls are not real falls (demonstration videos have also been published), but they can represent an initial step for evaluating the adapted solution. Interestingly, this dataset includes multiple sensors; therefore, researchers can evaluate the approach using sensors placed on different parts of the body.

The thresholds used in this study are exactly the same as those in the original paper [1]. All the code was implemented in R [21] using the caret package [18]. The SMOTE parameters were perc.over set to 300 and perc.under set to 200; that is, 3 synthetic minority class samples are generated per original minority sample, and 2 majority class samples are kept per synthetic sample. These parameters move the dataset from a distribution of 47 minority class samples and 200 majority class samples to 188 minority class samples versus 282 majority class samples (a 40%/60% balance).
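The class counts quoted above can be checked with a few lines of arithmetic. The sketch assumes the semantics of the R DMwR `SMOTE` function, whose argument names (perc.over, perc.under) match those in the text, and only covers perc.over values of at least 100; `smote_counts` is a hypothetical helper.

```python
from typing import Tuple


def smote_counts(n_minority: int, perc_over: int,
                 perc_under: int) -> Tuple[int, int]:
    """Class sizes after SMOTE, assuming DMwR-style perc.over / perc.under.

    perc_over // 100 synthetic samples are generated per minority sample
    (perc_over >= 100 assumed), and perc_under / 100 majority samples are
    kept per synthetic sample.
    """
    synthetic = (perc_over // 100) * n_minority
    minority_after = n_minority + synthetic
    majority_after = synthetic * perc_under // 100
    return minority_after, majority_after


minority, majority = smote_counts(47, 300, 200)
print(minority, majority, minority / (minority + majority))  # prints: 188 282 0.4
```

With 47 minority samples, 141 synthetic samples are created (188 in total) and 282 majority samples are retained, which gives the 40%/60% split.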

Table 1. 10-fold cv results obtained for the NN (top) and the C5.0 rule-based system (bottom). From left to right, the main statistical measurements are shown: accuracy (Acc), Kappa factor (Kp), sensitivity (Se), specificity (Sp), precision (Pr) and the geometric mean of Acc and Pr, \(G=\sqrt{Pr\times Acc}\).
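The measurements reported in the tables can be derived from a binary confusion matrix, as in the following Python sketch. It assumes FALL is the positive class and that Kappa denotes Cohen's kappa; G follows the definition given in the table captions. `binary_metrics` is a hypothetical helper and does not guard against empty classes.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Acc, Kp, Se, Sp, Pr and G from a confusion matrix (positive = FALL)."""
    n = tp + fn + tn + fp
    acc = (tp + tn) / n
    # Agreement expected by chance, used by Cohen's kappa.
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (acc - pe) / (1 - pe)
    se = tp / (tp + fn)       # sensitivity: detected falls / all falls
    sp = tn / (tn + fp)       # specificity: correct non-falls / all non-falls
    pr = tp / (tp + fp)       # precision: true falls / all fall alarms
    g = math.sqrt(pr * acc)   # G as defined in the table captions
    return {"Acc": acc, "Kp": kappa, "Se": se, "Sp": sp, "Pr": pr, "G": g}
```

For example, 8 detected falls, 2 missed falls, 9 correct rejections and 1 false alarm give Acc = 0.85, Kp = 0.7, Se = 0.8 and Sp = 0.9. False alarms lower Sp and Pr, which are the measures most affected in the 5x2 cv experiments.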

To obtain the parameters of the NN, a grid search was performed; the final values were size set to 20, decay set to \(10^{-3}\), a maximum of 500 iterations, and absolute and relative tolerances set to \(4\times 10^{-6}\) and \(10^{-10}\), respectively. In this research, we use the C5.0 implementation available in R (C5.0 being the successor of C4.5) to obtain the rule-based systems. The parameters found to be optimal for this classification problem are CF set to 0.25, bands set to 2, fuzzyThreshold set to TRUE, the number of trials set to 15, and winnow set to FALSE.

Both 5x2 cross-validation (cv) and 10-fold cv were performed to analyze the robustness of the solution. The latter allows comparison with existing solutions, while the former shows the performance of the system when the number of unseen test samples increases. The results are shown in Tables 1 and 2 for 10-fold cv and 5x2 cv, respectively.
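The difference between the two validation schemes lies in how much data each test fold holds. The Python sketch below generates plain (non-stratified) index splits for brevity; the study itself used R and caret, and `k_fold` and `five_by_two` are hypothetical helpers. With 10-fold cv each model is tested on 10% of the samples, whereas 5x2 cv repeats a 50/50 split five times, so every model is trained on half of the data and tested on the other half.

```python
import random
from typing import List, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def k_fold(n: int, k: int, rng: random.Random) -> List[Split]:
    """Shuffle 0..n-1 and partition it into k folds; each fold is a test set once."""
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i])
            for i in range(k)]


def five_by_two(n: int, seed: int = 0) -> List[Split]:
    """5x2 cv: five independent repetitions of a shuffled 2-fold split."""
    rng = random.Random(seed)
    splits: List[Split] = []
    for _ in range(5):
        splits.extend(k_fold(n, 2, rng))
    return splits


ten = k_fold(100, 10, random.Random(0))
print([len(test) for _, test in ten])                # ten test folds of 10 samples
print([len(test) for _, test in five_by_two(100)])  # ten test folds of 50 samples
```

Both schemes produce ten train/test pairs, but the 5x2 test sets are five times larger, which exposes the models to more unseen variability.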

Table 2. 5x2 cv results obtained for the NN (top) and the C5.0 rule-based system (bottom). From left to right, the main statistical measurements are shown: accuracy (Acc), Kappa factor (Kp), sensitivity (Se), specificity (Sp), precision (Pr) and the geometric mean of Acc and Pr, \(G=\sqrt{Pr\times Acc}\).

Fig. 2. 5x2 cv boxplots of the different measurements, namely accuracy (Acc), Kappa (Kp), sensitivity (Se), specificity (Sp), precision (Pr) and the geometric mean of Acc and Pr, \(G=\sqrt{Pr\times Acc}\), for both the feed-forward NN (six boxplots on the left, with the N_ prefix) and C5.0 (six boxplots on the right, with the C_ prefix).

3.1 Discussion on the Results

From the tables, it can be seen that both modelling techniques perform exceptionally well once SMOTE is applied and the test folds of 10-fold cv are used: the models even perform perfectly on several folds. More importantly, the two models are interchangeable with no apparent loss in performance. Actually, these results are rather similar to those published in the original work [1]. However, when using 5x2 cv the results diverge from those mentioned above.

With 5x2 cv, the training and test datasets have a similar number of samples, which leaves less data for training and, more interestingly, introduces more variability in the test dataset. Therefore, the results are worse. The key point is that these results suggest the task is not solved yet, as the number of false alarms increased unexpectedly (Fig. 2).

This issue is important because the experiments relied on the UMA-Fall dataset [5], which was generated with young participants following a very deterministic protocol of activities. The falls were performed with the participants standing still and letting themselves fall in the forward, backward or lateral direction. Therefore, the differences with real falls might be relevant; even if they are not so different, the additional variability of real falls might severely degrade the performance of the obtained models.

4 Conclusions

This study compares the performance of two classification techniques when tackling the problem of fall detection with data gathered from accelerometers located on one wrist. The original proposal detected fall events and performed a feature extraction whose output was classified with a feed-forward NN. A SMOTE stage is included to balance the transformed dataset prior to modelling. Two different techniques are compared: the feed-forward NN and C5.0 rule-based systems. A publicly available dataset with falls has been used to evaluate the proposal. Interestingly, the two modelling techniques performed similarly, which suggests that in real-world applications with the solution embedded in smartwatches, rule-based systems are perhaps more likely to be used.

Although exceptional results have been found using 10-fold cv, the 5x2 cv results suggest that a high number of false alarms is still obtained. Although the percentages are better than those reported for commercial devices, some design aspects must be analyzed in depth: the robustness to the variability in the behaviour of the user, or the tuning of the thresholds to fit specific populations such as the elderly.