1 Introduction

Balance is the ability to maintain the body's center of gravity over the base of support. Maintaining an upright posture and balanced movement requires complex interplay between the nervous system and muscles. This system gathers information about the body’s position and movement through several senses (proprioception, vision, and the vestibular sense). The central nervous system processes this sensory data and uses it to send coordinated signals to the muscles, allowing them to adjust posture [1]. Any disruption in these senses, or in the muscles that carry out the resulting response, may cause a fall. With aging, sensory acuity and motor skills decrease [1, 2], and falls may occur in the elderly as a result [3,4,5,6]. Falling in the elderly is a public health problem that usually requires hospital care and may, in some cases, result in the death of the patient. When a fall results in a hip fracture, death occurs in 20% of patients [3, 7,8,9]. The probability of encountering these problems increases with age [3, 10, 11]. Apart from the physical problems faced by the patient, psychological issues such as movement restriction caused by fear of falling again, withdrawal from social life, and depression occur during the postoperative period [9]. It is known that 25 million dollars are spent per year on the treatment of fall-related injuries in the European Union [12], while 50 million dollars are spent in the USA [13]. It is reported that, with increased life expectancy, the elderly population will increase exponentially in the next 10 to 20 years [7, 9, 10, 12]. As the elderly population of the world grows, fall-related problems and expenses will increase [8, 9, 12, 13]. Preventing falls in the elderly, whose physical, psychological, and financial effects are quite serious, is very important for both society and individuals [9, 14,15,16].
To prevent falls, the American Geriatrics Society and the British Geriatrics Society recommend that all adults over the age of 65 be screened annually for a history of falls or balance disorders. In this assessment, according to the American Geriatrics Society/British Geriatrics Society (AGS/BGS) clinical practice guideline, answers to questions about falls ("Fell in last year?", "Feels unsteady when standing or walking?", "Worries about falling?") are obtained. Depending on the answers, the elderly person's gait, strength, and balance are evaluated using the Timed Up and Go (TUG) test and/or the 30-Second Chair Stand and 4-Stage Balance tests. Depending on the test result and the number of falls, the need for further evaluation is determined or the necessary recommendations are made [15, 17, 18]. However, the questionnaires and simple physiological tests used for fall risk assessment are subjective and qualitative rather than precise and objective [19,20,21], and the TUG test, although the most widely used, has low predictive validity [22,23,24]. Computer-controlled advanced devices using multiple sensors are difficult to use in primary care services because of their space requirements, the need for a specialist, test duration, and cost [20, 25]. Accordingly, effective fall risk screening is still not routinely integrated into clinical practice [26]. An inexpensive, easy-to-use, objective, and quantitative method is therefore needed for the early detection and prevention of falls in the elderly, both to let the elderly live more comfortably and to reduce the cost of social insurance [2, 20, 27].

We can classify the studies performed for fall prediction in terms of the type of sensor used, the number of sensors, and the physical activity performed during recording. Sensors such as accelerometers, force-sensitive platforms, pressure-sensing insoles, and depth cameras have been used for the early detection of fall incidents [6, 26, 28]. Accelerometers have recently become widespread since they are easy to use and financially advantageous in assessing the risk of falling [2, 3, 20, 29,30,31]. Accelerometers are placed in different positions such as the head, upper back, breastbone, shoulder, elbow, wrist, hip, waist, thigh, knee, ankle, and foot. The most used position among these is the waist [3, 20, 25]. The TUG test, the Sit to Stand test, the Alternate Step Test, the Standing Postural Sway, and walking for a certain distance or duration are among the exercises that are carried out [22,23,24].

Chen et al. obtained the time–frequency definitions of the acceleration signals recorded from an accelerometer placed at vertebrae L3–L5 during the TUG test using wavelet transformation, and they obtained 94.1% accuracy by using these definitions in the training and testing of a Stacked Autoencoder Network Architecture [32]. Qui et al. extracted gait-based features from acceleration signals recorded from five different acceleration sensors during five different tests. They then classified these features using support vector machines (SVM) and achieved a classification accuracy of 89.4% [33]. Rivolta et al. used acceleration signals recorded from an accelerometer placed on the chest during the Tinetti test and achieved 89% accuracy with ANN [7]. Gietzelt et al. extracted features from one-week walking signals of elderly people with dementia, classified them using a decision tree algorithm, and achieved 88.5% accuracy [34]. Greene et al. extracted features from the signals recorded from five different acceleration sensors during three different activities and achieved an accuracy of 87.58% for men and 78.11% for women using SVM [35]. Howcroft et al. achieved 84% accuracy by classifying the features obtained from the signals recorded from pressure-sensing insoles and head, pelvis, and left shank accelerometers with a multi-layer perceptron neural network [28]. Yu et al. evaluated the features obtained from the acceleration signals recorded during the TUG test and some demographic information using the short form Berg balance scale and achieved 84% accuracy [11]. Apart from the best results summarized above, similar multi-sensor or multi-activity studies with accuracies ranging from 57% to 84% can be found in the literature [5, 12, 30, 36,37,38,39,40].

Among studies evaluating data recorded from a single accelerometer during walking at a normal pace, the highest accuracy is 84% [12, 24, 28, 37, 41, 42]. Howcroft et al. achieved 84% accuracy and 66.7% sensitivity when they used SVM to classify features obtained from the signals of an accelerometer placed on the head [28]. Busiret et al. divided the accelerometer and gyroscope signals recorded during a 6-min walk into 10-s windows and used them as input to a CNN-based deep learning model, achieving 81% accuracy and 62.5% sensitivity [12]. In another study, published in 2017, Howcroft et al. classified features obtained from acceleration signals recorded over the pelvis using artificial neural networks and obtained 77.8% accuracy and 71.4% sensitivity [37]. The common point of these three studies is their low sensitivity, which indicates that they had difficulty identifying fallers. In their study conducted with only female participants, Hua et al. obtained an accuracy of 78.9% and a sensitivity of 87.7% when they classified gait-based features obtained from acceleration signals using the random forest algorithm [41]. Although this is the best sensitivity among studies conducted with one sensor and one activity, the fact that the participants were only women limits the generalizability of the study. In addition, Greene’s study [35], which gave separate results for men and women, showed higher sensitivity for female participants. It may therefore not be correct to compare Hua’s single-sex study with mixed-sex studies. Nait et al. evaluated raw acceleration signals using different deep learning and multitask learning algorithms. For this purpose, they divided the acceleration signals into 10-s intervals and made sample-based and subject-based classifications. The results were evaluated with receiver operating characteristic (ROC) curves, and an AUC of 0.75 was obtained in subject-based classification [24]. In both the Nait et al. and Busiret et al. studies [12, 24], raw acceleration signals were used directly as input to a deep learning-based algorithm, but the desired high success was not achieved in either study. Ihlen et al. attempted to detect fallers with a partial least squares regression model using a combination of phase-dependent generalized multiscale entropy, gait, and demographic-based features [42]. They achieved 83% accuracy and 83% sensitivity, which is the best result among single-sensor, single-activity studies. Ihlen’s study differs from the others in that it uses features obtained in different ways in addition to standard gait analysis-based features. This suggests that determining the correct features, which is a focus of our study, can increase classification accuracy. For this purpose, unlike existing studies, our study determined correlation- and covariance-based, time-domain, frequency-domain, and related statistical features in addition to basic gait attributes to create a wide feature pool, and then selected the most effective ones among them.

As can be understood from the literature, most studies have focused on solutions such as performing different activities, placing more sensors on the body, or placing an accelerometer in multiple parts of the body to obtain better results. However, this solution is not applicable quickly and easily due to a large number of sensor groups and the difficulty of performing different activities. In the studies on one sensor and one activity in the literature, sufficient accuracy and sensitivity could not be achieved because a limited number of features were used. In this study, we aimed to develop a model that can distinguish the elderly with a high probability of falling by using acceleration signals recorded during a walking activity at a normal pace for a short distance based on the accelerometer placed on the waist.

2 Materials and methods

2.1 Database

In this study, the long-term movement monitoring (LTMM) database in PhysioNet was used [43, 44]. The LTMM database consists of three-day and one-minute acceleration signals recorded from 71 elderly people for fall risk, balance, and gait studies. The average age of the participants was 78.36 ± 4.71 years, ranging between 65 and 87. The three-day recordings were taken in a home environment, with the participants wearing the accelerometer belt on their own. The one-minute recordings were taken in a laboratory environment while the participants, wearing the accelerometer belt, walked in a straight line for one minute at a self-selected pace. Based on the participants' reports of their previous falls, Weiss et al. classified 38 people as non-fallers and 35 as fallers [44]. Participants who had fallen twice in the past 12 months were included in the faller group, and the others were included in the non-faller group. Detailed information about the recording device and recording protocol can be found in [44]. This study aims to develop a simple method for use in primary care services. Because the one-minute recordings can be obtained in a short time, they were judged more suitable for this purpose. Thus, the one-minute accelerometer recordings in the LTMM database were used in the current study.

2.2 Preprocessing

The signals of all three accelerometer axes (vertical (V), mediolateral (ML), and anterior–posterior (AP)) were first filtered with a third-order median filter to reduce the effect of sudden changes, and the gravitational acceleration component was then removed to obtain body acceleration [45]. Two different filtering processes were then applied. First, the body acceleration signals were passed through a 0.5 Hz-3 Hz bandpass filter to determine step periods [46]. Second, since 99% of the walking signal energy is below 15 Hz, the body acceleration signals were passed through a 0.5 Hz-15 Hz bandpass filter for use in the analyses [47]. Both signals were then normalized to their maximum. A sample of the 0.5 Hz-15 Hz bandpass filtered and normalized acceleration signals of the three axes can be seen in Fig. 1. A flow diagram of the methods used in processing the acceleration signals is shown in Fig. 2. All blocks in this diagram were applied to the acceleration signals of each of the three axes, except the step period, correlation, and covariance blocks.
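The first and last stages of this chain can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' implementation: it assumes the median filter uses a three-sample window, approximates the gravity component by the signal mean (the text does not state how gravity was removed), and leaves out the two band-pass stages entirely. The function names and sample values are hypothetical.

```python
from statistics import fmean, median


def median_filter3(signal):
    """3-point median filter; the two end samples are padded by repetition."""
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return [median(padded[i:i + 3]) for i in range(len(signal))]


def remove_gravity(signal):
    """Assumption: approximate the gravity component by the signal mean."""
    offset = fmean(signal)
    return [x - offset for x in signal]


def normalize_to_max(signal):
    """Scale the signal so that its largest absolute value becomes 1."""
    peak = max(abs(x) for x in signal)
    return [x / peak for x in signal] if peak > 0 else list(signal)


def preprocess_axis(raw):
    """Median filter -> gravity removal -> max normalization (no band-pass)."""
    return normalize_to_max(remove_gravity(median_filter3(raw)))


# Hypothetical vertical-axis samples in g, with one spike at index 3.
raw_v = [1.0, 1.1, 0.9, 5.0, 1.0, 0.8, 1.2, 1.0]
clean_v = preprocess_axis(raw_v)
```

In this toy signal the median filter replaces the isolated spike at index 3 with a value close to its neighbours, which is the "sudden change" suppression the text describes.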

Fig. 1 A sample of 3 axes pre-processed acceleration signal

Fig. 2 Workflow of preprocessing, feature extraction, feature selection and classification of this study

2.3 Acceleration gait measures

In gait analysis, measurements from the acceleration signal generally fall into five categories: gait cycle event timings, statistical features, signal frequency features, time–frequency features, and information-theoretic features [48]. Gait cycle event timing, statistical, and frequency spectrum features were used in this study. Three basic gait measures (average step duration, average stride duration, and cadence) were calculated for the V axis only. Six inter-axis features were calculated: the correlation and the covariance between the V and ML axes, the AP and ML axes, and the V and AP axes. In addition, 21 time-domain and 32 frequency-domain features were calculated separately for each acceleration axis (Fig. 2). As a result, a total of 168 features were obtained per person: the first nine features, calculated once per person, and 53 features calculated separately for each of the three acceleration axes (9 + 3 × 53 = 168). The statistical features were calculated for both the acceleration time series and the acceleration frequency spectrum. The feature pool was expanded by computing certain statistical features separately for the first and the last half of each step in the time series, by computing certain statistical features separately for different bands of the frequency spectrum, and by adding spectral descriptors used for audio signals, such as Centroid, Spread, Decrease, Flatness, and Crest. The time-domain features were calculated for each step of each participant and then averaged. Local maxima of the vertical axis acceleration signal were used to determine the steps [5, 49]. The power spectral density (PSD) of the acceleration signals was estimated using the periodogram method with a Tukey window with a cosine fraction of 0.5. A detailed description of the measures is provided in the supplementary material.
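As an illustration of the audio-style spectral descriptors, the sketch below computes Centroid, Spread, Flatness, and Crest from a PSD given as plain lists. The formulas are the definitions commonly used in audio analysis and are an assumption here, since the exact definitions are in the supplementary material rather than in the text; the function name and the toy spectra are hypothetical.

```python
import math


def spectral_shape(freqs, psd):
    """Return centroid, spread, flatness and crest of a one-sided PSD."""
    total = sum(psd)
    mean_power = total / len(psd)
    # Centroid: power-weighted mean frequency.
    centroid = sum(f * p for f, p in zip(freqs, psd)) / total
    # Spread: power-weighted standard deviation around the centroid.
    spread = math.sqrt(
        sum((f - centroid) ** 2 * p for f, p in zip(freqs, psd)) / total
    )
    # Geometric mean via logarithms; requires strictly positive PSD values.
    geo_mean = math.exp(sum(math.log(p) for p in psd) / len(psd))
    return {
        "centroid": centroid,
        "spread": spread,
        "flatness": geo_mean / mean_power,  # 1.0 for a perfectly flat spectrum
        "crest": max(psd) / mean_power,     # grows as one peak dominates
    }


# Hypothetical PSDs: a flat spectrum and a spectrum with one dominant peak.
freqs = [1.0, 2.0, 3.0, 4.0]
flat = spectral_shape(freqs, [1.0, 1.0, 1.0, 1.0])
peaky = spectral_shape(freqs, [0.1, 4.0, 0.1, 0.1])
```

A spectrum dominated by a single peak, as expected for regular gait, gives a lower flatness and a higher crest than a flat one, which is why such descriptors can carry information about gait regularity.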

2.4 Feature selection

ReliefF is a heuristic estimator of feature quality that works effectively on datasets containing conditionally dependent and independent features as well as noisy features [50]. The algorithm rewards features that take different values among the K nearest neighbors of different classes and penalizes features that take different values among the K nearest neighbors of the same class. Accordingly, it calculates a weight for each feature that indicates how much the feature helps to distinguish the classes [51]. ReliefF was used in the current study since it has previously been found to be the feature selection algorithm that provides the best sensitivity [52]. The ReliefF algorithm was run 40 times, with the number of nearest neighbors varied from K = 1 to K = 40, and 40 differently weighted feature sets were obtained. The number of features to be used after weighting was determined experimentally. For this purpose, different feature sets were created by increasing the number of highest-weighted features one by one, from 5 to 40. For each feature set, the classification process was repeated to find the feature set with the highest accuracy [52].
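The reward and penalty rule can be made concrete with a simplified two-class version. The sketch below is not the implementation used in the study: it visits every sample once instead of drawing random samples, uses a range-scaled Manhattan distance, and assumes exactly two classes. The function names and the toy data are hypothetical.

```python
def relieff_weights(X, y, k):
    """Simplified two-class ReliefF that uses every sample once.

    X: list of feature rows, y: list of class labels, k: neighbours per class.
    """
    n, d = len(X), len(X[0])
    # Per-feature range, used to scale differences into [0, 1].
    spans = []
    for j in range(d):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(j, a, b):
        return abs(a[j] - b[j]) / spans[j]

    def distance(a, b):
        return sum(diff(j, a, b) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        others = sorted(
            (idx for idx in range(n) if idx != i),
            key=lambda idx: distance(X[i], X[idx]),
        )
        hits = [idx for idx in others if y[idx] == y[i]][:k]
        misses = [idx for idx in others if y[idx] != y[i]][:k]
        for j in range(d):
            # Penalty: the feature differs among same-class neighbours.
            for h in hits:
                weights[j] -= diff(j, X[i], X[h]) / (n * k)
            # Reward: the feature differs among other-class neighbours.
            for m in misses:
                weights[j] += diff(j, X[i], X[m]) / (n * k)
    return weights


def top_features(weights, n_keep):
    """Indices of the n_keep highest-weighted features."""
    order = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    return order[:n_keep]


# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 0.5], [0.2, 0.1], [0.15, 0.9], [0.9, 0.4], [0.8, 0.8], [0.85, 0.2]]
y = [0, 0, 0, 1, 1, 1]
weights = relieff_weights(X, y, k=2)
best = top_features(weights, 1)
```

The search described in the text would correspond to calling `relieff_weights` for K = 1 to 40 and `top_features` for 5 to 40 features, then training a classifier on each resulting subset.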

2.5 Classification

Multilayer perceptron (MLP) structures are an important class of artificial neural networks (ANN). An MLP consists of an input layer, one or more hidden layers, and an output layer, and is trained with the back-propagation algorithm [53, 54]. The current study uses an MLP trained with back-propagation to classify fallers and non-fallers. The MLP was built and trained with Orange Data Mining, a machine-learning software package used extensively in data mining for scientific studies [55]. In Orange Data Mining, the rectified linear unit (ReLU) function was chosen as the activation function of the hidden and output layers, and a stochastic gradient-based optimizer was chosen as the solver for weight optimization. The regularization term (alpha) was 0.0001 and the maximum number of iterations was 200. Other parameters were left at the default values of the Scikit-learn module [56]. The number of neurons in the input layer equals the number of selected input features. The output layer indicates whether a participant is a faller or a non-faller. The number of neurons in the hidden layers was varied from 10 to 200 in increments of 10 to find the best accuracy.

3 Results

A total of 168 features were calculated for each record: three for the gait cycle, six for the inter-axis correlation and covariance, and 21 time-domain and 32 frequency-domain features calculated separately for each of the three axes. When feature selection was applied using the ReliefF algorithm, the best classification result, 82.2% accuracy (82.9% sensitivity and 81.6% specificity), was obtained with the 17 highest-weighted features for K = 20 nearest neighbors. This result was obtained using 50 neurons in the hidden layers of the ANN structure. Similar classification accuracies were also obtained with different numbers of weighted features for K = 20, 21, and 23 nearest neighbors (Table 2). However, since these results use more than 17 features and have lower sensitivity, the 17 features obtained with K = 20 nearest neighbors were taken as the best classification result. The names of the features that gave the highest accuracy, together with the group means and standard deviations of these features, are given in Table 1. The classification results obtained with different numbers of features and nearest neighbors are given in Table 2.
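For readers who want to check how the three reported figures relate, the sketch below computes accuracy, sensitivity, and specificity from confusion-matrix counts. The group sizes (35 fallers, 38 non-fallers) come from the database description; the split into correct and incorrect predictions is not reported in the text and is an assumption, chosen as one combination that is consistent with the reported percentages.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),  # share of fallers correctly identified
        "specificity": tn / (tn + fp),  # share of non-fallers correctly identified
    }


# Assumed counts: 29 of 35 fallers and 31 of 38 non-fallers classified correctly.
metrics = classification_metrics(tp=29, fn=6, tn=31, fp=7)
percentages = {name: round(100 * value, 1) for name, value in metrics.items()}
```

With these assumed counts the rounded values are 82.2% accuracy, 82.9% sensitivity, and 81.6% specificity, matching the reported result.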

Table 1 Names of the 17 features that give the highest accuracy, the domain and the acceleration axis from which the features are obtained, and the mean and standard deviation of the features for each group
Table 2 Classification results for different numbers of features and nearest neighbor

4 Discussion

The main purpose of this study is to develop an early fall detection method that is easy to use, inexpensive, does not require specialists, and does not disturb the patient. To this end, care should be taken to use a small number of inexpensive sensors, to place the sensor so that it does not disturb the person, and to select a short and easily applicable activity. In this context, previous studies in the literature can be classified according to the combination of sensors, the body part where the sensors are placed during recording, and the activity performed during recording.

Sensors such as accelerometers, force-sensitive platforms, pressure-sensing insoles, and depth cameras have been used in the early detection of falls [6, 26, 28]. In the literature, analyses have been performed based on the exclusive use of these sensors, the combination of different sensors, or the combination of the same sensor placed in different parts of the body [6]. In our study, acceleration signals obtained from accelerometers, which are the cheapest and easiest to use, have been used.

Regions such as the head, upper back, breastbone, shoulder, elbow, wrist, hip, waist, thigh, knee, ankle, and foot are preferred to place the sensors [3, 25]. Of these regions, the waist (the center of the body at the lower back), which has also been used in the current study, has been used in 65% of the studies. It is the most suitable region since it is the center of gravity of the body and allows the sensor to be placed with a basic belt without disturbing the patient [20]. In addition, in a study that has compared multiple sensors and multiple locations, it is seen that the waist is the region that provides the highest accuracy among single-sensor evaluations [37]. Therefore, it is considered that this layout is suitable for our study, which aims at simple use.

After the sensors are placed on the body, activities such as walking for a certain distance or duration, the TUG test, the Sit to Stand test, the Alternate Step Test, and the Standing Postural Sway are performed [20, 26, 57]. Walking at a normal pace is the most used activity in fall prevention studies [20, 29]. It is also considered the most appropriate activity in terms of being applicable in primary care services. Since the data set used in our study consists of data collected during a 1-min walk at a normal pace, the approach is easy to use and easy to apply.

To our knowledge, the highest accuracy reported in the literature for detecting falls in advance is 94.1%. This result was obtained with deep learning classification of the time–frequency analysis images of the anteroposterior axis acceleration signal recorded during the TUG test, with the elderly classified as fallers and non-fallers according to the TUG test score [32]. Similarly, 89% accuracy was obtained using features from the acceleration signals recorded during the Tinetti test, with the elderly divided according to the total Tinetti score into those with a high risk of falling and those without [7]. However, none of the tests, such as the TUG and Tinetti tests, that are recommended for exclusive use in determining fallers and non-fallers has acceptable predictive validity [22,23,24]. Accordingly, studies in which the data are grouped by the pre and/or post study fall history will be more subjective in determining the fallers. To our knowledge, the highest accuracy is 89.7% among the studies that determine fallers and non-fallers according to fall history. In that study, the feature set obtained from accelerations of the transition from sitting to standing was selected from the acceleration signal recorded over three days in a home environment, and features obtained from laboratory functional measures were also used [38]. The closest accuracy to this result is 89.4%, obtained by classifying 38 features from the acceleration signals recorded from five different sensors during seven different activities. However, the study states that applying seven different activities with five sensors takes 30 min [33]. In another study using five acceleration sensors, a pressure sensor, and a balance board, the attributes obtained from the acceleration signals recorded during the TUG test, the five-times-sit-to-stand test, and a balance evaluation were classified. In that study, 87.58% accuracy was obtained for males and 78.11% for females [35]. Apart from the above studies, multi-sensor or multi-activity studies in the literature report accuracies of 84% [28] or lower. However, the general problem with multi-sensor and multi-activity studies is the length of the data recording time and the need for excessive equipment.

The number of single-sensor, single-activity studies is quite limited. Table 3 shows the studies performed with a single sensor and a single activity. Some of these studies were conducted with multiple sensors and multiple activities; in those cases, the single-sensor, single-activity results that achieved high accuracy were included in Table 3 [12, 28, 37]. In Table 3, two accuracies, two sensitivities, and two specificities are presented for the study given in the second line. One set was obtained when both single and multiple fallers were included as fallers in the data set, and the other when only multiple fallers were included as fallers. Higher accuracy was obtained when only multiple fallers were grouped as fallers [42]. Compared with the current study, two studies have slightly higher accuracy than ours [28, 42]. Although one of them had an accuracy of 84%, its sensitivity is notably low, so it identified fallers less reliably than the current study [28]. In the other study, with an accuracy of 83%, classification was performed on walking records obtained from a one-week home recording, which were first divided into epochs of 30 s and then visually checked and classified [42]. In general, although no single-sensor, single-activity study is clearly superior, Ihlen's study and the current study stand out in terms of accuracy and sensitivity.

Table 3 Summary of the studies conducted with a single sensor and single activity

As a result of our study, our feature combination, which provided the best classification accuracy, consisted of five time-domain and 12 frequency-domain features. Of the features, 2 are obtained from the AP axis acceleration; 5 are obtained from the ML axis acceleration; 8 are obtained from V axis acceleration, and 2 are covariance and correlation between ML and AP axes accelerations.

Two of the time-domain features are vertical axis step regularity and stride regularity, which are standard gait analysis features. In previous studies, these two features have been used to distinguish classes [7] or to reveal the statistical difference between the groups [58]. As in the study by Hua et al. [41], the correlation between the ML and AP axis accelerations is included in the feature set that distinguishes the classes with the highest accuracy. Our feature set also includes two time-domain features that we have not encountered in the class-separating feature sets in the literature: the covariance between the ML and AP axis accelerations, and the vertical axis MeanTrend. The fact that the correlation between the ML and AP axis accelerations appears in the selected feature set both in the current study and in the literature indicates its effectiveness in distinguishing the groups. The same holds for the covariance, which is closely related to correlation (correlation is the covariance normalized by the standard deviations of the two signals). The mean trend is a feature used in activity determination with accelerometers, where it provides effective results [59]. In this study, the vertical axis MeanTrend is included in the feature combination that gives high accuracy in separating fallers and non-fallers. Since the vertical axis acceleration signal is more variable in falling patients [60], we conclude that the mean trend reflects the vertical axis variability.

In the current study, the frequency domain features included in the feature set that provides the best classification accuracy are "wF2_PSD and crest_PSD" obtained from AP axis acceleration, "Kurt_PSD, Med_PSD, Mean_LH_PSD, flatness_PSD and skew_PSD" obtained from ML axis acceleration, and "mean_LH_PSD, med_PSD, pF2, AmpF2 and flatness_PSD" obtained from V axis acceleration. Of these, standard frequency domain features such as the amplitude of the peak frequency [44, 60], the width and relative prominence of the peaks [44, 60, 61], and the mean and median of the spectrum [62, 63] have been used in previous studies. The wF2_AP, mean_LH_PSD_ML, med_PSD_ML, AmpF2_V, pF2_V, mean_LH_PSD_V, and med_PSD_V features selected in the current study have been used in previous studies and are reported to reveal a statistically significant difference between fallers and non-fallers. However, to our knowledge, these features have not been used in classification-based studies. The other frequency domain features we obtained (crest_PSD_AP, kurt_PSD_ML, skew_PSD_ML, flatness_PSD_ML, flatness_PSD_V) have been used neither in studies examining whether there is a statistically significant difference between features nor in studies classifying groups through features. A higher fall risk is associated with slower, less regular, less symmetric, and less stable walking; gait is also more variable and less smooth in the V and AP directions, while in the ML direction it is less variable, smooth, and predictable [44, 60]. These differences between the gait of fallers and non-fallers are naturally reflected in the acceleration signal and its frequency spectrum. Therefore, we believe that the spectral features we have proposed better reveal the changes in the spectrum.

Determining the risk of falling in the elderly in advance is a rather complicated classification problem [52]. Additionally, if the data are collected in a laboratory environment under direct observation, the classification becomes even more difficult, particularly for short-term recordings, since the person will walk more attentively and will not behave naturally [3]. It is known that the classification ability of gait measures obtained from accelerometers is higher than that of traditional clinical evaluations [28], and that there is no relationship between gait measures and the parameters used in clinical evaluations [3]. Besides, it is unclear which feature or group of features is more predictive in a person's gait assessment [41]. Therefore, obtaining as many features as possible from acceleration signals, apart from conventional clinical parameters, may facilitate the solution of this difficult classification problem. In our study, new features not previously included in the literature were identified, and it was shown that they can be used to predict falls in advance. However, the result we obtained needs to be improved before the method can be applied to the general population worldwide. This study has shown that, with the right methods, success at the level of multi-sensor or multi-activity studies can be achieved, and that a simple model can be developed with one sensor and one activity. To obtain more accurate results, future studies should include new features, especially from the time–frequency definition, weight the obtained features with different feature selection algorithms, and use different classification methods applicable on mobile phones.

5 Conclusion

The main purpose of this study is to develop an easily applicable early fall diagnosis system based on a single sensor and a single activity. Although many studies in the literature address the early diagnosis of falls in the elderly, only a limited number focus on this objective. Other studies have sought a solution through multiple sensors and/or recordings during multiple activities. However, such approaches are not effective in terms of clinical applicability, patient compatibility, application duration, and cost. A single sensor and a single, simple activity address these practical concerns more effectively, but they also make the classification problem harder to solve. The relatively high accuracy obtained in our study shows that, by determining the right features that reveal the structure and change of gait, falling can be detected early with one sensor and a simple activity, and that the approach can be easily applied in the assessment of the elderly during routine follow-ups in clinics. In addition, because the computational load of the proposed features and classification model is manageable on mobile phones, the developed model can be integrated into a mobile phone application and made widely available worldwide.