Introduction

Falls among elderly citizens constitute a significant and widespread problem in society (Franse et al. 2017). Studies have shown that simple strength training exercises and daily outdoor activities can reduce the incidence of falls (Salminen et al. 2009; Klaperski-van Der Wal et al. 2023); however, the efforts of Danish municipalities have traditionally focused on reactive approaches. To adopt a proactive approach and narrow down the search criteria for eligible adults, machine learning has been used to identify people at risk of falling. This has been done primarily for hospitalized adults, with good results (Lindberg et al. 2020; Patterson et al. 2019; Nakatani et al. 2020), but few studies have focused on community-dwelling elderly. In this setting, most proposed methods are invasive: researchers, for example, place cameras around participants in their homes, which can be seen as intrusive and a violation of privacy (Dubois et al. 2019; Yang et al. 2019; Wilmink et al. 2020). The few noninvasive studies that exist have only considered binary classification (Chelli and Patzold 2019; Dormosh et al. 2022), which cannot distinguish individuals at risk beyond a few classes. They also rely on fall reports to determine whether someone has fallen, but falls in older adults are often underreported or incorrectly reported (Hoffman et al. 2018).

In this observational study, we propose a noninvasive method to predict individual fall risk using survival analysis. Survival analysis is a form of regression modeling that supports censored data, i.e., data where the event of interest is only partially observed. Conventional machine learning methods do not support censoring and can thus produce biased predictions if censored observations are ignored (Stepanova and Thomas 2002a). We use personal alarm subscriptions as our event of interest (Hoffman et al. 2018). Our method is based on data from a large Danish municipality, where personal alarms are given to citizens who have been evaluated as being at risk of falling. The municipality participated in a signature project on fall prevention funded by the Danish government, together with the corresponding university, from 2020 to 2022. The data consist of 2542 weekly home care observations for 1499 citizens over a period of one year, which we divide into two cohorts with a 6-month follow-up period. We train five different survival models and establish an evaluation pipeline using cross-validation to assess model performance. The goal is to offer a noninvasive decision support tool that can help healthcare professionals select the right candidates for fall prevention programs. In summary, our contributions are the following:

  • We propose a novel and noninvasive predictive model to assess fall risk in home care clients using survival analysis.

  • We use personal alarm subscriptions as the event of interest to avoid the bias and noise that surround fall reports.

  • We demonstrate the effectiveness of our approach using different models and feature selection algorithms.

Related work

Work in fall-risk assessment and prevention using machine learning can generally be split into two categories: sensor data studies, where sensors or motion trackers are placed on or in the vicinity of the individual, and electronic health record (EHR) studies, where stored data about an individual’s historical traits are used. We denote the former as invasive methods, since they require active involvement of the individual to operate, whereas the latter is noninvasive, since a model trained on EHR data can be used unobtrusively and without involving the individual. In this section, we provide a brief overview of the latest research in noninvasive methods and motivate our approach.

Noninvasive methods do not require ubiquitous or pervasive monitoring to work and rely solely on historical information about the individual. Kuspinar et al. (2019) developed an algorithm to predict fall risk based on home care data from a cohort of more than 80,000 adults between 2002 and 2014; however, the authors only report odds ratios for different groups of fallers without any assessment of predictive performance. In addition, the follow-up period is only 90 days (3 months), and all observed falls are self-reported, which may lead to underreporting or recall bias. The fall assessment is based on data from a Resident Assessment Instrument-Home Care (RAI-HC) evaluation, a standardized assessment scheme that must be performed in person with the patient, making their method not entirely noninvasive. Lo et al. (2019) used a random forest algorithm on home care data extracted from the Outcomes and Assessment Information Set (OASIS) database to predict fall risk, which showed initial promise (ROC scores between 0.66 and 0.68); however, their model can only classify people into two categories (faller or nonfaller) and thus can predict neither the time to fall nor individual risk scores. Other data sources beyond home care have also been explored for noninvasive fall assessment, such as clinical notes (Santos et al. 2020; Fu et al. 2022) and fall reports (Dos Santos et al. 2019).

Recently, Dormosh et al. (2022) developed a fall-prediction model using logistic regression based on hospital registration data. The authors found significant correlations between a range of health record features and the risk of falling, but had to rely on cumbersome manual labeling to identify fallers from clinical notes. The main limitation of their work is that older people tend not to report falls unless medical attention is required (Stevens et al. 2012), which is consistent with the observation that many falls go unreported (Hoffman et al. 2018). Furthermore, reported falls may lack detail or may not be caused by a lack of strength or mobility, but by other factors (e.g., alcoholism). Another limitation of Dormosh et al. (2022) is the use of binary logistic regression, which provides no information about the time to the next fall, cannot rank individuals by risk, and cannot exploit any temporal information in the data. Logistic regression also has limited expressiveness (e.g., interactions must be added manually).

Table 1 Key demographics for Denmark and the municipality where the data came from (2021, Statistics Denmark)

We propose a novel, noninvasive method that can predict fall risk within 6 months based solely on home care data and without involving the individual of interest. Our method has the following novelties: (a) it is based on survival analysis, which can handle the problem of incomplete observations, something not addressed in previous approaches based on machine learning (Patterson et al. 2019; Ye et al. 2020). (b) Our model captures the time-to-event information not used in previous works (Chelli and Patzold 2019; Fu et al. 2022; Dormosh et al. 2022), which rely on binary classification. (c) Instead of falls, we use personal alarm subscriptions as the event of interest, which addresses the issue of biases in fall reports (Stevens et al. 2012; Hoffman et al. 2018).

Materials and methods

Survival analysis

Survival analysis is a form of regression that models the time until some event takes place, which can be partially observed (i.e., censored). It has found important use in many domains, such as healthcare informatics (Zhu et al. 2016; Kim et al. 2019), econometrics (Stepanova and Thomas 2002b), and predictive maintenance (Lillelund et al. 2023). We define a survival problem by a sequence of observations represented as triplets, (\(\varvec{x}_{i}\), \(t_{i}\), \(\delta _{i}\)), where \(\varvec{x}_{i} \in \mathbb {R}^{d}\) is a feature/covariate vector for some observation i, \(t_{i} \in \mathbb {R}\) is the time of censoring or the time of event depending on which occurred first, and \(\delta _{i} \in \{0, 1\}\) is the binary event indicator. If \(\delta _{i} = 0\), then \(t_{i} = c_{i}\), where \(c_{i}\) is the time of censoring, otherwise, if \(\delta _{i} = 1\), then \(t_{i} = e_{i}\), where \(e_{i}\) is the time of event. A survival model can predict the probability that some event occurs at time T later than t, i.e., the survival probability, \(S({t}) = \text {Pr}({T>t}) = 1-\text {Pr}({t\le T})\). To estimate S(t), we use the so-called hazard function:

$$\begin{aligned} h({t}) = \lim _{\Delta t \rightarrow 0} \text {Pr}({t<T\le t+ \Delta t \vert T>t})/\Delta t\text {,} \end{aligned}$$
(1)

which corresponds to the event rate at a point after t, assuming the individual survived past that time (Gareth et al. 2021, Ch. 11). The hazard function is related to the survival function through \(h({t}) = f({t})/S({t})\), where f(t) is the probability density associated with T, \(f({t}) := \lim _{\Delta t \rightarrow 0} \text {Pr}({t<T\le t+\Delta t})/\Delta t \), that is, the instantaneous rate of event at time t. In this regard, h(t) is the probability density of T conditional on \(T>t\), and the functions S(t), h(t), f(t), all correspond to equivalent ways of describing the distribution of T, formalizing, e.g., the intuition that higher values for h(t) correspond to higher event probabilities.
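To make the triplet representation concrete, the sketch below builds the nonparametric Kaplan–Meier estimate of \(S({t})\) (used later as a reference in Fig. 1) from a toy set of (time, event) pairs. The data and variable names are illustrative only and are not taken from the study.

```python
import numpy as np

def kaplan_meier(t, delta):
    """Product-limit estimate of S(t) from right-censored (t, delta) pairs.

    t     : observed times (event or censoring, whichever came first)
    delta : 1 if the event was observed, 0 if the time is censored
    Returns the distinct event times and the survival estimate at each.
    """
    t = np.asarray(t, dtype=float)
    delta = np.asarray(delta, dtype=int)
    event_times = np.unique(t[delta == 1])
    surv, s = [], 1.0
    for ti in event_times:
        n_at_risk = int(np.sum(t >= ti))            # still under observation
        d_events = int(np.sum((t == ti) & (delta == 1)))
        s *= 1.0 - d_events / n_at_risk             # product-limit step
        surv.append(s)
    return event_times, np.array(surv)

# Toy cohort of (t_i, delta_i) pairs; the covariates x_i are omitted here.
times = [3, 5, 5, 8, 12, 12, 15]
events = [1, 0, 1, 1, 0, 1, 0]
event_times, surv_probs = kaplan_meier(times, events)
```

Censored individuals leave the risk set without triggering a product-limit step, which is how the triplet representation avoids the bias of simply discarding them.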

Table 2 Key statistics of the dataset after preprocessing

The Cox proportional hazards (CoxPH) model is a popular regression model for survival analysis. It assumes a conditional individual hazard function of the form \(h({t\vert \varvec{x}_i}) = h_0({t}) \exp ({f({\varvec{\theta },\varvec{x}_i})})\). The risk score is denoted as \(f({\varvec{\theta },\varvec{x}_i})\). In Cox (1972), f is set to a linear function of the covariates, that is, \(f({\varvec{\theta },\varvec{x}_i}) = \varvec{x}_i\varvec{\theta }\), and the maximum likelihood estimator \(\hat{\varvec{\theta }}\) is derived by numerically maximizing the partial Cox log-likelihood.
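As an illustration of how \(\hat{\varvec{\theta }}\) can be obtained, the following sketch maximizes the Breslow partial log-likelihood by plain gradient ascent on synthetic data. Production implementations use Newton-type optimizers and proper tie handling; this is a minimal didactic version, not the software used in the study.

```python
import numpy as np

def cox_partial_loglik(theta, X, t, delta):
    """Breslow partial log-likelihood for a linear Cox model (no tie handling)."""
    eta = X @ theta
    ll = 0.0
    for i in np.where(delta == 1)[0]:
        at_risk = t >= t[i]                   # risk set: not yet failed or censored
        ll += eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
    return ll

def fit_cox(X, t, delta, lr=0.1, n_iter=500):
    """Maximize the partial likelihood with plain gradient ascent."""
    theta = np.zeros(X.shape[1])
    n_events = max(int(delta.sum()), 1)
    for _ in range(n_iter):
        eta = X @ theta
        grad = np.zeros_like(theta)
        for i in np.where(delta == 1)[0]:
            at_risk = t >= t[i]
            w = np.exp(eta[at_risk])
            grad += X[i] - (w @ X[at_risk]) / w.sum()  # event minus risk-set mean
        theta += lr * grad / n_events
    return theta

# Synthetic data with a single covariate whose true log-hazard ratio is 1.0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
t_event = rng.exponential(1.0 / np.exp(X[:, 0]))
t_cens = rng.exponential(2.0, size=200)
t = np.minimum(t_event, t_cens)
delta = (t_event <= t_cens).astype(int)
theta_hat = fit_cox(X, t, delta)
```

The recovered coefficient is positive, matching the simulated effect, and the partial likelihood at the estimate exceeds its value at \(\varvec{\theta } = 0\).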

The dataset

A large Danish municipality has provided the two datasets used in this study from their EHR system. The first dataset consists of 229,850 home care observations over 52 weeks (2021), divided into two 6-month cohorts, and covers home care for 6398 citizens in total. Each observation contains the amount of care delivered in minutes, the type of care, and the number of social or health workers who provided the service. The type of care follows a standardized naming scheme across all Danish municipalities, FSIII (English: Common Language III. Danish: Faellessprog III). The second dataset contains a list of citizens who subscribed to a personal alarm system in 2021. First, we merged the two datasets and removed individuals who had already subscribed to a personal alarm prior to the study period. Second, we created a window that spanned 26 weeks (6 months, from \(t=1\) to \(t=26\)) and fixed a starting time for citizens at \(t=1\) in that window. For each observation, we assigned the time of event or censoring given the event indicator \(\delta \in \{0,1\}\): \(\delta =0\) if the citizen had dropped out or not yet subscribed, and \(\delta =1\) if the citizen had subscribed. We included only citizens who received at least 100 min of home care per week and were born between 1930 and 1970. The final dataset has 2542 observations for 1499 citizens, a censoring rate of 95% (2416 censored, 126 events), and 49 nonzero covariates. No outlier detection or feature scaling was done. There were no missing values. Columns with only zero values were removed (Tables 1, 2 and 3).
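The event/censoring assignment described above can be sketched as follows. `label_citizen` and its arguments are hypothetical names for illustration (edge cases such as a subscription after dropout are ignored); this is not the preprocessing code used on the municipal data.

```python
WINDOW_WEEKS = 26  # 6-month event horizon, t = 1..26

def label_citizen(subscription_week=None, dropout_week=None):
    """Assign (t, delta) for one citizen within the 26-week window.

    subscription_week : week the citizen subscribed to an alarm, or None
    dropout_week      : week the citizen left home care, or None
    """
    if subscription_week is not None and subscription_week <= WINDOW_WEEKS:
        return subscription_week, 1      # event observed inside the window
    if dropout_week is not None and dropout_week <= WINDOW_WEEKS:
        return dropout_week, 0           # lost to follow-up: right-censored
    return WINDOW_WEEKS, 0               # administratively censored at t = 26
```

A subscription after the window ends counts as censored at \(t=26\), since the citizen had not yet subscribed within the event horizon.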

Results

Setup

We implement the plain vanilla Cox model (CoxPH) (Cox 1972), the Cox model with LASSO regularization (CoxPH \(\ell _{1}\)) (Simon et al. 2011), the Cox model with Ridge regularization (CoxPH \(\ell _{2}\)) (Simon et al. 2011), Random Survival Forests (RSF) (Ishwaran et al. 2008), and the Cox model using boosting (CoxBoost) (Hothorn et al. 2005). To evaluate our approach, we report Harrell's concordance index (CIH) (Harrell et al. 1996), Uno's concordance index (CIU) (Uno et al. 2011), the integrated Brier score (IBS) (Graf et al. 1999), the mean absolute error (MAE) (Qi et al. 2024) using hinge loss, and D-calibration (Haider et al. 2020). See Appendix A for more details on the performance metrics. We train the models using either all covariates or only those selected by a feature selector, e.g., low-variance thresholding (LowVar), SelectKBest (SKB) or Recursive Feature Elimination (RFE). We set \(K=10\), i.e., the feature selectors retain 10 covariates.
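For reference, Harrell's concordance index can be computed directly from its definition. This \(O(n^2)\) sketch is illustrative only; library implementations are faster and handle edge cases such as tied event times more carefully.

```python
import numpy as np

def harrell_cindex(t, delta, risk):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when i's event is observed and j's time
    is later; it is concordant when i also has the higher predicted risk.
    Ties in risk count as half-concordant.
    """
    t, delta, risk = map(np.asarray, (t, delta, risk))
    concordant, comparable = 0.0, 0
    for i in range(len(t)):
        if delta[i] != 1:
            continue                      # anchor pairs at observed events
        for j in range(len(t)):
            if t[j] > t[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable
```

A perfectly anti-concordant ranking scores 0, random risk scores hover around 0.5, and a perfect ranking scores 1.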

Table 3 Descriptive statistics of covariates ranked in descending order by observational count

To estimate the generalization error of each model and feature selector, we run stratified nested cross-validation with five outer loops and five inner loops. Stratification ensures that the event times and censoring rates are consistent across the training and test sets. Using nested cross-validation, feature selection and hyperparameter optimization are performed together within each fold. Appendix B reports the best obtained hyperparameters by highest CIH.
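A minimal skeleton of the nested procedure is sketched below. Stratification and the actual models and metrics are omitted for brevity, and `fit_score` is a hypothetical callback standing in for training a survival model with a given hyperparameter configuration and scoring it on held-out indices.

```python
import numpy as np

def nested_cv(X, y, fit_score, configs, k_outer=5, k_inner=5, seed=0):
    """Nested cross-validation skeleton.

    fit_score(cfg, train_idx, test_idx) -> score is a user-supplied callback.
    The inner loop selects cfg; the outer loop estimates generalization error.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k_outer)
    outer_scores = []
    for f in range(k_outer):
        test_idx = folds[f]
        train_idx = np.concatenate([folds[g] for g in range(k_outer) if g != f])
        inner = np.array_split(train_idx, k_inner)

        def inner_score(cfg):
            return np.mean([
                fit_score(cfg,
                          np.concatenate([inner[h] for h in range(k_inner) if h != g]),
                          inner[g])
                for g in range(k_inner)])

        best_cfg = max(configs, key=inner_score)   # inner-loop model selection
        outer_scores.append(fit_score(best_cfg, train_idx, test_idx))
    return float(np.mean(outer_scores))
```

Because hyperparameters are chosen inside each outer-training fold, the outer test folds never leak into model selection.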

Table 4 Experimental results from 5-fold cross-validation. For D-calibration, we report the number of D-calibrated mean survival curves across the five folds according to a Pearson’s \(\chi ^2\) test

Model performance

Table 4 shows the predictive and calibration performance from cross-validation, averaged over five folds. The CI (CIH and CIU) measures how well a model can predict risk scores that match the order of events, i.e., people with higher risk should experience the event before people with lower risk. The plain vanilla CoxPH model obtained a CIH of 0.53, which means that it ranked 53 out of 100 comparable pairs correctly; a CI of 0.5 indicates chance-level ranking. A CoxPH model with a LASSO penalty term obtained a CIH of 0.61, so reducing the number of covariates leads to better discriminative performance in this case. CoxBoost provided the most concordant risk predictions with a mean CIH of 0.64 (95% CI 0.57–0.72) using low-variance thresholding. We note that multiple types of home care are provided only on a sporadic basis. This means that the dataset has many covariates with predominantly zero values, which in turn can make it difficult for the model to identify which covariates are important and which are not, and can lead to poor performance on new data. We see that restricting the feature space using LASSO or a feature selection algorithm gives better CI results on average.

On the surface, the results of the integrated Brier score look promising, as a lower number indicates a better estimate of the survival curve. However, the high censoring in this dataset means that for many cases, the predicted survival curve declines only slightly over the follow-up period. This reflects the cumulative incidence of the event, but offers little practical value, since the discrepancies in survival probabilities between the two groups (citizens who receive an alarm and citizens who do not) are small. RSF was best at predicting the survival curve with a mean integrated Brier score of 0.033 (95% CI 0.031–0.035). RSF was also best at predicting the time to event with a mean MAE of 83.5 (95% CI 79.1–87.9) using RFE to select the best covariates. The MAE is the absolute difference between the actual and predicted survival times (e.g., the median of the curve) using a hinge loss function: predictions beyond the censoring time incur no penalty. This means that the error comes from censored samples with an overly pessimistic prediction (less than the censoring time) and uncensored samples with an overly optimistic prediction (more than the event time). With a 95% average censoring rate, the model overshoots the true event time for the few individuals who experience the event; thus, our model is poor at predicting when the event occurs in the sample population. We used Pearson's \(\chi ^2\) goodness-of-fit test to assess D-calibration and find that, on average, CoxPH with LASSO, RSF, and CoxBoost gave predicted survival curves that were calibrated with respect to the actual survival distribution.
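The hinge-style MAE described above can be written down directly. This is a generic sketch of the metric, not the exact implementation cited (Qi et al. 2024).

```python
import numpy as np

def mae_hinge(t, delta, t_pred):
    """MAE with a hinge on censored observations.

    Uncensored: ordinary absolute error against the event time.
    Censored:   penalized only when the prediction falls short of the
                censoring time (predicting beyond it agrees with the data).
    """
    t, delta, t_pred = (np.asarray(a, dtype=float) for a in (t, delta, t_pred))
    err = np.where(delta == 1,
                   np.abs(t_pred - t),            # observed events
                   np.maximum(t - t_pred, 0.0))   # censored: one-sided penalty
    return float(err.mean())
```

For a citizen censored at week 10, a prediction of week 12 contributes zero error, whereas a prediction of week 5 contributes an error of 5.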

Fig. 1

Mean predicted survival functions by the CoxPH, RSF and CoxBoost models, versus the unbiased Kaplan–Meier estimator on the test set. The survival functions are predicted over the entire event horizon from \(t=1\) to \(t=26\)

Fig. 2

SHAP summary plot. The covariates are on the y-axis and the SHAP-value is on the x-axis. Covariates are ranked based on their importance, which is given by the mean of their absolute SHAP values (higher positioning means more important). For each covariate, one point corresponds to a single citizen. Its position on the x-axis represents the impact that covariate had on the model’s output, which corresponds to the risk across citizens. The color indicates whether the citizen had a low (blue) or high (red) value of that covariate

Model explainability

After cross-validation, we adopt a 70–30% train-test split, configure the models with sensible default parameters, and retrain them for plotting purposes. Figure 1 shows the mean predicted survival function over all samples in the test set for the CoxPH with LASSO, RSF, and CoxBoost models. As a reference, we have included the Kaplan–Meier (KM) (Kaplan and Meier 1958) estimator with a 95% confidence interval. In the present study, we assume censoring to be noninformative: consider two citizens with the same risk factors, both yet to experience the event of interest by time t. One individual is lost to follow-up at that time, i.e., their event time is right-censored at t, and the other continues in the study. Under the assumption of noninformative censoring, these two individuals have the same subsequent risk of experiencing the event, and knowing that one of them is censored adds no information. Under this assumption, the KM estimator is unbiased, regardless of the proportion of censoring. Similarly, the hazard ratios of a Cox proportional hazards model represent a good estimate of relative risk and are unbiased regardless of the amount of censoring. The predicted survival probabilities and the KM estimator align well in this plot. All models fall within the KM confidence interval after \(t=8\); however, a slight overestimation of risk is seen between \(t=0\) and \(t=7\).

Fig. 3

Left: SHAP dependence plots of the respective covariate. The color indicates whether the covariate has a low (blue) or high (red) value. Right: Partial dependence plot of the same covariate

Figure 2 shows the most important covariates and covariate values according to SHAP (Lundberg and Lee 2017). These values were obtained from a Random Survival Forests model. The covariates are on the y-axis and the SHAP value on the x-axis. The color indicates whether the covariate value is low (blue) or high (red) for a particular observation. According to the SHAP values, the most important covariates are: “Personal hygiene”, “SUL §138”, “Excretions”, “RH Nutrition” and “Medication administration”. The SHAP values exhibit a slight positive correlation between risk and the number of minutes received in the category “Personal hygiene”, but a strictly positive correlation for “Excretions” and “SUL §138”. In this context, “SUL §138” refers to a section of the Danish health legislation that requires the municipality to provide an individual with a community nurse by referral from their general practitioner. We see that some covariates have both positive and negative correlations (e.g., “Cleaning”, “Medication administration” and “RH Personal hygiene”). These may seem counterintuitive at first, as we would usually associate increased reliance on home care with an increased risk of falling, but our model captures only correlations between received care and the risk of receiving a personal alarm. Elderly people who depend on a lot of home care have fewer falls, as they are attended to very often and many have daily check-ins by a nurse or a care practitioner; hence, they have less need for a personal alarm.

Fig. 4

Predicted individual survival (left) and hazard (right) functions for 20 randomly selected citizens from the test set using the RSF model. Solid lines represent citizens who do not receive an alarm within 6 months. Dotted lines represent citizens that do

Figure 3 shows SHAP feature dependence plots and partial dependence plots for three of the most important covariates (“Personal hygiene”, “Excretions” and “Cleaning”). These provide more detail on the exact correlations between risk and received care. A partial dependence plot is created by marginalizing the model output over the distribution of the covariates in a set of irrelevant covariates, C, and returning a function that depends only on the covariates in a relevant set, S, interactions included. This tells us, for given values of the covariates in S, what the average marginal effect on the prediction is, i.e., how the outcome changes when a specific independent variable changes. In Fig. 3a on the left, we see a vague but positive correlation between minutes and SHAP value for the “Personal hygiene” covariate, which is also reflected in the partial dependence plot on the right. The vertical gray bars in the plot show the data distribution. In Fig. 3b, we plot the “Excretions” covariate and see a much steeper increase in both SHAP value and estimated risk from 0 to 200 minutes, after which it increases only slightly. Lastly, Fig. 3c shows that “Cleaning” has a negative trend between 0 and 100 minutes, but rises suddenly from 100 minutes onward.
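The marginalization behind a partial dependence plot can be sketched in a few lines. Here `model_predict` stands in for any fitted risk model; the covariate of interest is clamped to each grid value while the remaining covariates keep their empirical distribution.

```python
import numpy as np

def partial_dependence(model_predict, X, feature, grid):
    """1-D partial dependence: clamp one covariate to each grid value and
    average the model output over the empirical distribution of the rest."""
    X = np.asarray(X, dtype=float)
    pd_curve = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature] = v                 # covariate set S fixed at v
        pd_curve.append(float(model_predict(Xv).mean()))
    return np.array(pd_curve)
```

For a purely additive model, the resulting curve recovers the covariate's own effect up to a constant shift from the averaged-out covariates.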

Figure 4 shows predicted individual survival and hazard functions for 20 randomly selected citizens using a Random Survival Forests model. Solid lines represent citizens who did not subscribe to a personal alarm within 6 months; dotted lines represent citizens who did. We see in the upper row that most citizens reach a survival probability (probability of no alarm) at \(t=26\) between 0.90 and 0.98 and a cumulative hazard between 0.01 and 0.10, but some citizens have a significantly lower survival probability and a correspondingly higher cumulative hazard. Only one citizen in this random draw subscribed to a personal alarm during the study, and this individual has predicted survival probabilities over the event horizon similar to those of the remaining cohort. This tells us that predicting the time to event from these curves alone is difficult.

Discussion

Applicability

The proposed method represents a change from a reactive mindset to a proactive mindset and is applicable in a municipality setting, where efficient management of resources and healthcare personnel is crucial. By accurately identifying those citizens who are at the highest risk of falling, the municipality can allocate its resources more effectively by providing personalized home care, physical therapy, and other healthcare services. This is important, as the consequences of falls can often be more severe for the elderly due to existing health problems and reduced mobility (Franse et al. 2017). Wang et al. (2023) find that exercise appears to be particularly effective for people with higher fall rates. In their study, exercise was more effective in trials with a higher prospective fall rate (32% reduction in falls and 442 prevented falls in 1000 people over one year). However, trials with a lower prospective fall rate showed a comparatively reduced effectiveness, resulting in a mere 12% reduction in the number of falls and preventing only 64 falls in 1000 people in the same period. At first glance, it appears to be a sound strategy to offer exercise and physical therapy to any older adult over 65 years of age, but a broad preventive approach is very expensive if training is to be supervised, which multiple studies advocate for (Donat and Özcan 2007; Youssef and Shanb 2016). Moreover, we expect the number needed to treat (NNT), a measure of the effectiveness of a health intervention, to be lower (better) with a more targeted, risk-based approach. When prevention and intervention are focused on the most relevant citizens, the number of people who need to be trained to prevent a single fall is reduced.
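As a back-of-the-envelope illustration of the NNT argument, treating the falls-prevented-per-1000 figures cited from Wang et al. (2023) as if they were per-person absolute risk reductions gives a rough sense of how many people must be offered training to prevent one fall under each strategy.

```python
# Illustrative arithmetic only; a proper NNT requires per-person absolute
# risk reductions, which the cited fall-rate figures only approximate.
def people_per_fall_prevented(falls_prevented_per_1000):
    return 1000 / falls_prevented_per_1000

high_rate_cohorts = people_per_fall_prevented(442)  # targeted, high fall-rate trials
low_rate_cohorts = people_per_fall_prevented(64)    # broad, low fall-rate trials
```

Under these rough assumptions, roughly 2–3 people must be trained per prevented fall in high fall-rate cohorts, versus roughly 15–16 in low fall-rate cohorts, supporting the targeted approach.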

Model performance

Concerning predictive performance, all models obtained concordance index scores between 0.51 and 0.64, indicating that the proposed method can rank individuals by risk better than chance-level ranking (0.5). This was validated by Harrell's and Uno's concordance index using cross-validation across five folds. However, the proposed method cannot accurately predict when the next fall will occur for the few citizens who experienced the event within the 6-month cohort, given the current dataset and event horizon. This was confirmed by a relatively high mean absolute error between the actual and predicted event times. We attribute this to the high censoring rate and the lack of qualitative covariates for this specific task. Regarding calibration performance, the CoxPH model predicted D-calibrated survival curves in all five cross-validation folds when using LASSO regularization, i.e., zeroing out coefficients and thus preventing overfitting. The ensemble-based and gradient-boosting models (RSF and CoxBoost) also predicted D-calibrated survival curves in all folds.

Limitations

Our method is trained only on EHR data from a single municipality. Although it is one of the largest in Denmark, the correlations we have found between home care usage and personal alarm subscriptions may not translate to other municipalities or government institutions, let alone outside Denmark. Although personal alarms are an objective measure intended to prevent future falls, not all personal alarms in Denmark are given to people at risk of falling or who have had falls. They can also be issued to people in need of a secure environment or to individuals suffering from anxiety. When adopting a machine learning model for fall prediction, it should be trained and evaluated on the population in which it is used. Another limitation is the high censoring rate: 95% of the people in this study did not receive a personal alarm during either cohort. We regard this kind of censoring as noninformative, i.e., the presence of censoring contains no information about the actual time to event. Under this assumption, the proposed Cox proportional hazards models are not biased (Leung et al. 1997). However, the high censoring rate means that using the median of the survival function as the predicted time to event often requires a great deal of extrapolation beyond the event horizon, and these estimates therefore carry considerable uncertainty. Without more predictive features (e.g., physical strength tests, previous falls) or a longer study period, our method cannot predict the time to the next fall accurately.

Ethical concerns

The present study did not include human trials, and all data were completely anonymized before any data processing or machine learning took place. A contract was signed between the data owner and Aarhus University to help ensure privacy and confidentiality in the handling, processing, and dissemination stages. The proposed model is intended as a whitelisting tool that can highlight potential candidates for fall prevention programs. There are no side effects if more people are selected for fall prevention, as strength training for elderly people is always beneficial if it is done correctly and properly supervised (Donat and Özcan 2007; Youssef and Shanb 2016). Today, many elderly people experience multiple falls at home before being offered fall prevention, and the socioeconomically advantaged elderly have easier access to health care resources than the disadvantaged (McMaughan et al. 2020). A data-driven model can ensure that risk assessment and eligibility for a fall prevention program are independent of socioeconomic status. In this way, it can contribute to reducing health inequality, as all citizens, regardless of background, will have access to relevant and timely prevention offers.

Conclusions

Based on 2542 home care observations for 1499 citizens (59% female, 41% male) with a mean age of 77 years (SD 10 years), we have trained a selection of machine learning-based survival analysis models to predict the per-individual risk of falling over 6 months. Using 5-fold cross-validation, the best model in terms of ranking performance was CoxBoost with a mean Harrell's concordance index of 0.64 (95% CI 0.57–0.72), i.e., 14 percentage points better than chance-level ranking on average. The CoxBoost model also produced survival curves that were D-calibrated across all five folds according to a Pearson's \(\chi ^2\) test. Our method can be used as a decision support tool to choose the right candidates for a fall prevention program and can provide a cost-effective and timely way to assess fall risk among community-dwelling elderly, which can potentially improve their quality of life and reduce the burden on the healthcare system.