Introduction

Most of the risk factors for hip fractures seem to have their effect on fracture risk mediated by low bone mineral density, an increased risk of falling, or both [1]. The most clinically established fracture assessment tool is FRAX [2], which includes eleven risk factors with the optional addition of bone mineral density (BMD) of the femoral neck as a twelfth risk factor. The traditional measure of predictive accuracy, the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, has been determined for FRAX in several different populations, and for postmenopausal women, it varies between 0.7 and 0.73 [1417] for hip fractures. The problem with AUC has been that it fails to show any significant improvement of an existing predictive model when a new predictor is added to the model, no matter how promising the new predictor has seemed to be [3]. To address this problem, new measures of predictive accuracy such as the net reclassification index (NRI) have been developed.

Some risk factors for falling are risk factors for low BMD as well, e.g. older age, weight loss, low muscle strength, and impaired mobility [4]. However, some risk factors for falls have also been shown to be BMD-independent risk factors for hip fractures, e.g. slow gait speed, impaired vision and a history of falls [5]. Thus, although low BMD and high fall risk sometimes overlap, a risk factor for falls may add information about fracture risk not captured by BMD. FRAX does not include any measure of fall risk since such data were either not available or not measured with a consistent method in the cohort studies on which FRAX was based [6]. There are several validated ways to assess fall risk, such as a history of previous falls, current medication, self-graded fall risk, visual tests, balance tests, and the “timed up-and-go test” [7]. These factors may also be combined in assessment questionnaires such as the St. Thomas Risk Assessment Tool for falls in elderly (STRATIFY) [8] and the Downton fall risk index [9]. Gait speed is the most studied of all gait analyses used for the prediction of fall risk [10]. However, it remains to be determined which measure of fall risk would be best suited for incorporation into FRAX [4]. To be suitable for improving the performance of FRAX, a variable should be easy to measure, reproducible, independent of BMD, and continuous with a “dose–response” effect on fractures [11]. It should of course also still result in a significant increase in risk after adjustment for all current risk factors in FRAX.

Slow gait speed has been shown to predict hip fractures independent of BMD [12, 13]. The effect on fracture risk is probably mediated by an increased fall risk since slow gait speed has been associated with an increased fall risk in other studies [10, 14, 15]. Sarcopenia is a risk factor for hip fractures [16] and means low muscle mass combined with impaired physical performance [17]. A gait speed of <0.8 m/s is one of the criteria for sarcopenia in many of the definitions of this condition [18]. In a Chinese study, a predictive model consisting of sarcopenia and FRAX resulted in a better predictive performance for hip fractures than FRAX alone in men but not in women [19]. Gait speed is usually measured at a usual pace over a 3–6 m long distance [20]. The measurement we used, 15 + 15 m with a 180° turn in between, has also been studied in relation to hip fracture risk [21]. One SD slower gait speed resulted in a hazard ratio (HR) of 1.37 (1.14–1.64) for a hip fracture in elderly women. A low gait speed has also been associated with indicators of frailty such as unintentional weight loss, self-reported exhaustion, low physical activity, and low hand grip strength [22].

A short one-leg standing time (OLST) has been described as a strong risk factor for hip fracture, which is independent of many of the risk factors in FRAX including BMD [23, 24]. OLST has been shown to be a good predictor of falls [25, 26].

We have found no previous studies on the incremental value of adding gait speed or one-leg standing time to FRAX.

The aim of this study was to evaluate the effect on predictive ability of adding either gait speed or one-leg standing time to FRAX.

Methods

Population

A cohort of 351 free-living women, aged 69–79 (mean age 73 years), was tested regarding OLST and all FRAX parameters, including femoral neck BMD, between 1999 and 2001. These women were part of the PRIMOS project (Primary Health Care and Osteoporosis). The relation between OLST and hip fractures in this population has been described earlier [24]. Inclusion criteria for participation in the study were being a woman born between 1920 and 1930 and living in the Bagarmossen area, a suburb of Stockholm in Sweden. Of the 937 eligible women, 584 were sent written invitations to participate in the study and 351 women agreed to participate and were included in the study (Fig. 1) [31]. Of the eligible 937, all 284 women born between 1926 and 1930 were invited. Of the women born between 1920 and 1925, a random sample was invited (Fig. 1). Not all 937 eligible women were invited because the funding did not allow for so many participants to be investigated. Although it was not a condition for invitation, participants had to be physically capable of transporting themselves from their home to the primary health care centre to be able to participate. All participants were examined by the same physician. A follow-up was conducted in 2010 with information about fractures obtained from Swedish health care registers.

Fig. 1 Recruitment flowchart

Biochemical analyses

Vitamin D was analysed with Nichols Advantage® 25-Hydroxyvitamin D assay (Nichols Institute Diagnostics), a chemiluminescence analysis used for measurement of 25-hydroxy-vitamin-D in serum. At the time of the analysis of the samples from this study, the CV% was between 10.4 and 11.5 for 25-hydroxy-vitamin-D levels of 70–80 nmol/l. Vitamin D was successfully analysed in 336 of the 351 participants from samples drawn at the primary health care centre during the first study visit.

Physical functional tests

Gait speed

The participants were asked to walk in a corridor on a floor which was flat and had no inclination, as fast as possible from one mark to a mark 15 m away, turn there and then hurry back to the starting point. The average speed was calculated as 30 m divided by the walking time in seconds. Out of the 351 participants, 350 managed to perform the test. Gait speed measured in this way has been validated by Ekdahl and colleagues [27].
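The speed calculation described above can be sketched in a few lines of Python. This is only an illustration: the function names and the 40 s example time are hypothetical, and the 0.8 m/s cutoff is the one used for the dichotomised analysis in the Results.

```python
def average_gait_speed(walk_time_s: float, distance_m: float = 30.0) -> float:
    """Average speed in m/s over the 15 + 15 m course (the turn is included in the time)."""
    if walk_time_s <= 0:
        raise ValueError("walking time must be positive")
    return distance_m / walk_time_s


def is_slow_gait(speed_m_per_s: float, cutoff: float = 0.8) -> bool:
    """True if the speed is below the cutoff (0.8 m/s by default)."""
    return speed_m_per_s < cutoff


# Hypothetical participant who needs 40 s for the 30 m course.
speed = average_gait_speed(40.0)
print(f"{speed:.2f} m/s, slow: {is_slow_gait(speed)}")
```

Note that because the timed distance includes a 180° turn, speeds computed this way are not directly comparable with speeds from a straight 3–6 m walk.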

OLST

The participant stood barefoot on a flat surface in a well-lit room and was asked to stand on one leg for as long as possible. The arms were to be held alongside the body and the eyes kept open. Time was measured with a stopwatch, starting when one foot was lifted from the floor and stopping when the elevated foot touched the floor again or when the participant touched any part of the room other than the floor with any other part of the body. Out of all 351 participants, 349 were able to perform the test. OLST has shown good test–retest reproducibility and inter-rater reliability [28–30].

Bone mineral densitometry

Bone mineral density measurements were conducted by the same trained staff both in 1999–2001 and in 2009 using Hologic QDR 4500 DXA equipment (Hologic, Marlborough, MA, USA). Calibration was performed daily with a phantom. The NHANES III reference population was used for the calculation of T-scores.

Definition of variables

Co-variables are defined as stated on the FRAX website: http://www.shef.ac.uk/FRAX/index.jsp.

The variables were recorded at the first study visit except for BMD which was measured within 254 days from the visit in 95% of the participants and after a maximum of 744 days in the remaining 5%.

Height

Height was measured at inclusion with the participant standing with their back against a wall.

Weight

Body weight was measured with one layer of clothes, and no shoes. All participants used the same scales.

Previous fracture

A previous fracture in adult life, self-reported.

Parent fractured hip

A history of a hip fracture in either of the participant’s parents, as reported by the participants.

Current smoking

Smoking tobacco every day. Dose was not taken into account.

Glucocorticoids

Exposed to oral glucocorticoids for >3 months at a dose equivalent to prednisolone of ≥5 mg/day. Self-reported.

Rheumatoid arthritis

Diagnosis of rheumatoid arthritis either reported by the participant or obtained from patient medical files.

Secondary osteoporosis

Any of the following diagnoses either reported by the participant or obtained from the patient medical files: type 1 (insulin-dependent) diabetes, osteogenesis imperfecta in adults, untreated long-standing hyperthyroidism, hypogonadism or premature menopause (<45 years), chronic malnutrition or malabsorption and chronic liver disease.

Alcohol three or more units/day

A daily consumption of ≥24 g of alcohol. Self-reported.

Femoral neck BMD

Measured at the left femur, unless it had been replaced by a hip implant, in which case the result from the right side was used instead. See “Bone mineral densitometry” above.

Fractures and mortality during follow-up

The dates of death and all fracture diagnoses registered in both inpatient and outpatient care were obtained from the Swedish National Board of Health and Welfare. Diagnoses were identified according to the ICD-10 classification system during the period January 1, 1999, to December 31, 2009. A hip fracture was defined by the ICD-10 code S72.* where * means “any number”. A major osteoporotic fracture was defined as any of the codes S32.*, S42.*, S52.* or S72.*. All causes of fracture (any W-code) were included. The list of fracture diagnoses was carefully “cleaned” for each patient so that diagnoses registered at appointments for checking fracture alignment or removal of osteosynthesis materials were not classified as new fractures.
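The code-based outcome definitions above can be expressed as simple prefix matching. The sketch below is an assumption about how such a classification might be written, not the procedure used in the study; the function names are hypothetical, and the manual “cleaning” of follow-up visits is not covered because its rules are not specified.

```python
HIP_PREFIXES = ("S72",)
MAJOR_OSTEOPOROTIC_PREFIXES = ("S32", "S42", "S52", "S72")


def _normalise(code: str) -> str:
    """Upper-case the code and drop surrounding whitespace."""
    return code.strip().upper()


def is_hip_fracture(code: str) -> bool:
    """True for any ICD-10 code in the S72.* group."""
    return _normalise(code).startswith(HIP_PREFIXES)


def is_major_osteoporotic_fracture(code: str) -> bool:
    """True for any ICD-10 code in the S32.*, S42.*, S52.* or S72.* groups."""
    return _normalise(code).startswith(MAJOR_OSTEOPOROTIC_PREFIXES)


# Hypothetical register entries, written with and without the dot.
for code in ["S72.0", "s5250", "S82.1"]:
    print(code, is_hip_fracture(code), is_major_osteoporotic_fracture(code))
```

Matching on the three-character prefix works whether a register stores the code as “S72.0” or “S720”.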

Non-participants

Information about the non-participants was obtained from telephone interviews in 1999, before the start of the study. Mortality was obtained on a group level (anonymized) from the Swedish National Board of Health and Welfare in 2010, as for the participants.

Statistical methods

Spearman’s correlation was used to analyse the association between OLST or gait speed and continuous variables. As both OLST and gait speed had skewed distributions, this method was preferred to Pearson’s correlation.
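To illustrate why a rank-based measure suits skewed data, the following is a minimal pure-Python sketch of Spearman’s correlation, computed as Pearson’s correlation on average ranks. It is not the STATA routine used in the study, and the helper names and example values are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson's product-moment correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson's correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical skewed data: monotone but far from linear.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]
print(spearman(x, y), pearson(x, y))
```

In this toy example the relation is perfectly monotone, so Spearman’s rho is 1, whereas Pearson’s coefficient is pulled below 1 by the single extreme value.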

Differences in time between inclusion and fracture, death or end of study were analysed with Cox proportional hazards regression and expressed as hazard ratios (HR). To evaluate the additive effect of each of gait speed and OLST to FRAX, the models were adjusted for the FRAX-predicted fracture risk. All Cox regression models were tested and satisfied goodness of fit and the proportionality assumption.

Harrell’s C is a measure of predictive accuracy designed for Cox regression, and it is interpreted in the same way as the area under the curve (AUC) of the receiver operating characteristic (ROC) curve in logistic regression.
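As an illustration of what Harrell’s C measures, the sketch below counts concordant pairs directly. It is a simplified, assumed implementation (pairs with tied follow-up times are ignored) and not the STATA routine used for the analyses; the function name and toy data are hypothetical.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    times: follow-up time per participant
    events: 1 if the event (e.g. hip fracture) was observed, 0 if censored
    risk_scores: model output where a higher score means higher predicted risk
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is usable only if the shorter time ends in an event
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in predicted risk count half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical toy data: four participants, the third one censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.6, 0.7, 0.1]
print(harrell_c(times, events, risk))
```

A value of 0.5 corresponds to chance and 1.0 to perfect ranking of who fractures first, which is why the index can be read like an AUC.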

NRIs were calculated from logistic regression models according to the methods described by Pencina and colleagues [31] in STATA, using the “nri” command and the package described by Sundström and colleagues [32]. NRI is widely used in epidemiological studies within medicine and provides information about predictive accuracy in addition to AUC. However, the results must be interpreted with caution, especially for the categorical NRI (cNRI). Since the cNRI is largely dependent on the category cutoffs, these should be based on clinical decision thresholds to have high clinical relevance [33].

Categorical NRI was calculated in a similar way, but the study population was divided into two groups, one “high-risk group” and one “low-risk group” where high risk was defined as the highest quartile of risks (at or above the 75th percentile). Since the prevalence affects both the sensitivity and the specificity of a test, it could be interesting to calculate the population-weighted cNRI.

In this way, the proportional changes are weighted by the cumulative incidence of the event. This tells us the change in the misclassification rate.
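The reclassification measures described in this section can be sketched as follows. This is an assumed, simplified formulation working directly on predicted risks, not the STATA implementation used in the study; the function names and toy data are hypothetical, and the weighting shown is one common way of expressing the population-weighted version.

```python
def nri_components(old_risk, new_risk, events, cutoff=None):
    """Return (net improvement among events, net improvement among non-events).

    cutoff=None gives the category-free version (any change in risk counts);
    a numeric cutoff gives the two-category version (high risk = risk >= cutoff).
    """
    def level(risk):
        return risk if cutoff is None else int(risk >= cutoff)

    net_events = net_nonevents = 0
    n_events = n_nonevents = 0
    for old, new, event in zip(old_risk, new_risk, events):
        # +1 if reclassified upwards, -1 if downwards, 0 if unchanged
        move = (level(new) > level(old)) - (level(new) < level(old))
        if event:
            n_events += 1
            net_events += move      # moving up is correct for events
        else:
            n_nonevents += 1
            net_nonevents -= move   # moving down is correct for non-events
    return net_events / n_events, net_nonevents / n_nonevents


def nri(old_risk, new_risk, events, cutoff=None):
    """Unweighted NRI: sum of the two components."""
    among_events, among_nonevents = nri_components(old_risk, new_risk, events, cutoff)
    return among_events + among_nonevents


def population_weighted_nri(old_risk, new_risk, events, cutoff):
    """Components weighted by the cumulative incidence of the event."""
    among_events, among_nonevents = nri_components(old_risk, new_risk, events, cutoff)
    incidence = sum(1 for e in events if e) / len(events)
    return incidence * among_events + (1 - incidence) * among_nonevents


# Hypothetical toy data: 2 events followed by 10 non-events.
events = [1, 1] + [0] * 10
frax_only = [0.10, 0.20, 0.10, 0.12, 0.05, 0.08, 0.03, 0.06, 0.07, 0.09, 0.11, 0.04]
frax_plus_gait = [0.18, 0.25, 0.16, 0.17, 0.05, 0.08, 0.03, 0.06, 0.07, 0.09, 0.11, 0.04]

print(nri(frax_only, frax_plus_gait, events, cutoff=0.15))
print(population_weighted_nri(frax_only, frax_plus_gait, events, cutoff=0.15))
```

In the toy data, the unweighted categorical NRI is positive while the population-weighted value is negative, because non-events outnumber events and a small loss among the many non-events outweighs a larger proportional gain among the few events.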

Because no data were missing in the Swedish death and fracture registers, no participant was lost to follow-up.

Alpha was set to 0.05, and all analyses were performed with STATA 14.1 (StataCorp LP, Texas, USA).

Results

During follow-up, 40 of the 351 participants (11.4%) had a hip fracture. The median time of follow-up was 10.1 years (range 9.2–10.8 years). The median person time at risk was 8.8 years. The mean BMI was 26.7 kg/m², and mean 25-OH vitamin D was 93.3 nmol/l (Table 1). The Spearman correlation between gait speed and OLST was 0.53, p < 0.001.

Table 1 Baseline characteristics

For gait speed <0.8 m/s compared to ≥0.8 m/s, the age-adjusted HR was 6.89 (2.87–16.51). Adjustment for the hip fracture risk predicted by FRAX (and not age) resulted in a HR of 12.4 (2.8–54.5). The FRAX risk-adjusted HR for a hip fracture as a function of gait speed, with gait speed 0.8 m/s as reference, was significant at all values of gait speed (Fig. 2).

Fig. 2 HR for a hip fracture as a function of gait speed (solid curve) with gait speed 0.8 m/s as reference and dashed curves to illustrate the 95% CI

One SD decrease in gait speed resulted in an age-adjusted HR for a hip fracture of 2.16 (1.54–3.05). For major osteoporotic fractures, the HR was 1.33 (1.03–1.72) adjusted for age.

For 1 s shorter OLST adjusted for age, the HR was 1.06 (1.02–1.09). One SD decrease in OLST resulted in an age-adjusted HR for a hip fracture of 1.82 (1.28–2.60). For a major osteoporotic fracture, the age-adjusted HR for one SD decrease in OLST was 1.37 (1.08–1.75).

The area under the receiver operating characteristic curve (ROC) was 0.61 (0.51–0.71) for FRAX hip fracture risk alone and 0.72 (0.64–0.80) for FRAX combined with gait speed. For FRAX combined with OLST, AUC was 0.69 (0.61–0.77) (Fig. 3). If all three of the predictors OLST, gait speed and FRAX were included, AUC was 0.73 (0.64–0.81). Harrell’s C, the equivalent to AUC but adapted to Cox regression, showed 0.60 for FRAX alone, 0.72 for FRAX combined with gait speed, 0.69 for FRAX combined with OLST and 0.73 for all of gait speed, OLST and FRAX together. Category-free net reclassification index (cfNRI) for adding gait speed to FRAX hip fracture risk was 0.66 (p = 0.0001). For addition of OLST to FRAX, cfNRI was 0.48 (p = 0.0041). For addition of OLST to gait speed and FRAX, cfNRI was 0.26 (p = 0.1338).

Fig. 3 ROC curves for FRAX, FRAX + gait speed and FRAX + OLST with first hip fracture as the outcome

The 75th percentile of FRAX-predicted hip fracture risks was 15%, which is equal to saying that the highest risk quartile had a risk of ≥15%. To create one high-risk group and one low-risk group, the study population was divided into one group with ≥15% hip fracture risk and another with <15% risk. When gait speed was added to the FRAX model, categorical NRI was 0.24 (p = 0.023) compared to FRAX alone (Table 2). Sensitivity increased from 13 to 47%, and specificity decreased from 93 to 83%. The population-weighted NRI was, however, negative, −0.07. When OLST was added to FRAX, categorical NRI was 0.06 (p = 0.544) (Table 3). If OLST was added to FRAX and gait speed, cNRI was −0.05 (p = 0.340). Analysis of mediation also showed that 65% of the effect of one-leg standing time on hip fracture risk was mediated by gait speed.
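With only two risk categories, the categorical NRI equals the change in sensitivity plus the change in specificity, since net upward moves among events raise sensitivity and net downward moves among non-events raise specificity. The short check below applies this identity to the rounded percentages reported above; the function name is hypothetical.

```python
def two_category_nri(sens_old, sens_new, spec_old, spec_new):
    """Categorical NRI for a single cutoff: change in sensitivity + change in specificity."""
    return (sens_new - sens_old) + (spec_new - spec_old)


# Reported values for FRAX alone versus FRAX + gait speed (rounded percentages).
reported_check = two_category_nri(0.13, 0.47, 0.93, 0.83)
print(round(reported_check, 2))  # 0.24
```

The result agrees with the categorical NRI of 0.24 given for gait speed added to FRAX.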

Table 2 Reclassification table
Table 3 Reclassification table

Non-participants

The mean age of those who agreed to participate in the study was 73 years, whereas the mean age of the invited women who declined to participate was 74 years. The difference in mean age was significant (p < 0.001). Self-reported health and frequency of physical activity were not significantly different between participants and non-participants. The mortality rate during follow-up for the 937 eligible women was 35% compared to 21% in the study sample. This difference was highly significant (p < 0.001).

Discussion

In our cohort of elderly women, both gait speed and OLST improved the predictive ability for hip fractures of FRAX according to both ROC areas and cNRI. However, the 95% confidence intervals for the different ROC areas overlapped and a statistically significant difference in cNRI to FRAX alone was only found for gait speed added to FRAX (Fig. 3). Gait speed also proved to be a strong risk factor independent of all FRAX variables. The population-weighted cNRI for gait speed added to FRAX was negative. The reason for this was that the number of participants without an incident fracture was about eight times greater than the number of patients with an incident fracture. With the addition of gait speed to FRAX, the net change in the number of true negatives (specificity), which was negative, was greater than the net change in the number of true positives (sensitivity), which was positive. In short, the specificity decreased more than the sensitivity increased. Is the addition of gait speed to FRAX of any positive value then? It depends on what you consider more important, sensitivity or specificity. In a screening test aimed at finding high-risk individuals for preventive treatment, this depends a lot on the risk/benefit ratio for your treatment. In the case of possible bisphosphonate treatments to prevent future fractures in a group with ≥15% risk of a fracture, the risk of the main serious adverse event, an atypical femur fracture, would be less than 1/100 of the risk of a “typical” fracture (15% fractures in 10 years = 1500/100,000 person-years compared to 11/100,000 person-years for atypical femur fractures [34]). Since the relative risk of a second non-vertebral fracture is approximately 0.60 with treatment (alendronate) [35], a small increase in the net number of false positives might be outweighed by a net increase in the number of true positives. 
To be sure not to miss treating someone who really needs it, unnecessary treatment of some patients might be acceptable. The participants were generally well nourished since they had a mean BMI in the overweight range of ≥25 to <30 kg/m² [36] and the mean 25-hydroxy vitamin D concentration was well above the sufficiency threshold of 50 nmol/l [37]. The “one SD scales” were presented for comparison with other studies with other variables such as “one SD decrease in femoral neck BMD”. Note, however, that the distribution was not normal (as it is for BMD) for either gait speed or OLST, which makes such a comparison less clear-cut.
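The comparison of typical and atypical fracture rates made earlier in this paragraph can be verified with a few lines of arithmetic. The sketch assumes a crude average rate (cumulative risk divided by years, ignoring compounding and competing mortality); the function and variable names are hypothetical.

```python
def rate_per_100k_person_years(cumulative_risk: float, years: float) -> float:
    """Crude average event rate per 100,000 person-years."""
    return cumulative_risk / years * 100_000


typical_rate = rate_per_100k_person_years(0.15, 10)  # 15% risk over 10 years
atypical_rate = 11.0                                  # per 100,000 person-years [34]
ratio = atypical_rate / typical_rate

print(f"typical: {typical_rate:.0f} per 100,000 person-years")
print(f"atypical/typical ratio: {ratio:.4f}")
```

The ratio is about 0.007, which is consistent with the statement that the risk of an atypical femur fracture is less than 1/100 of the risk of a “typical” fracture.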

Our finding of an incremental value of a clinical test associated with fall risk, as an addition to FRAX, is in concordance with the findings in a study published by Edwards and colleagues [38]. They found that the addition of “a history of a fall after age 45” to BMD and FRAX-like clinical risk factors could increase the AUC from 0.782 to 0.802 in a population of elderly women, with a mean age of 66.6. A study by Melton and colleagues found no incremental value of adding either “history of a fall during past year” or “presence of fall risk factors” to FRAX. These negative results might be explained by the fact that this study population was much younger so that the differences in BMD could be more important for the fracture risk than the fall risk. Also, the outcome was not confined to hip fractures but to first major osteoporotic fracture (hip, spine, wrist or humerus), where fall risk could be of less importance than for hip fractures. A recent study by Harvey and colleagues showed that FRAX-estimated fracture risk was also a predictor of incident falls [39]. It was suggested that although FRAX includes no direct measure of fall risk, the included risk factors could still be associated with fall risk. In our study, incident falls during follow-up were not recorded, but the fact that gait speed was a predictor of hip fractures independent of all FRAX risk factors seems to contrast with their findings.

A strength of our study is that it is population-based and includes a high-risk population. Moreover, no participant was lost to follow-up. A limitation is that our study had a fairly small sample population within a limited age span. The participants also had a significantly lower mortality rate than the non-participating eligible women, which reflects that the study group was healthier than the general population. Eligible individuals who could not visit the primary care centre were also not included. Another limitation of our study is therefore that the results may not with confidence be applied to immobilised individuals or less healthy elderly.

Obviously, we could measure neither OLST nor gait speed in the non-participants. However, it is quite probable that the eligible but not included women, who evidently had a higher mortality than the study population, also had shorter OLST and slower gait speed. Both a short OLST and a slow gait speed have been associated with an increased mortality [14, 40]. Since mortality and fragility fractures (e.g. hip fractures) share many common risk factors, it is probable that the eligible but not included women also had more fractures than the participants. We therefore see no reason to believe that the results would have been significantly different if the entire eligible population had been included in the study. The Swedish National Inpatient Register (IPR), where all hip fractures are registered, has been externally validated and found to have a positive predictive value of 85–95%. We have found no external validation of the register for outpatient care (the OVR database). Therefore, the data regarding hip fractures are the most reliable. Regarding possible unidentified confounding, this is more important if you want to establish a causal relationship. We cannot state that short OLST or slow gait speed causes fractures, only that they are associated. However, for a predictor, an association with the outcome is just as valuable as a causal relationship.

Regarding information bias, since this was a prospective study, any misclassifications at baseline should have been restricted to the classification of exposure. Therefore, any misclassification would probably be non-differential; that is, the errors due to misclassification would have affected the group with the outcome and the group without the outcome equally. Some of the risk factors included in FRAX may be subject to misclassification, e.g. self-reported previous fractures and daily alcohol consumption. The aim of this study was, however, not to validate FRAX, so such misclassifications may be of minor importance to the results presented.

The continuous variables OLST and gait speed were analysed after being categorised into more than two categories. Generally, this procedure carries a risk of underestimation of the hazard ratio for the compared groups, but it might under some conditions also result in an overestimation of the hazard ratios [41]. It is, however, unlikely that the hazard ratios presented in this study substantially overestimate the “true” relation between exposure and outcome. This would require a substantial degree of misclassification [42]. It would also require a similar degree of misclassification in other studies on OLST, gait speed and fractures, with similar results.

In summary, gait speed was a good predictor of hip fractures independent of FRAX. Gait speed also showed promise for increasing the predictive accuracy of FRAX.

The results of our study need to be confirmed in other populations. A new predictive model always needs to be validated in a population other than the one from which it was derived. It would also be interesting to study how gait speed should be measured to give the best predictive value.