INTRODUCTION

One in two postmenopausal women will experience a fracture in their remaining lifetimes.1 Because fractures can result in prolonged disability, loss of independence, and death, predicting the risk of fracture before it occurs is important and falls within the scope of general internal medicine care.1 The Fracture Risk Assessment Tool (FRAX), recommended by current clinical osteoporosis screening guidelines, is a Web-based fracture risk calculator that estimates the 10-year absolute probability of hip and major osteoporotic fracture (MOF) (hip, spine, wrist, and proximal humerus) in adults aged 40 through 90 years. FRAX includes the following risk factors: age, sex, weight, height, previous fracture (yes/no), parental hip fracture, current smoking, glucocorticoid use (prednisolone ≥ 5 mg/day for ≥ 3 months), rheumatoid arthritis, and alcohol intake.2 The FRAX tool can be used with or without bone mineral density (BMD) information.

For women aged ≥ 65 years, the United States Preventive Services Task Force (USPSTF) screening guideline recommends BMD testing. However, for postmenopausal women aged 50–64 years, screening is based on FRAX risk.1 Specifically, the USPSTF recommends bone density testing for women aged 50–64 years who have a ≥ 9.3% 10-year-predicted probability of MOF using FRAX (without BMD information).1 The USPSTF chose a FRAX threshold of ≥ 9.3% for women aged 50–64 years because it corresponds to the estimated 10-year probability of major osteoporotic fracture of a 65-year-old white female who has no other osteoporosis risk factors.1 However, the USPSTF recommendation was not based on studies that evaluated the performance of FRAX for osteoporosis screening. In our prior work in the Women’s Health Initiative (WHI) Study in women aged 50–64 years, the sensitivity of the USPSTF FRAX-based strategy for identifying incident MOF was low, ranging from 4.7% (women aged 50–54 years) to 37.3% (women aged 60–64 years).3
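The age-based USPSTF rule can be sketched as follows. This is a minimal illustration, assuming the FRAX 10-year MOF probability (calculated without BMD) is already available in percent; FRAX itself is not reimplemented, and the function names are hypothetical. The second function shows what "sensitivity of the strategy" means here: the share of women with an incident MOF whom the rule would have referred for BMD testing.

```python
USPSTF_FRAX_MOF_THRESHOLD_PCT = 9.3  # 10-year MOF probability, FRAX without BMD


def bmd_testing_recommended(age_years: float, frax_mof_pct: float) -> bool:
    """Hypothetical helper: would the USPSTF rule refer this woman for BMD testing?"""
    if age_years >= 65:
        # Aged >= 65 years: BMD testing is recommended regardless of FRAX.
        return True
    if age_years >= 50:
        # Aged 50-64 years: testing only if FRAX MOF probability >= 9.3%.
        return frax_mof_pct >= USPSTF_FRAX_MOF_THRESHOLD_PCT
    # Women younger than 50 are outside the scope of this sketch.
    return False


def strategy_sensitivity(women, had_mof):
    """Share of women with incident MOF who were flagged by the rule.

    women: list of (age_years, frax_mof_pct); had_mof: parallel list of bools.
    """
    case_flags = [
        bmd_testing_recommended(age, pct)
        for (age, pct), case in zip(women, had_mof)
        if case
    ]
    if not case_flags:
        raise ValueError("no incident fractures; sensitivity is undefined")
    return sum(case_flags) / len(case_flags)
```

A low sensitivity, as reported in the prior WHI analysis, means that most women who went on to fracture had FRAX values below 9.3% and would not have been referred.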

Given the low sensitivities for the USPSTF-recommended screening approach for FRAX in women aged 50–64 years, it is important to examine the performance of alternative screening tools. The Garvan fracture risk calculator (http://garvan.org.au/promotions/bone-fracture-risk/calculator/) includes sex, age, prior fracture since age 50 excluding major trauma (0, 1, 2, 3, or more), falls in last 12 months (0, 1, 2, 3, or more), and weight (kg) to provide the predicted absolute risk of hip fracture and any osteoporotic fracture over 5- and 10-year time horizons.4,5,6 Thus, compared with the FRAX risk calculator, the Garvan risk calculator incorporates fewer risk factors, but includes falls, and counts the number of previous fractures. Unlike FRAX, the Garvan tool does not predict MOF risk. The Garvan risk calculator, developed in Australia, has never been externally validated in US women, but is commonly used in Australia and is recommended by the guidelines of the Royal Australian College of General Practitioners. Also, no published prospective studies have compared observed vs. predicted 10-year hip fracture risk for the FRAX and Garvan risk tools in postmenopausal women in the USA. A comparison of FRAX and Garvan hip fracture risk prediction is important to inform screening strategies.

Our primary objective was to evaluate and compare the discrimination and calibration of the FRAX and Garvan hip fracture risk prediction tools in postmenopausal women aged 50–64 years at baseline during 10 years of follow-up. Secondary objectives were to assess and compare the discrimination of the tools in the prediction of MOF and any clinical fracture.

METHODS

Design of the Women’s Health Initiative Study

The Women’s Health Initiative Observational Study (WHI-OS) and Clinical Trials (WHI-CT), conducted at 40 US clinical centers (1993–2005), enrolled postmenopausal women aged 50 to 79 years at baseline (1993–1998). The WHI-CT evaluated three interventions: a low-fat eating pattern, menopausal hormone therapy, and calcium plus vitamin D supplementation.7 The study design was previously described.7 Inclusion criteria included freedom from serious medical conditions.

For the current study, we analyzed longitudinal data from baseline to year 10 of follow-up from all WHI-OS and WHI-CT participants who were aged 50–64 years at baseline (n = 90,764). We excluded data from participants who did not provide information regarding medication use at baseline, and from those who reported taking any medications known to influence osteoporosis at baseline: bisphosphonates, calcitonin, parathyroid hormone, selective estrogen receptor modulators, luteinizing hormone-releasing hormone agents, and somatostatin agents (n = 1111). We excluded data from participants who contributed incomplete information regarding osteoporosis risk factors (n = 5714, 6.4%) or less than 10 years of follow-up time before death or loss to follow-up (n = 19,200, 30.6%). Therefore, the analytic sample for this study consisted of 62,723 participants for hip fracture analyses, 63,621 for major osteoporotic fracture analyses, and 64,739 for clinical fracture analyses (Appendix Fig. 1). The analyses were performed using FRAX and Garvan calculators without BMD information.

Human subjects review committees at each participating institution approved the study. Each participant provided written informed consent.

Outcomes: Incident Fractures

Incident fractures were self-reported annually (WHI-OS) or semi-annually (WHI-CT) using questionnaires.

All hip fractures were confirmed by physician adjudicators using medical records. Other types of fractures were self-reported. In a medical record-based validation study, the average confirmation rate for all single-site self-reported fractures was 71%.8

MOF was defined as a first hip, lower arm/wrist, clinical spine, or upper arm fracture.2, 9 We defined clinical fracture as fracture of the hip, vertebrae (clinical), wrist or forearm, hand, foot, humerus, scapula, clavicle, distal femur, proximal tibia, patella, pelvis, and sternum. Fractures of the fingers, toes, and skull were excluded.
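The two outcome definitions can be written as site sets, with MOF sites forming a subset of clinical fracture sites. This is a minimal sketch with hypothetical site labels (the WHI questionnaire coding is not reproduced), assuming "upper arm" corresponds to the humerus and "lower arm/wrist" to wrist or forearm. Sites in neither set, such as fingers, toes, and skull, are simply not counted.

```python
# Hypothetical site labels; shared labels keep MOF a subset of clinical fracture.
MOF_SITES = {"hip", "wrist_forearm", "clinical_spine", "humerus"}
CLINICAL_FRACTURE_SITES = MOF_SITES | {
    "hand", "foot", "scapula", "clavicle", "distal_femur",
    "proximal_tibia", "patella", "pelvis", "sternum",
}


def fracture_outcomes(sites):
    """Return (any MOF, any clinical fracture) for a list of reported fracture sites.

    Fingers, toes, and skull are absent from both sets, so they never count.
    """
    has_mof = any(site in MOF_SITES for site in sites)
    has_clinical = any(site in CLINICAL_FRACTURE_SITES for site in sites)
    return has_mof, has_clinical
```

For example, a woman reporting only a foot fracture counts toward clinical fracture but not MOF.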

Main Predictors: FRAX- and Garvan-Predicted Absolute Fracture Risk

The FRAX-predicted 10-year risks of hip fracture and of MOF were calculated by the World Health Organization Collaborating Centre for Metabolic Bone Disease (US FRAX version 3.0 without BMD), as described previously.3 For the Garvan risk tool, the 5-year- and 10-year-predicted risks of hip fracture and any fragility fracture (without BMD) were calculated for each participant using published formulas.4

Other Measurements

Information regarding age, race/ethnicity, medical history (previous fractures, rheumatoid arthritis, number of falls in previous 12 months), medication use, parental hip fracture, smoking, alcohol intake, and use of supplemental calcium and vitamin D was assessed using baseline questionnaires. Weight and height were measured at baseline using standardized protocols.

Statistical Analysis

We determined the sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) of the FRAX and Garvan models (without BMD information) in discriminating between participants who did and did not experience a hip fracture during (1) a 5-year time horizon for Garvan and (2) a 10-year time horizon for both FRAX and Garvan. Receiver operating characteristic (ROC) curves plot the true positive rate (sensitivity) as a function of the false positive rate (100% minus specificity); the AUC indicates how well a risk score can distinguish between diseased and non-diseased persons. AUC values range from 0.50 to 1.0; higher values indicate better discrimination. An AUC value of 0.50 indicates that the risk assessment tool is no better than chance in distinguishing between women who do and do not experience fracture during follow-up. AUC values were generated using logistic regression.
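The quantities above can be illustrated with a short sketch. The study generated AUCs with logistic regression in SAS; the sketch below instead swaps in the rank-based (Mann–Whitney) estimator, which gives the same empirical AUC as a ROC curve built from a single-predictor logistic model with a positive coefficient, because that model ranks women in the same order as the raw score. Function names are hypothetical, and a score at or above the threshold is assumed to count as positive.

```python
def sens_spec(scores, outcomes, threshold):
    """Sensitivity and specificity when scores >= threshold are classified positive."""
    pairs = list(zip(scores, outcomes))
    tp = sum(1 for s, y in pairs if y and s >= threshold)
    fn = sum(1 for s, y in pairs if y and s < threshold)
    tn = sum(1 for s, y in pairs if not y and s < threshold)
    fp = sum(1 for s, y in pairs if not y and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def empirical_auc(scores, outcomes):
    """Probability that a random case scores above a random non-case (ties count 1/2).

    This equals the area under the empirical ROC curve.
    """
    cases = [s for s, y in zip(scores, outcomes) if y]
    controls = [s for s, y in zip(scores, outcomes) if not y]
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```

An uninformative score (identical for everyone) yields exactly 0.50, the "no better than chance" value. The pairwise loop is written for clarity; for tens of thousands of women a sort-based rank computation is faster.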

We calculated the risk score thresholds that would correspond to sensitivities of ≥ 80% for detection of (observed) incident hip fracture over 10 years, with associated specificities, positive predictive values (PPVs), and AUC values. For each risk score, Youden’s J statistic (sensitivity + specificity − 1) was calculated to identify the cut-point that maximizes the AUC for identification of hip fracture (AUC(c)). Within age strata, we identified the Garvan and FRAX risk thresholds corresponding to AUC(c) values for identifying hip fracture.

Because the FRAX and Garvan tools both provide an estimate of hip fracture probability, our primary analysis focused on prediction of hip fracture risk. In secondary analyses, we assessed discrimination for predicting the 10-year risk of MOF and 10-year risk of any clinical fracture. Because the Garvan risk tool does not provide an estimate of any clinical fracture or MOF, we used the Garvan-predicted probability of “any fragility fracture” when we assessed the discrimination of the Garvan risk tool for prediction of any observed clinical fracture or MOF. Because the FRAX risk tool does not provide an estimate of any clinical fracture, we used the FRAX-predicted probability of MOF when we assessed the discrimination of the FRAX risk tool for prediction of any clinical fracture. In additional secondary analyses designated a priori, we stratified our results according to the use of hormone therapy at baseline (yes vs. no) and race/ethnicity (white, African American, Hispanic, other).
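The two threshold searches can be sketched as below. This is an illustration under stated assumptions, not the authors' SAS code: a score at or above the cut-point is positive, and AUC(c) is taken to be the area under the ROC curve of the dichotomized score, which equals (sensitivity + specificity)/2, so the cut-point maximizing Youden's J also maximizes AUC(c). All function names are hypothetical.

```python
def metrics_at(scores, outcomes, t):
    """Classification metrics when scores >= t are called positive."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, outcomes):
        if s >= t:
            tp, fp = (tp + 1, fp) if y else (tp, fp + 1)
        else:
            fn, tn = (fn + 1, tn) if y else (fn, tn + 1)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "threshold": t,
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "youden_j": sens + spec - 1,
        # Assumed definition: AUC of the dichotomized score.
        "auc_c": (sens + spec) / 2,
    }


def highest_threshold_with_sensitivity(scores, outcomes, min_sens=0.80):
    """Highest cut-point whose sensitivity is >= min_sens.

    Sensitivity falls and specificity rises as the cut-point rises, so the
    first qualifying cut-point scanning downward has the best specificity.
    """
    for t in sorted(set(scores), reverse=True):
        m = metrics_at(scores, outcomes, t)
        if m["sensitivity"] >= min_sens:
            return m
    return None  # only reachable if min_sens > 1


def youden_optimal(scores, outcomes):
    """Cut-point maximizing Youden's J (equivalently AUC(c) as defined above)."""
    return max(
        (metrics_at(scores, outcomes, t) for t in sorted(set(scores))),
        key=lambda m: m["youden_j"],
    )
```

With scores [0.1, 0.2, 0.3, 0.4, 0.5, 0.6] and outcomes [0, 1, 0, 0, 1, 1], the ≥ 80% sensitivity search stops at 0.2 (PPV 0.6), while Youden's J is maximized at 0.5 (AUC(c) = 5/6).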

In sensitivity analyses, we excluded data from participants who initiated osteoporosis medications any time during follow-up: bisphosphonates, calcitonin, parathyroid hormone, selective estrogen receptor modulators, luteinizing hormone-releasing hormone agents, and somatostatin agents. Finally, we repeated the main analyses after excluding participants (n = 29,866) who used estrogen therapy any time during study follow-up.

To assess calibration, we compared observed vs. predicted 10-year probability of hip fracture stratified by age category. Analyses were completed with SAS for Windows Version 9.4.
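A calibration comparison by age group can be sketched as follows. This assumes every woman has complete 10-year follow-up, which matches the exclusion of participants with less than 10 years of follow-up, so the observed probability in a group is simply the proportion with a fracture. The function name is hypothetical; the observed/expected (O/E) ratio is added only as a convenient summary.

```python
from collections import defaultdict


def calibration_by_group(groups, predicted, observed):
    """Compare mean predicted and observed 10-year fracture probability per group.

    groups: group label per woman (e.g. an age band);
    predicted: predicted 10-year probability (0-1); observed: 1 if fracture occurred.
    """
    totals = defaultdict(lambda: {"pred_sum": 0.0, "cases": 0, "n": 0})
    for g, p, y in zip(groups, predicted, observed):
        t = totals[g]
        t["pred_sum"] += p
        t["cases"] += int(y)
        t["n"] += 1
    table = {}
    for g, t in totals.items():
        mean_pred = t["pred_sum"] / t["n"]
        obs = t["cases"] / t["n"]
        table[g] = {
            "n": t["n"],
            "predicted": mean_pred,
            "observed": obs,
            # O/E ratio > 1 means the tool underestimates observed risk.
            "o_e_ratio": obs / mean_pred if mean_pred > 0 else float("nan"),
        }
    return table
```

A well-calibrated tool gives O/E ratios near 1 in every stratum; the Garvan results reported below correspond to ratios above 1.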

RESULTS

Baseline Characteristics

At baseline, the mean (standard deviation [SD]) participant age was 57.9 (4.1) years (Table 1).

Table 1 Selected Baseline Characteristics of the 64,739 Study Participants

Compared with the participants included in this study, participants excluded due to death or termination from the study after less than 10 years of follow-up time were older and less likely to be white, and had higher body mass index (Appendix Table 1).

Fracture Incidence

The observed 10-year hip fracture probability was 0.3% for women aged 50–54 years (n = 14,768), 0.6% for women aged 55–59 years (n = 22,442), and 1.1% for women aged 60–64 years (n = 25,513). The observed 10-year MOF probability was 6.3% for women aged 50–54 years (n = 14,947), 8.0% for women aged 55–59 years (n = 22,721), and 9.9% for women aged 60–64 years (n = 25,953). The observed 10-year probability of any clinical fracture was 15.8% for women aged 50–54 years (n = 15,221), 17.2% for women aged 55–59 years (n = 23,086), and 19.0% for women aged 60–64 years (n = 26,432).

Discrimination and Calibration of Hip Fracture Probability by the FRAX and Garvan Models (Without BMD Information)

The overall AUC value for prediction of hip fracture was 0.68 (95% confidence interval [CI] 0.65–0.70) for FRAX and 0.62 (95% CI 0.59–0.65) for Garvan.

At risk thresholds corresponding to 80% sensitivity, AUC(c) values for discrimination of hip fracture were low (0.64 [95% CI 0.61–0.66] for FRAX; 0.57 [95% CI 0.55–0.60] for Garvan), and specificity was low (30.6% [95% CI 30.3–31.0%] for Garvan, 43.1% [95% CI 42.7–43.5%] for FRAX) (Table 2; Fig. 1a, b). In calibration assessment, observed fracture probabilities were similar to FRAX-predicted probabilities but greater than Garvan-predicted probabilities in each age stratum (Fig. 2, Appendix Table 2).

Table 2 Sensitivity, Specificity, Positive Predictive Value (PPV), and Area Under the Receiver Operating Characteristic Curve (AUC) of the Garvan Tool and FRAX for Identifying Incident Hip Fracture Over 10 Years of Follow-up Under Various Construct Cut-points (Sensitivity Range of 80–99%)
Figure 1

Receiver operating characteristic (ROC) curves for the prediction of incident 10-year hip fracture by the Garvan tool and Fracture Risk Assessment Tool (FRAX). a ROC curve for identifying hip fracture by the Garvan tool. b ROC curve for identifying hip fracture by the Fracture Risk Assessment Tool (FRAX).

Figure 2

Observed versus predicted 10-year probability of hip fracture overall and by age group according to the Garvan tool and Fracture Risk Assessment Tool (FRAX).

In analyses excluding data from participants who used osteoporosis medication or estrogen therapy during the follow-up period, results were similar to those of the primary analysis (Appendix Table 3).

In stratified analyses, at AUC(c) thresholds, the sensitivities of the FRAX and Garvan risk tools for identifying incident hip fracture were much lower for women aged 50–54 (Garvan 0.0% [95% CI 0.0–0.0%], FRAX 10.4% [95% CI 1.5–19.4%]) than those for women aged 60–64 (Garvan 24.2% [95% CI 19.1–29.3%], FRAX 80.5% [95% CI 75.8–85.2%]) (Table 3). Results were very similar among women who were current users of hormone therapy at baseline and those who were not. However, in race/ethnicity-stratified analyses, the sensitivities of FRAX and Garvan models for identifying incident hip fractures were markedly lower in African American and Hispanic women than those among white women. At AUC(c) values, the sensitivity in African American women was 0.0% (95% CI 0.0–0.0%) for FRAX and 7.7% (95% CI 0.0–24.5%) for Garvan. The sensitivity in Hispanic women was 12.5% (95% CI 0.0–42.1%) for FRAX and 0.0% (95% CI 0.0–0.0%) for Garvan. The sensitivity in white women was 62.2% (95% CI 57.7–66.8%) for FRAX and 16.2% (95% CI 12.8–19.7%) for Garvan.
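The methods do not state how the confidence intervals for sensitivity were computed. The symmetric intervals, and the degenerate 0.0–0.0% intervals in strata where no fractures were identified, are consistent with a normal-approximation (Wald) interval truncated at 0; that is an assumption. The sketch below shows why a Wald interval collapses to a single point when the observed proportion is 0, and, for contrast, the Wilson score interval, which does not. Here n is the number of women with hip fracture in the stratum, which is small for African American and Hispanic women and for the youngest age group.

```python
import math


def wald_ci(successes, n, z=1.96):
    """Proportion with a Wald (normal-approximation) CI, truncated to [0, 1]."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)  # zero width whenever p is 0 or 1
    return p, max(0.0, p - half), min(1.0, p + half)


def wilson_ci(successes, n, z=1.96):
    """Proportion with a Wilson score CI, which stays informative when p is 0."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)
```

With 0 of 13 cases detected, the Wald interval is 0.0–0.0%, whereas the Wilson interval extends to roughly 23%, a reminder that a 0% sensitivity estimate in a small stratum is very imprecise.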

Table 3 Sensitivity, Specificity, and Area Under the Receiver Operating Characteristic Curve (AUC) of the Garvan Tool and Fracture Risk Assessment Tool (FRAX) for Identifying Incident Hip Fracture After 10 Years, Stratified by Age Group, at Thresholds Maximizing the Area Under the Receiver Operating Characteristic Curve (AUC)

For both the FRAX and Garvan tool, AUC values for prediction of hip fracture were low across racial/ethnic groups, albeit slightly higher among white women (Appendix Table 4). AUC values ranged from 0.53 to 0.58 in African American and Hispanic participants, indicating that the tools performed no better than chance in discriminating between African American and Hispanic participants who did and did not experience hip fractures during the follow-up period.

At AUC(c) values, the sensitivity of the 5-year Garvan hip fracture prediction model was lower than that of the FRAX (10-year) hip prediction model, and lower among 50- to 54-year-old women (0% [95% CI 0.0–100%] for Garvan 5 years, 10.4% [95% CI 1.5–19.4%] for FRAX) than among 60- to 64-year-old women (12.6% [95% CI 6.1–19.1%] for Garvan 5 years, 80.5% [95% CI 75.8–85.2%] for FRAX) (Appendix Table 5).

Discrimination of 10-Year Probability of MOF and Any Clinical Fracture for the FRAX and Garvan Models (Without BMD Information)

The overall AUC values for MOF were 0.58 (95% CI 0.57–0.59) for FRAX and 0.57 (95% CI 0.57–0.58) for Garvan; overall AUC values for clinical fracture were 0.55 (95% CI 0.54–0.56) for FRAX and 0.55 (95% CI 0.54–0.55) for Garvan.

Neither FRAX nor Garvan performed well in discriminating between women who did and did not experience MOF (Appendix Fig. 2a, b) or clinical fractures (Appendix Fig. 3a, b). AUC(c) values were 0.56 for both tools in every age stratum, and corresponding sensitivity values were low for the Garvan tool and FRAX in all strata (range 26.7–46.8%) (Table 4). Findings were similar regarding prediction of any clinical fracture (Appendix Table 6).

Table 4 Sensitivity, Specificity, and Area Under the Receiver Operating Characteristic Curve (AUC) of the Garvan Tool and Fracture Risk Assessment Tool (FRAX) for Identifying Incident Major Osteoporotic Fracture After 10 Years, Stratified by Baseline Age, at Thresholds Maximizing the Area Under the Receiver Operating Characteristic Curve (AUC)

For both the FRAX and Garvan tool, AUC values for prediction of MOF were low and similar across racial/ethnic groups (range 0.53–0.61) (Appendix Table 7).

DISCUSSION

In this cohort of US postmenopausal women aged 50–64 years, we compared the Garvan and FRAX fracture risk assessment tools. The FRAX risk tool was only modestly better than chance, and the Garvan risk tool was no better than chance, in discriminating women with and without incident hip fractures during 10 years of prospective follow-up. For prediction of MOF and any clinical fracture, performance of the two tools was similar and poor. At AUC(c) values, sensitivity of both tools was extremely low (range 0–13%) for African American and Hispanic women and for women aged 50–54 years. At thresholds yielding 80% sensitivity for identification of observed hip fractures, PPV was very low (< 2%) for both tools. Observed fracture probabilities were similar to FRAX-predicted probabilities but greater than Garvan-predicted probabilities, indicating that FRAX is well calibrated, but the Garvan tool markedly underestimates actual 10-year fracture probabilities.
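The low PPV follows from Bayes’ rule given the low incidence of hip fracture. As a rough illustration only, the overall 10-year incidence can be approximated from the rounded age-specific percentages and sample sizes in the Results (0.3% of 14,768, 0.6% of 22,442, and 1.1% of 25,513, about 460 of 62,723 women, so π ≈ 0.0073), taking the FRAX sensitivity (Se) as 0.80 and specificity (Sp) as 0.431 at the 80%-sensitivity threshold:

```latex
\mathrm{PPV} = \frac{\mathrm{Se}\,\pi}{\mathrm{Se}\,\pi + (1-\mathrm{Sp})(1-\pi)}
\approx \frac{0.80 \times 0.0073}{0.80 \times 0.0073 + 0.569 \times 0.9927}
\approx \frac{0.00584}{0.00584 + 0.56485} \approx 0.010
```

A PPV of roughly 1% is in line with the reported value below 2%: when fewer than 1 in 100 women fracture, even a test with 80% sensitivity produces far more false positives than true positives unless its specificity is very high.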

To our knowledge, this is the first study to compare observed vs. predicted 10-year hip fracture risk for the FRAX and Garvan risk tools in postmenopausal women aged 50–64 years in the USA. A few studies were performed outside the USA and/or in older women. First, the GLOW study of women from 10 countries including the USA had a 2-year duration of follow-up; two thirds of the participants were aged 65 and older, i.e., older than the women in the present study, who were aged 50–64 years.10 In the GLOW study, AUC values for hip fracture were 0.65 for FRAX and 0.61 for the Garvan risk calculator.10 Second, the study by Bolland and colleagues in New Zealand, which was limited to women (mean age 74 years) with normal BMD for their age, found similar AUC values for the FRAX and Garvan risk tools, ranging between 0.60 and 0.70 for hip fractures during a mean 8.8-year follow-up.11 The previous studies did not report AUC values separately in women aged 50–64, precluding a direct comparison with our current study. Finally, a recent Israeli study with an average follow-up of 4.7 years found that both FRAX and Garvan underestimated hip fracture risk and that FRAX was better calibrated than Garvan, consistent with our study, but reported higher AUC values (0.82 for FRAX, 0.78 for Garvan) than we observed.12 The differences in AUC values between the studies may be due to inclusion of both men and women, a wider age range, and/or shorter follow-up duration in the Israeli study compared with our study.12

Our results are clinically relevant to general internists because several existing guidelines address screening in postmenopausal women aged 50–64 years. Although FRAX is recommended by several guidelines for screening decisions in postmenopausal women in this age group,1, 9 general internists should be aware that neither the FRAX nor Garvan risk calculator performs well in identifying postmenopausal women aged 50–64 years who will experience hip fractures, MOF, or any clinical fracture during 10 years of follow-up. Thus, our results do not support the use of either FRAX or the Garvan hip fracture risk prediction tool in postmenopausal women aged 50–64 years. Moreover, the ability of the tools to distinguish between women who do and do not experience fracture was particularly low, and no better than chance, among African American and Hispanic women. The clinical role of these tools among minority women should be examined in future studies. Given the unsatisfactory performance of the tools for fracture risk prediction and the low 10-year observed probability of disabling fracture (i.e., hip fracture), our results suggest that emphasis on lifestyle strategies rather than drug treatment for fracture risk reduction is appropriate for most women in this age group.

One limitation of this study is that participants who were excluded because they provided less than 10 years of follow-up were older than those included. However, there is no indication that this would lead to bias favoring one fracture prediction method over another. Our analyses evaluating tool performance within racial/ethnic groups require confirmation in other studies because of the relatively low incidence of fractures in African American and Hispanic participants. Strengths of our study include the large sample size, the 10-year duration of follow-up, medical record-based confirmation of hip fractures, and detailed information regarding fracture risk factors.

In conclusion, the FRAX and Garvan fracture risk calculators discriminated poorly between postmenopausal women aged 50–64 years who did and did not experience hip fracture during 10-year follow-up; discrimination was similarly poor for MOF and clinical fractures. For women of color, the discriminative ability of both tools was very poor, and no greater than chance. Neither tool had a useful threshold for prediction of these fracture outcomes. Calibration was acceptable for FRAX, but the Garvan tool underestimated observed fracture probabilities. Fracture prediction in this age group requires assessment of risk factors not included in currently available strategies. These results will help to inform future osteoporosis screening guidelines.