A systematic review on the performance of fracture risk assessment tools: FRAX, DeFRA, FRA-HS

Purpose Preventing fragility fractures by treating osteoporosis may reduce disability and mortality worldwide. Algorithms combining clinical risk factors with bone mineral density have been developed to better estimate fracture risk and possible treatment thresholds. This systematic review supported panel members of the Italian Fragility Fracture Guidelines in recommending the use of the best-performing tool. The clinical performance of the three most commonly used fracture risk assessment tools (DeFRA, FRAX, and FRA-HS) was assessed in at-risk patients. Methods PubMed, Embase, and Cochrane Library were searched up to December 2020 for studies investigating risk assessment tools for predicting major osteoporotic or hip fractures in patients with osteoporosis or fragility fractures. Sensitivity (Sn), specificity (Sp), and areas under the curve (AUCs) were evaluated for all tools at different thresholds. Quality assessment was performed using the Quality Assessment of Diagnostic Accuracy Studies-2; certainty of evidence (CoE) was evaluated using the Grading of Recommendations Assessment, Development and Evaluation approach. Results Forty-three articles were considered (40, 1, and 2 for FRAX, FRA-HS, and DeFRA, respectively), with the CoE ranging from very low to high quality. A reduction of Sn and an increase of Sp for major osteoporotic fractures were observed among women and the entire population as the cut-off increased. No significant differences were found on comparing FRAX to DeFRA in women (AUC 59–88% vs. 74%) and diabetics (AUC 73% vs. 89%). FRAX demonstrated non-significantly better discriminatory power than FRA-HS among men. Conclusion The task force formulated appropriate recommendations on the use of any fracture risk assessment tools in patients with or at risk of fragility fractures, since no statistically significant differences emerged across different prediction tools.
Supplementary Information The online version contains supplementary material available at 10.1007/s40618-023-02082-8.


Introduction
Osteoporosis is a chronic disease characterized by bone fragility, which leads to an increased risk of fractures [2]. As fragility fractures are a leading cause of disability and mortality worldwide, osteoporosis treatment should primarily aim at preventing fractures [1].
Low bone mineral density (BMD) is a major determinant of risk; it has been demonstrated that an increase in BMD is associated with fracture risk reduction in a quasi-linear manner [3]. However, BMD combined with clinical risk factors predicts fracture risk better than BMD alone [4]; these risk factors include comorbidities, treatment with glucocorticoids, and a history of previous fractures. These factors are independent predictors of fracture and are associated with deterioration of bone quality [2]. Algorithms that combine clinical risk factors with BMD have been developed to better estimate fracture risk and determine possible thresholds for treatment [5][6][7].
The most widely used algorithm is the Fracture Risk Assessment Tool (FRAX), which was originally developed in 2008 by the World Health Organization collaborating center of the University of Sheffield, UK [6]. In Italy, other FRAX-derived tools (DeFRA and FRA-HS) are widely used for calculating fracture risk. The DeFRA was developed in 2010 by the Italian Society for Osteoporosis, Mineral Metabolism, and Bone Diseases (SIOMMMS) and the Italian Society of Rheumatology (SIR) [5]. The FRA-HS was developed and published by the Italian Society of General Practitioners (SIMG) [8]. Both algorithms have been validated against FRAX in post-menopausal women with osteoporosis [8,9]. DeFRA considers the following clinical and densitometric patient characteristics for fracture risk calculation: age, weight, height, number and site of prior fragility fractures, parental history of hip and clinical vertebral fractures, glucocorticoid intake (semi-quantitative variable), treatment with adjuvant hormone therapy for breast cancer, the presence of various comorbidities (including rheumatoid arthritis, multiple sclerosis, psoriatic arthritis, systemic lupus erythematosus, other connective tissue disease), calcium intake from diet and supplements, vitamin D intake, falls, exposure to sunlight, and both lumbar spine and femoral neck BMD [5].
FRA-HS estimates fracture risk from the following characteristics: age, sex, history of osteoporotic fractures (dichotomous variable), secondary osteoporosis (dichotomous variable), long-term use of corticosteroids (dichotomous variable, at least 180 defined daily doses within the year prior to assessment), rheumatoid arthritis diagnosis, body mass index, smoking (dichotomous variable), and alcohol abuse/alcohol-related diseases (dichotomous variable) [8].
The Italian National Institute of Health (Istituto Superiore di Sanità) recently published the Italian guidelines "Diagnosis, risk stratification and continuity of care of Fragility Fractures" [10]. With regard to risk stratification, the task force focused on the three most commonly used fracture risk assessment tools in Italy (DeFRA, FRAX, and FRA-HS). A systematic review was conducted for each of these tools with the aim of assessing their clinical performance in patients at risk of fractures; the review also aimed to accumulate all relevant literature for formulating evidence-based recommendations. Herein, we present the results of the systematic review and meta-analysis on the performance of fracture risk assessment tools in patients at risk of fracture. The present meta-analysis informed the guidelines of the Italian National Institute of Health on fragility fractures.

Materials and methods
A systematic review was performed to support the panel members of the Italian Fragility Fracture Guidelines (published on the platform of the Italian National Institute of Health [11]) in formulating recommendations. In accordance with the GRADE-ADOLOPMENT methodology [12] and the standards elaborated by the Sistema Nazionale Linee Guida (SNLG) [13,14], the multidisciplinary panel aimed to answer the following clinical question: "Which risk assessment tools are the most accurate in predicting the risk of fragility fractures in adults, including those without known osteoporosis or previous fragility fractures?" The recommendations from the CG146 guideline of the National Institute for Clinical Excellence (NICE) (which assessed fragility fracture risk in patients with osteoporosis) were updated and adapted for this review.

Inclusion and exclusion criteria
Observational studies were selected if they met the following criteria: (1) population: patients with osteoporosis or those who had experienced a fragility fracture, according to the diagnostic criteria for osteoporosis and the definition of fragility given by the authors of each study. In the vast majority of studies, osteoporosis was defined based on T-score levels, and fragility fracture was defined as any asymptomatic morphometric vertebral fracture and/or any clinical bone fracture resulting from a fall from standing height or less or from low-energy trauma; (2) risk assessment tools: FRAX [15], DeFRA [16], and FRA-HS [17]; reference standard: risk threshold for major osteoporotic fractures (MOF) (3%, 5%, 10%, 20%, and 30%) and hip fractures (3% and 5%), either with or without the BMD criterion; (3) outcome: (i) primary outcome measures of sensitivity (Sn) (the capacity to correctly identify patients who sustain a fracture) and specificity (Sp) (the capacity to correctly identify fracture-free patients) for the risk assessment tools (studies were required to report Sn and Sp values, an adequate 2 × 2 table, or adequate data for creating the 2 × 2 table); (ii) secondary outcomes were the receiver operating characteristic curve and the area under the curve (AUC) for Sn and Sp; to make their goodness of fit easier to interpret, values were expressed as percentages by multiplying by 100.
Studies were excluded if they: (i) were not published in the English language, (ii) did not report original findings (i.e., letters and case reports), (iii) did not identify patients affected by fragility fractures or osteoporosis, or (iv) did not consider the risk assessment tools of interest (FRAX, DeFRA, or FRA-HS).
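As an illustration of the primary outcomes defined above, the following minimal Python sketch derives Sn and Sp from a 2 × 2 table. The class name, the field names, and the counts are hypothetical and chosen only for illustration; they are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a 2 x 2 table: tool result versus observed fracture."""
    tp: int  # risk at or above threshold, fracture observed
    fp: int  # risk at or above threshold, no fracture observed
    fn: int  # risk below threshold, fracture observed
    tn: int  # risk below threshold, no fracture observed


def sensitivity(table: TwoByTwo) -> float:
    """Proportion of fractured patients flagged as high risk by the tool."""
    return table.tp / (table.tp + table.fn)


def specificity(table: TwoByTwo) -> float:
    """Proportion of fracture-free patients classified as low risk by the tool."""
    return table.tn / (table.tn + table.fp)


# Hypothetical counts, for illustration only.
example = TwoByTwo(tp=30, fp=120, fn=20, tn=830)
print(f"Sn = {100 * sensitivity(example):.1f}%")
print(f"Sp = {100 * specificity(example):.1f}%")
```

With these hypothetical counts, 30 of 50 fractured patients are flagged and 830 of 950 fracture-free patients are correctly classified as low risk.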

Data source and search strategy
PubMed, Embase, and the Cochrane Library were searched (between September 2011 and December 2020) by updating the search strategy of the NICE guidelines for the FRAX tool; a new search was conducted for the DeFRA and FRA-HS tools. Publications on the risk assessment tools were identified in patients with fragility fractures or osteoporosis. The systematic review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [18]; the statement has been provided in Supplemental Table S1. The search strategy (Supplemental Material, A) included specific keywords and/or corresponding Medical Subject Headings terms related to fragility fracture/osteoporosis AND risk assessment tools. The reference lists of the studies were checked and systematic reviews were identified during the search process.

Study selection and data extraction
Three independent authors (AB, GP, and RR) screened the titles and abstracts based on the search strategy and then assessed the full text of potentially relevant studies. Discrepancies between readers were resolved through discussion.
The following data were extracted for each included observational study: (i) first author, year, and country of publication; (ii) study setting; (iii) duration of study; (iv) type of population; (v) intervention; and (vi) outcome (Supplemental Material, B).

Study quality
The methodological quality of the included studies was evaluated using the Quality Assessment of Diagnostic Accuracy Studies version 2 (QUADAS-2) checklist [19]. The QUADAS-2 assessment was structured in four key domains: patient selection, index test, reference standard, and flow and timing (Supplemental Table S2).

Quality of evidence
The quality of evidence for each primary outcome was assessed based on five dimensions (risk of bias, consistency of effect, imprecision, indirectness, and publication bias) using the GRADE approach [20]. For each of the five dimensions, the evidence was downgraded from "high quality" by one level if serious limitations were found and by two levels if very serious limitations were found.
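The downgrading rule described above can be sketched in a few lines of Python. This is only a simplified illustration: the function and constant names are hypothetical, the floor at "very low" is an assumption, and actual GRADE ratings rest on panel judgement rather than a purely mechanical sum.

```python
# Certainty levels ordered from lowest to highest.
LEVELS = ["very low", "low", "moderate", "high"]

# Levels subtracted per dimension, following the rule stated in the text.
PENALTY = {"not serious": 0, "serious": 1, "very serious": 2}

DIMENSIONS = (
    "risk of bias",
    "consistency of effect",
    "imprecision",
    "indirectness",
    "publication bias",
)


def grade_certainty(judgements: dict) -> str:
    """Start at 'high' and downgrade by 1 or 2 levels per dimension.

    Dimensions missing from `judgements` are treated as 'not serious'.
    The result is floored at 'very low' (an assumption of this sketch).
    """
    total = sum(PENALTY[judgements.get(d, "not serious")] for d in DIMENSIONS)
    index = max(0, len(LEVELS) - 1 - total)
    return LEVELS[index]


# Hypothetical judgements, for illustration only.
print(grade_certainty({}))
print(grade_certainty({"risk of bias": "serious"}))
print(grade_certainty({"consistency of effect": "very serious",
                       "imprecision": "serious"}))
```

Under these assumptions, an outcome with no limitations stays at "high", one serious limitation yields "moderate", and very serious inconsistency combined with serious imprecision yields "very low".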

Statistical analysis
The following operating characteristics were evaluated for analysis of the risk assessment tool: the Sn and Sp (at different thresholds) and the AUC. Specific thresholds were used to differentiate between individuals with or without the target condition. In this context, the development group of the NICE guidelines established risk thresholds for MOF (3%, 5%, 10%, 20%, and 30%) and hip fractures (3%, 5%, and 10%). A low Sn implied that the tool did not recognize a proportion of MOFs or hip fractures; conversely, a low Sp indicated that the tool could lead to false positive cases and overestimate the incidence of these fractures. Analyses were therefore performed when studies reported different cut-off values for the same risk assessment tool.
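To illustrate how the choice of threshold drives Sn and Sp, the following Python sketch applies the MOF thresholds listed above to a small hypothetical cohort. The predicted risks, the outcomes, the function name, and the convention that a risk equal to the cut-off counts as high risk are all assumptions made for illustration.

```python
def sn_sp_at_cutoff(risks, fractured, cutoff):
    """Return (Sn, Sp) when patients with risk >= cutoff are classed as high risk."""
    pairs = list(zip(risks, fractured))
    tp = sum(1 for r, f in pairs if r >= cutoff and f)
    fn = sum(1 for r, f in pairs if r < cutoff and f)
    tn = sum(1 for r, f in pairs if r < cutoff and not f)
    fp = sum(1 for r, f in pairs if r >= cutoff and not f)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical 10-year MOF probabilities (%) and observed fracture outcomes.
risks = [2, 4, 6, 8, 12, 15, 22, 25, 35, 40]
fractured = [False, False, False, True, False, True, False, True, True, True]

results = []
for cutoff in (3, 5, 10, 20, 30):
    sn, sp = sn_sp_at_cutoff(risks, fractured, cutoff)
    results.append((cutoff, sn, sp))
    print(f"cut-off {cutoff:>2}%: Sn = {100 * sn:.0f}%, Sp = {100 * sp:.0f}%")
```

In this toy cohort, raising the cut-off never increases Sn and never decreases Sp, which is the general trade-off that motivates reporting both measures at each threshold.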
The Sn and Sp estimates were used to construct paired forest plots with 95% confidence intervals (CIs) across studies (at various thresholds); RevMan V.5.4 (Nordic Cochrane Center) software was used for evaluation. The AUC was used to evaluate the overall diagnostic accuracy of each risk assessment tool. Diagnostic meta-analysis was conducted when 3 or more studies were available per threshold. This measure was also plotted on a graph using RStudio software version 1.4.1717. Heterogeneity or inconsistency among studies was visually inspected using the forest plots for MOF or hip fractures, both with and without BMD.
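For readers less familiar with the AUC, the sketch below computes it for a single hypothetical cohort as the probability that a randomly chosen patient who fractured was assigned a higher predicted risk than a randomly chosen patient who did not (ties count as one half). This is a generic illustration of the measure, not the meta-analytic procedure used in this review; the data and the function name are assumptions.

```python
def auc(risks, fractured):
    """AUC via pairwise comparison of fractured and fracture-free patients."""
    pos = [r for r, f in zip(risks, fractured) if f]
    neg = [r for r, f in zip(risks, fractured) if not f]
    if not pos or not neg:
        raise ValueError("Both fractured and fracture-free patients are required.")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # fractured patient ranked higher: concordant pair
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical 10-year MOF probabilities (%) and observed fracture outcomes.
risks = [2, 4, 6, 8, 12, 15, 22, 25, 35, 40]
fractured = [False, False, False, True, False, True, False, True, True, True]

value = auc(risks, fractured)
print(f"AUC = {100 * value:.0f}%")  # expressed as a percentage, as in the text
```

An AUC of 50% corresponds to chance-level discrimination and 100% to perfect separation of patients with and without fractures.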

Study selection
As shown in Fig. 1, a total of 2702 publications were identified; 2565 studies were excluded after title and abstract screening. Among the remaining 137 articles that underwent full-text review, 98 were excluded for the following reasons: (i) the intervention (n = 5) or outcome (n = 18) was considered to be incorrect, (ii) they were out of scope (n = 5) or available only as abstracts (n = 68), (iii) the study design was not eligible for inclusion (n = 1), and (iv) the studies were not published in the English language (n = 1). Finally, 43 articles were considered for the present analysis; these included 40, 1, and 2 studies pertaining to the FRAX, FRA-HS [8], and DeFRA [9,61] tools, respectively.

Risk of bias assessment and certainty of the evidence
Unclear risk of bias was generally present across the studies (Supplemental Table S2). In the entire population, the FRAX tool demonstrated high certainty of evidence: (i) with or without BMD for MOF (at 30% threshold), (ii) MOF (at 20% or 30% cut-off), and (iii) hip fractures (at 3% cut-off, only for Sp) (Supplemental Table S3). A moderate certainty of evidence (Supplemental Table S3) was detected for MOF: (i) without BMD (cut-off at 5% or 20%), (ii) with BMD (at 3% threshold, only for Sn), and (iii) hip fracture with BMD (cut-off at 5%). The remaining Sn and Sp values had low or very low certainty of evidence.

Sensitivity (Sn) and specificity (Sp)
Sn and Sp evaluation was only performed for the FRAX tool. The results showed a reduction of Sn and an increase of Sp

Area under the curve
The meta-analytic summary of the AUCs for the risk assessment tools is shown in Supplemental Material C and Table 2. The diagnostic accuracy of the FRAX (with and without BMD) and FRA-HS tools (without BMD) was evaluated in women, men, and the entire population. The AUC for DeFRA (with BMD) in cases of MOF was evaluated and compared to that of the FRAX instrument in women as well as in diabetic patients.
Inconsistencies, classified as not serious, serious, and very serious, have been presented in Supplemental Table S3.

Discussion
This systematic review evaluated one clinical question of the Italian Guidelines [11], and a multidisciplinary panel of experts formulated recommendations through a structured, transparent, and evidence-based process. This systematic review and meta-analysis was conducted specifically to evaluate the accuracy of three fracture risk assessment tools (DeFRA, FRAX, and FRA-HS). A total of 43 studies that assessed the performance of tools in identifying at-risk patients were included. Overall, FRAX and DeFRA appeared to perform better than FRA-HS in terms of discriminatory power. All three tools generally performed better for hip fractures than for MOF. As expected, the AUC was higher in women compared to men, mostly with the addition of BMD in the algorithm.
The results of this meta-analysis supported a recommendation that suggests the use of risk assessment tools for predicting fractures in patients with or at risk of fragility fractures (moderate quality of evidence).
Other meta-analyses have been conducted on this topic. In 2019, Beaudoin and colleagues published a systematic review and meta-analysis that assessed 14 tools including the FRAX and FRA-HS. The authors analyzed 53 validation studies and found results similar to those of the present meta-analysis. For instance, Beaudoin et al. showed that the tools performed better in predicting hip fractures than fractures at other sites. They also found that the Q-Fracture and Garvan risk tools slightly outperformed the FRAX in predicting hip fractures; this concurs with the findings of an older meta-analysis by Marques and colleagues [62]. In the present meta-analysis, we also found that the DeFRA had slightly higher discriminatory power compared to the FRAX. Indeed, the Garvan, Q-Fracture, and DeFRA tools resolve certain critical issues of the FRAX. Although the FRAX tool represents a crucial milestone in the management of osteoporosis, the algorithm has significant limitations; this may undermine its predictive value. For example, the FRAX does not consider lumbar spine BMD data, which are considered by the DeFRA and Garvan tools. In addition, clinical risk factors (e.g., prior fractures, glucocorticoids, and smoking habits, among others) are scaled down to dichotomous variables in FRAX. However, small differences in prediction ability between FRAX and other more complex algorithms may only have minimal relevance.

Limitations and strengths
The findings of this study should be interpreted considering its strengths and limitations. First, the task force decided to include only three fracture risk assessment tools in the Italian Guideline on the management of Fragility Fracture, because these instruments have been translated into the Italian language. Second, there are certain concerns as to whether findings from selected studies can be combined to draw one conclusion; this is because all the aforementioned results had high levels of heterogeneity depending on the baseline characteristics of the validation cohorts and the quality of the included studies (fracture diagnosis, and length of follow-up, among others). Third, an unclear risk of bias was detected across the included studies. Thus, the certainty of evidence for the assessed outcomes was judged to be "very low" or "moderate" owing to very serious inconsistencies and serious imprecision of the estimates. Fourth, most of the studies included in the meta-analysis were conducted outside Italy and the results might not be directly applicable to the Italian population. However, the vast majority of the population of the meta-analysis was of European ancestry, possibly reducing such bias. Despite these limitations, this study had certain strengths. In view of the discriminatory power of the risk assessment tools, the exhaustive search strategy provided a reliable overview of the studies. In addition, the internal validity of the included studies was assessed using the QUADAS-2 checklist for diagnostic accuracy studies.

Conclusion
The present meta-analysis evaluated the diagnostic accuracy of three fracture risk prediction tools (FRAX, FRA-HS, and DeFRA). The task force formulated recommendations on the use of any of these algorithms but did not identify a better performing tool. However, our systematic review identified some outcomes (Sn and Sp) that were supported by only "very low" to "moderate" quality evidence.

Table 2
Area under the curve (AUC) for major osteoporotic (a) and hip (b) fractures for the FRAX, FRA-HS, and DeFRA tools (with or without BMD). We reported the minimum and the maximum AUC value, the lower and the upper limit of the 95% confidence interval (