Introduction

Although reduced bone mass is an important and easily quantifiable measurement, studies have shown that most fractures occur in individuals with bone mineral density (BMD) above a T-score of −2.5 [1–5]. As a result, the emphasis of recent clinical practice guidelines for osteoporosis has shifted from BMD to fracture risk [6, 7]. In fact, new reporting guidelines base treatment recommendations on assessments of fracture risk, as opposed to diagnosis of osteoporosis based on BMD T-scores alone [8].

Measures of fracture risk, such as the Fracture Risk Assessment tool from the World Health Organization (WHO) [9] and the Canadian Association of Radiologists and Osteoporosis Canada (CAROC) tool [10], have been designed to predict an individual’s 10-year fracture risk. In 2005, the Canadian Association of Radiologists (CAR) recommended that fracture risk assessments be included on all reading specialists’ (typically radiologists’) BMD reports [11]. Because clinical practice guidelines recommend that these assessments be used to guide treatment decisions, it is important to ensure not only that reports contain the assessments, but also that these assessments are accurate. However, an accurate fracture risk assessment may be difficult for a reading specialist to produce because it depends on information beyond the BMD T-score, such as fracture history. Such clinical information may be difficult for a specialist to access and is therefore subject to omission on reports [9, 10].

The primary objective of this study was to examine the accuracy of fracture risk assessments on BMD reports from a wide range of imaging laboratories for individuals with a history of fragility fracture in non-urban areas in the province of Ontario, Canada. The BMD reports studied were gathered as part of a cluster randomized trial in 2008. As a result, assessment accuracy was defined as concordance between the fracture risk stated on the BMD report and assessments produced by our research team using (1) knowledge of fracture history and (2) the assessment methodology sanctioned by CAR in 2005 [11] and current as of 2008. It should be noted, however, that Osteoporosis Canada has since recommended significant methodological changes for fracture risk assessment in their 2011 Guidelines [8]. Secondary objectives were to determine whether the reports followed the 2005 CAR standard for diagnostic categorization and were in the recommended report format.

Methods

Study design

The BMD reports examined in this study were collected as part of a cluster randomized trial evaluating the effect of a centralized coordinator who identifies and follows up with fracture patients treated in small non-urban community hospitals and their primary care physicians about osteoporosis care, including referral for BMD testing and pharmacologic treatment [12].

Setting and participants

Hospitals that had no dedicated fracture clinic and treated more than 60 fracture patients per year in their emergency department (ED) were eligible for the trial (n = 54). Ethical approval was obtained from the Research Ethics Board of the Toronto Rehabilitation Institute and each of the participating sites.

Emergency department records provided through the National Ambulatory Care Reporting System database at each hospital were used to identify all new cases of fracture. Records were selected for individuals over 40 years of age who sustained fractures at the hip, forearm, wrist, rib(s), sternum, thoracic and lumbar spine, shoulder and upper arm, pelvis, lower leg, and ankle. Patients were excluded if their “cause of injury” codes indicated that the fracture was due to major trauma (e.g., traffic accidents), if they were residing in a nursing home, or if more than 3 months had elapsed between their initial ED visit and preparation of the list for the centralized coordinator.

Patients were recruited by telephone between January and July 2008 and further screened with the following exclusion criteria: unable to contact, died, in long-term care, cognitive or hearing impairment, lived outside the region, and previously screened by an Osteoporosis Strategy coordinator at another hospital [13]. During screening, all reports of fragility fracture were verified by a physical therapist who confirmed that the patient had had a low-trauma fracture. Data were collected at baseline and follow-up at 6 months. All patients who had a BMD test scheduled or performed by the 6-month follow-up call were asked permission to allow the researchers to contact their family physician to obtain a copy of the report. Bone mineral density test reports were gathered by fax from consenting patients’ family physicians.

Data abstraction

Each BMD report was reviewed by two members of the research team, and data were abstracted using a standardized template that included risk factors used by the CAROC fracture risk assessment tool.

Fracture risk assessment review

The CAROC 10-year fracture risk assessment tool incorporates BMD information (lowest T-score from the lumbar spine (L2–L4), femoral neck, and total hip), age, sex, fracture history, and glucocorticoid use [11]. Calculation of fracture risk is not recommended for individuals under age 50; for individuals age 50 and older, risk reporting is recommended regardless of osteoporosis treatment status [8]. It should be noted, however, that in 2005, some ambiguity existed as to whether risk should be reported for patients on treatment; risk reporting for treated patients is not explicitly outlined by Siminoski and colleagues [11].

The lowest T-score on reports from the spine, total hip, or femoral neck, in combination with each patient’s age and sex, was used to calculate baseline 10-year absolute fracture risk. This is in accordance with CAR’s 2005 recommendations, which state that “the lowest T-score from the spine, the total hip, the trochanter and the femoral neck” is to be used to calculate baseline risk, but add that assessments are “based on published data for only the femoral neck” [11]. Osteoporosis Canada’s 2011 Guidelines have since recommended that only femoral neck T-scores be used as the basis for fracture risk assessment [8]. As all patients in this study had sustained a recent fracture, all calculated baseline fracture risk assessments were then elevated by one category of risk, as per instructions outlined by Siminoski and colleagues [11]. For example, those with “low” fracture risk based on BMD T-score, age, and sex were assigned to the “moderate” risk category, and those with “moderate” fracture risk were assigned to the “high” risk category. Patients with recent prolonged systemic glucocorticoid use, as evidenced by information on reports, were placed in the “high” fracture risk category regardless of BMD T-score because they also had a fragility fracture.
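The adjustment step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the research team's actual procedure: the function name `adjust_for_recent_fracture` and the constant `RISK_ORDER` are hypothetical, the baseline category is taken as an input because the CAR 2005 lookup tables are not reproduced here, and leaving a baseline of “high” unchanged is an assumption the text does not state explicitly.

```python
# Ordered risk categories as named in the text (identifier is illustrative).
RISK_ORDER = ("low", "moderate", "high")


def adjust_for_recent_fracture(baseline: str, glucocorticoid_use: bool = False) -> str:
    """Adjust a baseline risk category for a patient with a recent fragility fracture.

    `baseline` is the category derived from lowest T-score, age, and sex;
    that lookup is not implemented in this sketch.
    """
    if baseline not in RISK_ORDER:
        raise ValueError(f"unknown risk category: {baseline!r}")
    # Fracture plus prolonged systemic glucocorticoid use: "high" regardless of T-score.
    if glucocorticoid_use:
        return "high"
    # Recent fracture alone: raise by one category.
    # Assumption: a baseline of "high" stays "high".
    index = min(RISK_ORDER.index(baseline) + 1, len(RISK_ORDER) - 1)
    return RISK_ORDER[index]
```

Under these assumptions, a patient with a “low” baseline and no glucocorticoid use would be reported as “moderate”, which is the category the research team assigned in such cases.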

Assessments made by the research team using the CAROC heuristics were then compared to the fracture risk assessments presented in the reading specialists’ reports. In instances where several competing fracture risks were assigned to different imaged regions (e.g., the lumbar spine versus total hip) and the specialist additionally indicated an overall fracture risk, only the overall risk assessment was compared to the assessment made by the research team.

Concordance between assessments made by reading specialists and the research team was measured using Cohen’s kappa [14, 15]. Raw kappa statistics were calculated as well as linearly weighted kappas, with weights structured to penalize disagreements separated by two categories of risk more than those separated by one category.
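For readers unfamiliar with the statistic, the following Python sketch computes unweighted and linearly weighted kappa from a square table of counts. It is an illustration only: the function name `cohen_kappa` is hypothetical, the linear weights |i − j| / (k − 1) are the standard textbook form and are assumed rather than taken from the study's analysis code, and any table passed to it in examples is made up and is not the study's Table 3.

```python
from typing import Sequence


def cohen_kappa(table: Sequence[Sequence[int]], weighted: bool = False) -> float:
    """Cohen's kappa for a square table of counts (rows: rater A, columns: rater B).

    With weighted=True, linear disagreement weights |i - j| / (k - 1) are used,
    so a two-category disagreement is penalized more than a one-category one.
    """
    k = len(table)
    if k < 2 or any(len(row) != k for row in table):
        raise ValueError("table must be square with at least two categories")
    total = sum(sum(row) for row in table)
    if total == 0:
        raise ValueError("table is empty")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    observed = 0.0  # observed (weighted) disagreement
    expected = 0.0  # disagreement expected by chance from the marginals
    for i in range(k):
        for j in range(k):
            if weighted:
                w = abs(i - j) / (k - 1)
            else:
                w = 0.0 if i == j else 1.0
            observed += w * table[i][j] / total
            expected += w * row_totals[i] * col_totals[j] / (total * total)
    if expected == 0:
        raise ValueError("kappa is undefined when expected disagreement is zero")
    return 1.0 - observed / expected
```

With categories ordered low, moderate, high, calling `cohen_kappa(table)` gives the raw statistic and `cohen_kappa(table, weighted=True)` the linearly weighted one. When all disagreements are between adjacent categories, the weighted value is typically the higher of the two.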

Diagnostic categorization review

Collected reports were also reviewed to determine if CAR’s standards of diagnostic categorization, published in 2005 [11], were used on the BMD reports. The CAR’s categorizations differ from the WHO’s in that they distinguish post-menopausal women (“normal”, “osteopenia,” and “osteoporosis”) from pre-menopausal women and men (“normal” or “reduced bone density”). To assign CAR diagnostic categorizations, the research team abstracted the gender, age, and lowest T-score results from the following sites: lumbar spine, total hip, trochanter, and femoral neck. These data as well as menopausal status were then used to categorize participants according to CAR criteria. Diagnostic categories assigned by the research team were then compared to categories presented by reading specialists. Where the reading specialists assigned several competing diagnoses to different imaged regions (e.g., the lumbar spine versus total hip), it was assumed that the specialist’s overall diagnosis for the patient was the one based on the lowest T-score present. This diagnosis was then compared to the assessment made by the research team. To assess prevalence of standards, we report the percentage of reports that agree with CAR diagnostic criteria.
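One part of this review, checking that a report's diagnostic label belongs to the CAR vocabulary for the patient's group, can be sketched in Python as follows. This is a hedged illustration: the names `POSTMENOPAUSAL_LABELS`, `OTHER_LABELS`, and `car_label_allowed` are hypothetical, and the sketch checks only which labels are permitted for each group. It deliberately does not encode the numeric thresholds behind the categories, which are not given in this text.

```python
# Category labels as described in the text; thresholds are NOT encoded here.
POSTMENOPAUSAL_LABELS = {"normal", "osteopenia", "osteoporosis"}
OTHER_LABELS = {"normal", "reduced bone density"}  # pre-menopausal women and men


def car_label_allowed(sex: str, postmenopausal: bool, label: str) -> bool:
    """Return True if `label` is in the CAR 2005 vocabulary for this patient group.

    `sex` is expected to be "female" or "male" (an assumption of this sketch);
    `postmenopausal` is ignored for men.
    """
    is_postmenopausal_woman = sex == "female" and postmenopausal
    allowed = POSTMENOPAUSAL_LABELS if is_postmenopausal_woman else OTHER_LABELS
    return label.strip().lower() in allowed
```

For example, a report labelling a man as having “osteoporosis” would fail this check, since the CAR vocabulary for men is limited to “normal” and “reduced bone density”.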

Conformity to CAR’s 2005 reporting recommendations

Finally, collected reports were reviewed to determine their overall conformity to CAR’s 2005 report format recommendations. Specifically, the 2005 recommendations suggest that all baseline reports include patient identifiers, a DXA scanner identifier, BMD raw results (in g/cm²), T-scores, a diagnostic category, and, for patients over age 50, a fracture risk category. For serial scans, additional information is suggested for inclusion: a statement as to whether BMD change was statistically significant and the BMD test center’s least significant change (LSC) for each skeletal site (in g/cm²) [11].

To determine the degree to which 2008 reports conformed to 2005 format recommendations, the presence of the informational elements listed above was counted in the collected reports. Information could appear anywhere in the reports to be counted, including in attachments from DXA machines. A report including the brand of the DXA scanner used met the criteria for DXA scanner identifier. It should be noted that software programs used by scanners often provide information, such as significance of change in BMD or diagnostic categories, by default. “Patient identifiers” were defined as the patient’s name, date of birth, and sex.
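The element count described above amounts to a checklist comparison, sketched here in Python. The element keys and function names are hypothetical labels chosen for illustration; the required elements themselves follow the 2005 recommendations as summarized in this section.

```python
from __future__ import annotations

# Elements recommended for every report (keys are illustrative labels).
BASELINE_ELEMENTS = (
    "patient_identifiers",   # name, date of birth, and sex
    "scanner_identifier",    # brand of DXA scanner was accepted in this study
    "bmd_raw",               # raw BMD results
    "t_scores",
    "diagnostic_category",
)
# Additional elements recommended for serial (repeat) scans.
SERIAL_ELEMENTS = ("bmd_change_significance", "lsc_per_site")


def required_elements(age: int, is_serial: bool) -> list[str]:
    """List the elements a report should contain for this patient and scan type."""
    required = list(BASELINE_ELEMENTS)
    if age > 50:  # fracture risk category is recommended for patients over age 50
        required.append("fracture_risk_category")
    if is_serial:
        required.extend(SERIAL_ELEMENTS)
    return required


def missing_elements(present: set[str], age: int, is_serial: bool) -> list[str]:
    """Return required elements that were not found anywhere in the report."""
    return [e for e in required_elements(age, is_serial) if e not in present]
```

Here `present` would hold the elements found anywhere in a report, including machine-generated attachments, in line with the counting rule used in this study.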

Results

Descriptive information

Of the 267 fracture patients, a total of 103 had a BMD scheduled or performed at the 6-month follow-up data collection time point. Of these, 53 BMD reports (51 %) were received from the referring physician. Five reports were excluded from the present analysis because they pre-dated the participants’ fracture (n = 2), were produced by a clinic outside of Ontario (n = 1), or were incomplete with only one of two pages received (n = 2). This resulted in 48 BMD reports eligible for analysis representing 27 baseline and 21 repeat scans. The 48 BMD reports were produced by a total of 27 independent BMD scanning facilities, including 19 hospitals, between May of 2007 and October of 2008. About one half of the scans were produced by BMD facilities in small towns (<30,000 population).

The demographic characteristics of the patients represented in this sample of BMD reports are provided in Table 1. The mean age was 67.2 years (SD ± 10.9 years). Approximately three-quarters were women, and 43.8 % had received a prior BMD test.

Table 1 Demographic characteristics: patients (n = 48)

Fracture risk assessment review

Tables 2 and 3 summarize the results of the fracture risk assessment review. Of the 48 reports, 42 (87.5 %) contained a fracture risk assessment. Of note, on two reports that did not report fracture risk, a statement was made that fracture risk assessments were not valid for individuals receiving treatment for osteoporosis. Moreover, of those reports that contained a fracture risk assessment, ten (20.8 % of all 48 reports) reported multiple fracture risks (i.e., one for every imaged site).

Table 2 Fracture risk assessment review
Table 3 Fracture risk assessment matrix

On 27 of the 42 reports with a fracture risk assessment (64.3 %), this fracture risk reflected BMD T-scores, age, and gender, but not fracture history or other modifying factors. These 27 reports represented 57.1 % of the repeat tests and 55.6 % of the baseline tests. Thirty-seven percent of the baseline tests and 28.6 % of repeat tests reported a “low” fracture risk where, given the recent fracture, “moderate” risk was assigned by the research team. In 18.5 % of baseline tests and 28.6 % of repeat tests, “moderate” fracture risk was reported where “high” risk was assigned by the research team, given the recent fracture. Fracture risk was therefore underestimated in more than 50 % of the reports overall.

Table 3 presents a matrix relating risk assessments produced by the research team to those produced by reading specialists. Based on this matrix, a Cohen’s kappa of 0.036 was computed, indicating the agreement between the research team and the reading specialists to be poor [14]. A linearly weighted kappa was also computed so as to penalize disagreements spanning more than one category of risk more than disagreements spanning only one category. In order to compute this kappa, rows and columns corresponding to reports with “no assessments” were excluded from Table 3. The weighted kappa was 0.21, which lies at the margin of poor to fair agreement [15].

Diagnostic categorization review

Results from the review of diagnostic categorizations are reported in Table 4. The majority of reports (95.8 %) included a diagnosis. Sixteen of the 48 reports (33.3 %), however, included a distinct diagnosis for each region scanned.

Table 4 Diagnostic categorization review

Of the 26 baseline reports with a diagnosis, 18 (66.7 % of all 27 baseline reports) made use of the CAR criteria. Inconsistencies with CAR categorizations were restricted to men in the sample. Three men (represented in two baseline scans and one repeat scan) were diagnosed with osteoporosis where “reduced bone density” was recommended; an additional six were diagnosed with osteopenia where the same “reduced bone density” category was advised. Two reports (one repeat and one baseline) did not include a diagnostic category. Of note, one repeat test mentioning menopausal status was for a man.

Conformity to CAR’s 2005 reporting recommendations

All reports included patient identifiers as well as T-scores for imaged sites (see Table 5). Bone mineral density was additionally reported (in raw g/cm² units) in 85 % of baseline and 95 % of repeat tests. Only two of the 48 reports (one baseline and one repeat) did not include a diagnostic categorization, and the majority contained a fracture risk assessment, although many were inconsistent with assessments produced by the research team, as reported above. All of the follow-up tests included a statement of BMD change (where this change could be calculated).

Table 5 Elements from CAR 2005 recommendations

Elements of reports that were less likely to be included were scanner identifiers and LSCs detectable by scanners. Approximately 48 % of baseline reports and 85.7 % of repeat reports included some information on the brand of scanners used. Approximately 44 % of baseline and 71.4 % of repeat tests relied on attachments produced by scanning machines to provide this information. Least significant changes for each skeletal site were reported in only one (4.8 %) of the 21 repeat exams.

Discussion

The current study of 48 BMD reports from 27 independent BMD scanning facilities in the province of Ontario aimed to determine the accuracy of 10-year fracture risk assessments on BMD reports as of 2008, as well as overall conformity to CAR’s 2005 published reporting standards. In 2008, there were approximately 150 hospitals in the province that were performing BMD scans (Ontario Ministry of Health and Long-Term Care, 2011, personal communication); our study captures data from reports produced by 19 of these, which is more than 10 % of the total.

The main finding of this study was that only a minority of both baseline and repeat reports incorporated risk factors, namely previous fracture, into the overall assessment of fracture risk, even though all of the patients had had a recent fracture. As a result, fracture risk was underestimated in more than 50 % of the BMD reports. A strength of this study is that the patients’ history of fragility fracture is based both on records of visits to EDs and on interviews with an osteoporosis coordinator. In addition, the study demonstrates that the standards for diagnosis published by CAR in 2005 were not regularly employed, nor were the recommendations for formatting, particularly those relating to least significant changes and scanner identification.

The fact that modifying risk factors are missed on reports has implications not only for risk assessment, but also for corresponding treatment recommendations. Indeed, the most recent guidelines from Osteoporosis Canada on the assessment of fracture risk link each of the high-, moderate-, and low-risk assessment groups with specific treatment recommendations/considerations [8]. Moreover, previous research has indicated that referring physicians actively look to BMD reports to provide these treatment recommendations [11, 16–19]. A 1998 survey of Ontario physicians found that suggestions for investigation and management are among the most helpful features of BMD reports [17]. More recently, Binkley and Krueger [16] determined that over 60 % of surveyed clinicians desired inclusion of information about fracture risk and pharmacological/nonpharmacological interventions on BMD reports. However, if reported risk assessments are inaccurate (e.g., due to missing clinical risk factors), as demonstrated in the current study, and are used to inform treatment recommendations, there is the potential for inappropriate treatment decisions that would leave high-risk patients untreated.

It can be argued that the individuals for whom BMD results are perhaps most critical are those at “moderate” fracture risk. Treatment recommendations for this group are not straightforward [8, 20] when only BMD T-score or clinical risk factors are available. For example, in the current Osteoporosis Canada 2010 Guidelines for the Assessment of Fracture Risk [8], it is recommended that for this group, treatment should be individualized and may include pharmacologic therapy or just basic lifestyle measures with monitoring. It is further indicated that the moderate risk group requires a careful evaluation to identify vertebral fractures. In the current study, 31 % of the sample was incorrectly classified as low risk when their risk, given fracture history, would have been considered “moderate,” thereby placing them in this particularly vulnerable group.

Limitations

This study had a number of limitations. Reports were gathered from family physicians, as opposed to directly from reading specialists. We assume that family physicians relayed the BMD reports’ information precisely as it was relayed to them, but cannot guarantee this. For example, some reports may have contained attachments that were sent to family doctors, but not to the research team. In addition, as the majority of reports were produced in communities without academic health centers, their accuracy and adherence to standards may not reflect adherence or accuracy in other communities. The generalizability of our results is therefore strictly limited to BMD facilities in non-urban areas. Finally, only 25 % of the reports were for men, and fewer than 5 % were repeat reports for men. This limits our ability to comprehensively assess standards and accuracy for this sub-group.

Methods used to compute fracture risk for the purpose of this study do not align with current recommendations but were chosen to reflect guidelines as they existed at the time reports were produced (2008). While the research team used the lowest T-score from the spine, total hip, or femoral neck to assess fracture risk, 2011 recommendations are to use the T-score from the femoral neck alone. Accuracy in assessment of surveyed reports relative to the 2008 standard may therefore be slightly different than accuracy relative to the current standard. Moreover, the research team assumed that risk assessments should be present on both baseline and follow-up reports, even though some ambiguity existed in 2008 as to whether risk assessments were appropriate for treated individuals. We note that most reports (87.5 %) included a risk assessment, although the proportion of follow-up reports (81.0 %) with an assessment is somewhat lower than the proportion of baselines with an assessment (92.6 %) potentially due, at least in part, to this ambiguity.

Summary

The current study highlights a quality gap in BMD reports produced in non-urban centers of Ontario in 2008, in which major clinical risk factors (i.e., history of recent fracture) are not reflected in fracture risk assessments. This has implications for risk categorization and for subsequent follow-up care and treatment recommendations, particularly for fracture patients who are at moderate or high risk for future fractures. The findings of the present study suggest that inaccuracies in BMD reporting may result in under-treatment of patients at high risk for future fracture.