Introduction

Dual-energy X-ray absorptiometry (DXA) is currently the most accepted method to measure and longitudinally assess bone mineral density (BMD). It is the world-wide “gold standard” for clinical trials evaluating changes in patient BMD following therapeutic intervention. The mineral content of skeletal tissue usually diminishes at a slow annual pace: 0.5% to 2% for most healthy adults and 2% to 5% for women in early postmenopause [1, 2]. The increase in BMD following treatment with currently available antiresorptive drugs is about 1% to 8% during the course of 3 years.

We have previously shown that, independent of the technologist, short-term precision differs between the Hologic Delphi (Waltham, MA, USA) and GE Lunar Prodigy (Madison, WI, USA) DXA devices [3]. The precision and accuracy of the DXA scanner and its software are key issues when interpreting BMD measurements in clinical practice and clinical drug trials. Lower precision errors allow for easier and earlier detection of significant changes in BMD.

Previous DXA software updates have been known to affect the accuracy of DXA measurements. For example, in pediatric whole body scans, the Hologic Discovery v12.1 software changed the bone detection threshold from a fixed value to a variable one related to patient mass. This was done to better delineate low-density bones (i.e., feet and hands).

Awareness of software updates is especially relevant during ongoing clinical trials, particularly those with longitudinal design, because it is crucial that software versions and their updates are comparable. Using the same DXA software throughout the duration of a clinical trial obviates concern regarding software-associated measurement changes. However, new software may offer potential benefits if it is more reliable and more precise.

Hologic has recently introduced the new Apex v2.0 software, with an analysis algorithm that is intended to be more precise than the previous Delphi v11.2 software.

Given the above concerns about new software, we hypothesized that there may be precision differences between the Apex, Delphi, and Prodigy software systems. Because potential differences may affect diagnostic and clinical trial outcomes, this study was undertaken to directly compare precision (test/retest reproducibility) between these three programs. To this end, we compared images of the same patients taken by the same technologists. Additionally, we evaluated the BMD agreement at the hip and anterior-posterior (AP) lumbar spine between the Hologic Apex and Delphi programs.

Materials and methods

Study population

The three facilities involved in this study were (1) Facility 1, New Mexico Clinical Research & Osteoporosis Center, Albuquerque, NM, USA; (2) Facility 2, Colorado Center for Bone Research, Lakewood, CO, USA; and (3) Facility 3, University of California at San Francisco (UCSF), San Francisco, CA, USA. Each facility recruited 30 women, ages 52 to 85 years (mean age 63.3 ± 9.2 years), for a total of 90 subjects. Scans from three participants were lost due to corrupted scan files, so scan results from 87 women are included in this report. The local human research committee for each facility approved the study, and subjects signed an approved informed consent prior to participating. There were no subject restrictions on ethnicity or body mass.

Bone densitometry

All participants were scanned twice on both Hologic Delphi (Hologic, Inc., Waltham, MA, USA) and GE Lunar Prodigy (Madison, WI, USA) DXA systems using each manufacturer’s standard scan and positioning protocols. Spine phantom quality control scans were acquired on each of the six systems on a continual basis during the study, and no cross calibration was performed for any of the systems. Each patient was positioned for the lumbar spine scan and then for the left and right proximal femur scans; repositioning was done between scans. The 30-second scan mode was used on both systems and for all positions. The legs were elevated using the Hologic positioning cushion for spine scans on the Hologic systems; legs were flat on the table for the femur scans. Foot straps were used to stabilize the leg being scanned. The dual-hip/dual-femur and spine-flat methods were used to scan the subjects on the GE Lunar system, except at one study facility (UCSF), where the single-hip/femur mode was used to scan both hips.

Scan analysis

Using the methods recommended by each manufacturer, one technologist at each facility analyzed the images. All three technologists were certified by the International Society for Clinical Densitometry (ISCD). The region of interest (ROI) definitions for the Prodigy and Delphi systems are described in detail by Shepherd et al. [3]. The “compare” (Delphi) or “copy” (Prodigy) methods were used to analyze the repeat measurements, thereby facilitating consistent placement of analysis regions for each subject. The Hologic scans were then reanalyzed with the Hologic Apex software, using global ROI on both hip and spine images as in the Delphi analysis. Bone mapping can vary slightly between Apex and Delphi software. Based on quality control review, minor corrections were made to several neck box placements during the Apex analysis.

Data conversion and statistical analysis

Demographics and other characteristics of the study population were calculated as means and standard deviations (SD). The relationship between Apex and Delphi software was defined using linear regression. The BMD values from both systems were converted into standardized BMD (sBMD) units using the Hui et al. formulas for spinal BMD [4] and the Lu et al. formulas for femur BMD [5].

The sBMD equations were derived from scans acquired on previous generation systems: Hologic QDR-2000 and Lunar DPX-L models using the pencil beam scan modes. Our previous work showed that the published relationships remain valid for removing the systematic bias between the Delphi and the Prodigy [3]. Repeat scans were used to assess the in vivo short-term precision of the BMD measurements, expressed as a root-mean-square standard deviation (RMS-SD) and root-mean-square percent coefficient of variation (RMS-%CV) [6].
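The short-term precision calculation can be illustrated with a minimal sketch. It assumes the common duplicate-scan formulation, in which each subject contributes a within-subject variance from two scans and the variances are pooled as a root mean square. The function name `rms_precision` and the example values are hypothetical and are not study data; the exact procedure used in the study is the one given in [6].

```python
import math

def rms_precision(pairs):
    """Return (RMS-SD, RMS-%CV) for a list of (scan1, scan2) BMD pairs.

    Assumes exactly two scans per subject and non-zero mean BMD.
    """
    var_list = []      # within-subject variances
    cv_sq_list = []    # within-subject squared coefficients of variation
    for a, b in pairs:
        mean = (a + b) / 2.0
        # Sample variance of two values (1 degree of freedom) is (a - b)^2 / 2.
        var = (a - b) ** 2 / 2.0
        var_list.append(var)
        cv_sq_list.append(var / mean ** 2)
    rms_sd = math.sqrt(sum(var_list) / len(var_list))
    rms_cv = 100.0 * math.sqrt(sum(cv_sq_list) / len(cv_sq_list))
    return rms_sd, rms_cv

# Hypothetical duplicate spine BMD values in g/cm^2 (not study data).
example_pairs = [(1.000, 1.010), (0.850, 0.842), (0.920, 0.920)]
rms_sd, rms_cv = rms_precision(example_pairs)
print(f"RMS-SD = {rms_sd:.4f} g/cm^2, RMS-%CV = {rms_cv:.2f}%")
```

Pooling squared values rather than averaging the individual SDs keeps the estimate consistent with a pooled variance, which is why the RMS form is preferred for precision studies.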

The Fisher test for equality of correlated variances was used to test the difference in measurement errors between these software versions for each measurement site. The F test was used to test the difference between the study sites for all parameters studied. Bland–Altman regression was used to test for significant differences in analysis results between the Delphi and Apex software. All statistical analyses were completed using SAS software 9.1 (Cary, NC, USA) with a significance level of 0.05.
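As an illustration of the Bland–Altman approach, the sketch below computes the mean difference (bias), the 95% limits of agreement, and the least-squares line of the paired difference against the paired mean. It is only a sketch under stated assumptions: the function name `bland_altman` is hypothetical, the limits use the conventional bias ± 1.96 SD, and no significance test on the slope is included. The study itself used SAS, and its exact procedure is not reproduced here.

```python
import math

def bland_altman(x, y):
    """Bland-Altman summary for paired measurements x and y.

    Requires at least two pairs whose means are not all identical.
    """
    n = len(x)
    means = [(a + b) / 2.0 for a, b in zip(x, y)]
    diffs = [a - b for a, b in zip(x, y)]
    bias = sum(diffs) / n
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    # Least-squares regression of difference on mean; a slope far from
    # zero suggests the disagreement depends on the BMD magnitude.
    m_bar = sum(means) / n
    sxx = sum((m - m_bar) ** 2 for m in means)
    sxy = sum((m - m_bar) * (d - bias) for m, d in zip(means, diffs))
    slope = sxy / sxx
    intercept = bias - slope * m_bar
    return {
        "bias": bias,
        "lower_limit": bias - 1.96 * sd,
        "upper_limit": bias + 1.96 * sd,
        "slope": slope,
        "intercept": intercept,
    }

# Hypothetical paired BMD values in g/cm^2 (not study data).
apex = [0.912, 1.034, 0.801, 0.956]
delphi = [0.905, 1.030, 0.799, 0.947]
print(bland_altman(apex, delphi))
```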

Results

Table 1 shows the characteristics of the study populations at the three participating facilities. There were no statistically significant differences in age, height, or weight among the three facilities.

Table 1 Subject characteristics

The results of Apex were highly correlated with Delphi for all the BMD measurements. Correlation coefficients (Table 2) ranged from 0.99 (lumbar spine) to 0.95 (femoral neck). There were no significant differences in intercepts between Apex and Delphi software for lumbar spine and femoral neck BMD (Table 2), although in some cases, there were small but statistically significant differences in mean values. Figures 1, 2 and 3 show the Bland–Altman plots of Apex versus Delphi for the BMD measurements. The Apex software demonstrated statistically significant precision improvements (using the Fisher test for equality) for all ROIs compared with Delphi, except for the femoral neck ROI (Fig. 4). The improvements ranged from 20% to 25%.

Fig. 1 Bland–Altman plots of lumbar spine BMD for the Hologic Apex and Delphi software. The solid line is the best fit line, with the 95% confidence limits shown as dotted lines

Fig. 2 Bland–Altman plots of left and right femoral neck BMD for the Hologic Apex and Delphi software. The solid line is the best fit line, with the 95% confidence limits shown as dotted lines

Fig. 3 Bland–Altman plots of left and right total hip BMD for the Hologic Apex and Delphi software. The solid line is the best fit line, with the 95% confidence limits shown as dotted lines

Fig. 4 Comparison of the precision of the Hologic Apex and Delphi software for the lumbar spine and the left and right hip regions

Table 2 Comparison of Hologic Apex and Delphi software results

Precision errors of Apex and Prodigy for each measure, in terms of SD and CV, are listed in Table 3. No one facility consistently had better precision than the other two. For each manufacturer, the left femoral neck precision was better than the right (Apex: 1.6% vs. 2.3%, p < 0.01; Prodigy: 1.5% vs. 1.8%, p = 0.06). We speculate that this could be because the left hip is on the same side as the technologist and is easier to manipulate and position than the right. However, the technologists did not report more difficulty with the right hip than with the left. The dual-hip/dual-femur precision errors were lower than either single-hip precision error (Apex: 0.7% vs. 0.9%; Prodigy: 0.6% vs. 0.9%).

Table 3 RMS standard deviation (RMS-SD) and coefficient of variation (RMS-%CV) of Hologic Apex and GE Healthcare Lunar Prodigy software

Lastly, in a previous study we found statistically significant differences in precision between Delphi and Prodigy at all skeletal sites [3]. In the current study, there were no differences between Apex and Prodigy precision errors except at the right femoral neck, where the Prodigy had better precision than the Apex.

Discussion/Conclusion

In this study we investigated the precision errors of lumbar spine and proximal femur scans using the updated Hologic Apex v2.0 software compared to the Hologic Delphi v11.2 and the GE Lunar Prodigy v7.5 software. Our results show that the Apex algorithms have significantly lower precision error than the Delphi using the same scans, independent of technologists. Further, we found no significant differences between the precision errors of the Apex and Prodigy, except at the right femoral neck. These results demonstrate that precision can be improved by algorithm development, suggesting that further improvements may not be limited to the positioning skills of the technologist.

The scan precision achieved at a facility determines the least significant change (LSC), the minimum BMD difference between two scans that can be considered statistically significant. For example, a 20% decrease in precision error reduces the LSC, and the monitoring time interval (MTI) [7], the time one has to wait to expect that an LSC has occurred, by 20%. When changing hardware, the ISCD recommends that “a repeat precision assessment should be done if a new DXA system is installed” [8]. The findings of this study suggest that upgrading from Hologic Delphi to Apex software should be considered equivalent to installing a new DXA system. Thus, the precision assessment should be repeated and the LSC recalculated.
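The proportional relationship between precision error, LSC, and MTI can be made concrete with a short sketch. It assumes the conventional 95% confidence formulation, LSC = 1.96 × √2 × precision error (about 2.77 × precision error), and takes the MTI as the LSC divided by the expected annual BMD change. The function names and the numeric inputs are hypothetical and are not results from this study.

```python
import math

def least_significant_change(precision_error, z=1.96):
    """LSC at a two-sided confidence level set by z (1.96 for 95%).

    The sqrt(2) factor reflects that two measurements, each with the
    same precision error, contribute to the observed difference.
    """
    return z * math.sqrt(2.0) * precision_error

def monitoring_time_interval(lsc, expected_annual_change):
    """Years to wait until the expected change equals the LSC."""
    return lsc / expected_annual_change

# Hypothetical inputs: 1.0% precision error, 2.0% expected annual change.
old_precision = 1.0
new_precision = 0.8 * old_precision   # a 20% decrease in precision error
annual_change = 2.0

for label, precision in (("old", old_precision), ("new", new_precision)):
    lsc = least_significant_change(precision)
    mti = monitoring_time_interval(lsc, annual_change)
    print(f"{label}: LSC = {lsc:.2f}%, MTI = {mti:.2f} years")
```

Because both quantities are linear in the precision error, any relative change in precision error carries through unchanged to the LSC and the MTI.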

BMD differences of 1% to 2% may occur in individual patients, depending on which software version is used. These differences may be negligible for diagnostic classification or assessment of fracture risk. However, they must be considered when results from multiple DXA systems are pooled in clinical trials and when systems are upgraded for use in clinical practice or research.

Our study had several limitations. First, the population studied was limited to postmenopausal women. It is unknown whether these findings will apply to other populations, e.g., men, premenopausal women, or children. Second, short-term precision measured on the same day is typically used as a surrogate for the more clinically relevant, but difficult to quantify, long-term precision. Long-term precision is most often found to be substantially worse and to be affected by factors not captured by same-day measurement [9]. Finally, our results are applicable only to the software versions tested and may not apply to subsequent software updates.

In conclusion, we showed that BMD measurements with the Hologic Apex and Delphi software are highly correlated. The precision of the Apex is significantly improved compared to the Delphi and, except at the right femoral neck, was indistinguishable from the precision of the GE Lunar Prodigy software.