Automatic opportunistic osteoporosis screening in routine CT: improved prediction of patients with prevalent vertebral fractures compared to DXA

Objectives To compare spinal bone measures derived from automatic and manual assessment in routine CT with dual energy X-ray absorptiometry (DXA) in their association with prevalent osteoporotic vertebral fractures using our fully automated framework (https://anduin.bonescreen.de) to assess various bone measures in clinical CT. Methods We included 192 patients (141 women, 51 men; age 70.2 ± 9.7 years) who had lumbar DXA and CT available (within 1 year). Automatic assessment of spinal bone measures in CT included segmentation of vertebrae using a convolutional neural network (CNN), reduction to the vertebral body, and extraction of bone mineral content (BMC), trabecular and integral volumetric bone mineral density (vBMD), and CT-based areal BMD (aBMD) using asynchronous calibration. Moreover, trabecular bone was manually sampled (manual vBMD). Results A total of 148 patients (77%) had vertebral fractures and significantly lower values in all bone measures compared to patients without fractures (p ≤ 0.001). Except for BMC, all CT-based measures performed significantly better as predictors for vertebral fractures compared to DXA (e.g., AUC = 0.885 for trabecular vBMD and AUC = 0.86 for integral vBMD vs. AUC = 0.668 for DXA aBMD, respectively; both p < 0.001). Age- and sex-adjusted associations with fracture status were strongest for manual vBMD (OR = 7.3, [95%] CI 3.8–14.3) followed by automatically assessed trabecular vBMD (OR = 6.9, CI 3.5–13.4) and integral vBMD (OR = 4.3, CI 2.5–7.6). Diagnostic cutoffs of integral vBMD for osteoporosis (< 160 mg/cm3) or low bone mass (160 ≤ BMD < 190 mg/cm3) had sensitivity (84%/41%) and specificity (78%/95%) similar to trabecular vBMD. Conclusions Fully automatic osteoporosis screening in routine CT of the spine is feasible. CT-based measures can better identify individuals with reduced bone mass who suffered from vertebral fractures than DXA. 
Key Points
• Opportunistic osteoporosis screening of spinal bone measures derived from clinical routine CT is feasible in a fully automatic fashion using a deep learning-driven framework (https://anduin.bonescreen.de).
• Manually sampled volumetric BMD (vBMD) and automatically assessed trabecular and integral vBMD were the best predictors for prevalent vertebral fractures.
• Except for bone mineral content, all CT-based bone measures performed significantly better than DXA-based measures.
• We introduce diagnostic thresholds of integral vBMD for osteoporosis (< 160 mg/cm3) and low bone mass (160 ≤ BMD < 190 mg/cm3) with almost equal sensitivity and specificity compared to conventional thresholds of quantitative CT as proposed by the American College of Radiology (osteoporosis < 80 mg/cm3).
Supplementary Information The online version contains supplementary material available at 10.1007/s00330-020-07655-2.


Introduction
Osteoporosis is a metabolic bone disease characterized by impaired bone strength, predisposing the individual to an increased risk of fracture [1]. Osteoporosis affects the population worldwide, particularly the elderly in developed countries [2]. In the European Union, the economic burden of osteoporotic fractures has been estimated at 37 billion euros per year and is expected to increase by 25% in 2025 [3].
Besides hip fractures, vertebral fractures are the most common and most consequential osteoporotic fractures [4]. Their prevalence among Europeans older than 50 years ranges between 18 and 26% [5]. Vertebral fractures have dramatic consequences that include a reduced quality of life [6], a 2-fold increase in age-adjusted mortality risk [7], and a 3-fold increase in the risk of additional fractures compared to the normal population [8]. All types of osteoporotic fractures in the elderly foreshadow a high risk of poor outcomes, so that early medical intervention is strongly advised [9]. Medical treatment can specifically target patients with a very high risk profile and long-term management is generally required [10].
The main problem of osteoporosis is that osteoporotic patients remain asymptomatic until a fracture occurs. Moreover, osteoporotic vertebral fractures remain clinically silent with only 15-30% coming to clinical attention [11]. Thus, the primary aim in osteoporosis care is to identify people at high risk of fractures in order to initiate medical treatment before the first fracture occurs. To date, the standard screening method includes assessing clinical risk factors and measuring areal bone mineral density (aBMD) using dual-energy X-ray absorptiometry (DXA) [1]. However, there are two major concerns with this approach. First, less than half of women (44%) and even fewer men (21%) with osteoporotic fractures exhibited low aBMD in a large observational study [12], emphasizing the inherent inaccuracies of DXA [13]. Second, there is significant variability in the access to DXA services and many fall short of international quality standards [14]. Yet, other methods of bone densitometry exhibit even more disadvantages: quantitative computed tomography (QCT) has limited availability, is more expensive, and is associated with a substantially higher radiation dose (> 100-fold) [15]. Thus, an alternative method for osteoporosis screening that would be readily available and exhibits a higher accuracy than DXA in predicting major osteoporotic fractures is highly warranted.
With the advent of sufficient computational power, "deep learning", an approach to machine learning using layers of convolutional neural networks (CNNs), has lately become popular. Specifically, CNNs can increase efficiency and accuracy in segmentation tasks. We recently introduced a framework for fully automatic segmentation of vertebrae in any CT dataset within several seconds [16,17]. This was a cornerstone for the implementation of an opportunistic screening tool that can extract spinal bone measures from any CT data in a fully automatic fashion. Opportunistic quantitative evaluation of preexisting clinical routine CT entails neither additional costs nor radiation exposure [15]. Building on this groundwork, we now aim to prove the concept of opportunistic osteoporosis screening using our fully automated framework (https://anduin.bonescreen.de) to assess various bone measures in clinical CT and to investigate their predictive value for vertebral fracture assessment.
The purpose of this study was to systematically compare the association between prevalent osteoporotic vertebral fractures and various measures of spinal bone mass, extracted from clinical routine CT both automatically and manually, with the reference standard of DXA.

Study population
The local institutional review board approved this monocentric retrospective study (ethics committee's reference number 27/19S/SR) and waived written informed consent. In a query on all patients registered until May 2017 in the institutional database, we identified 360 patients who had DXA and CT available including parts of the thoracolumbar spine. The maximum interval between DXA and CT exams was defined as 12 months. We excluded patients with a history of vertebral metastasis or hematologic disorders (n = 18), without assessable lumbar DXA (n = 34), without assessable CT (due to visualization of fractured vertebrae only, tube voltage other than 120 kV, or severely limited image quality; n = 15), and patients younger than 50 years at the time of DXA examination (n = 35). CT scans of the remaining 258 patients were screened for prevalent osteoporotic vertebral fractures using the semi-quantitative technique by Genant [18]. Based on visual image review, patients were categorized either as fractured (if grade ≥ 1) or non-fractured. To enable a correct fracture classification and avoid missing a fracture that was not visualized due to partial coverage of the spine in the CT scan, non-fractured patients were excluded from the study unless at least vertebral levels T7 to L4 were visualized (n = 66). This yielded a final study group of 192 patients, with 148 patients (77%) showing at least one prevalent osteoporotic vertebral fracture.

CT image acquisition
CT scans were performed on six different multidetector CT scanners (Philips Brilliance 64, iCT 256, and IQon, Philips Medical Systems; Siemens Somatom Definition AS, Somatom Definition AS+, and Somatom Sensation Cardiac 64, Siemens Healthineers); some scans were performed after administration of either both oral (Barilux Scan, Sanochemia Diagnostics) and intravenous (Iomeron 400, Bracco) contrast medium or only intravenous contrast material (n = 61). Image data were acquired with all scanners in helical mode with a peak tube voltage of 120 kVp, a slice thickness of 0.9-1 mm, and adaptive tube load. Post-contrast scans were acquired in either the arterial or portal venous phase, triggered by a threshold of CT attenuation surpassed in a region of interest placed in the aorta or after a delay of 70 s, respectively, depending on the clinical indication for CT imaging. Sagittal reformations of the spine with 1-, 2-, or 3-mm slice thickness were reconstructed with a bone kernel and used for further analysis in this study. Imaging was performed for various indications not related to bone densitometry: acute back pain or suspected spinal fracture (n = 86); cancer staging, restaging, or follow-up (n = 55); exclusion of acute abdominal pathology (n = 21); chronic back pain (n = 14); and postoperative examination (n = 16).

Dual-energy X-ray absorptiometry
Areal BMD of lumbar vertebrae L1 to L4 was assessed in anterior-posterior projection on a DXA scanner (GE Lunar Prodigy, GE Healthcare). Scans were performed by trained technologists and quality was assured through evaluation by experienced physicians following current recommendations [19]. Those skeletal sites affected by severe local structural changes or artifacts were excluded. T-scores were calculated in relation to a reference population of healthy young women who are at their peak bone mass. The overall lowest T-score at the lumbar spine was reported and accounted for the diagnosis of osteoporosis [20]. Osteoporosis was defined as T ≤ −2.5 SD and low bone mass as −2.5 < T ≤ −1 SD [21].
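The T-score definition and the diagnostic categories above can be illustrated with a minimal Python sketch. The reference mean and SD passed to `t_score` are hypothetical placeholders, not values from the scanner's reference database, and reading "the overall lowest T-score" as the minimum over per-vertebra values is an assumption made for illustration.

```python
def t_score(abmd, ref_mean, ref_sd):
    """Deviation of aBMD from a young reference population, in SD units.

    ref_mean and ref_sd are hypothetical inputs; real values come from the
    reference database of the DXA manufacturer.
    """
    return (abmd - ref_mean) / ref_sd


def classify_t_score(t):
    """Categories as defined in the text: T <= -2.5 and -2.5 < T <= -1."""
    if t <= -2.5:
        return "osteoporosis"
    if t <= -1.0:
        return "low bone mass"
    return "normal"


def lowest_t_score(t_scores):
    """Lowest T-score over evaluable levels (None marks an excluded level).

    Assumption: the minimum is taken over per-vertebra T-scores.
    """
    evaluable = [t for t in t_scores.values() if t is not None]
    if not evaluable:
        raise ValueError("no evaluable lumbar level")
    return min(evaluable)


example = {"L1": -1.2, "L2": None, "L3": -2.7, "L4": -2.1}
print(classify_t_score(lowest_t_score(example)))
```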

Opportunistic CT-based measurements of bone mass
Volumetric and areal measures of bone mass were extracted from clinical CT scans in at least one of vertebrae T12 to L4. Measurements were averaged in case multiple levels could be evaluated.

Asynchronous calibration and correction for contrast medium
CT attenuation in Hounsfield units (HU) was converted to volumetric BMD using asynchronous calibration. In asynchronous calibration, phantoms with elements of bone-equivalent density are scanned to calculate HU-to-BMD relations that are specific for a certain CT scanner and acquisition protocol. Previously published HU-to-BMD conversion equations were used for all CT scanners in this study [22]. Most of these conversion equations were established in scans of a phantom with hydroxyapatite inserts of known density in milligrams per cubic centimeter (Anthropomorphic Abdomen Phantom, QRM Quality Assurance in Radiology and Medicine). Bias of BMD values due to intravenous injection of contrast medium was corrected for using linear correction equations for arterial and portal venous contrast phases [23]. HU values were converted to BMD and corrected for the presence of contrast medium prior to any subsequent evaluation of CT data.
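The two-step conversion described above can be sketched as follows. This is an illustration only: it assumes the HU-to-BMD relation is linear (the text states linearity only for the contrast correction), and every coefficient shown is a made-up placeholder, not a published value from [22] or [23].

```python
def hu_to_bmd(hu, slope, intercept):
    """Scanner-specific conversion of HU to BMD in mg/cm3.

    Assumption: a linear relation; slope and intercept are hypothetical.
    """
    return slope * hu + intercept


def correct_for_contrast(bmd, phase, corrections):
    """Apply a linear correction for the given contrast phase.

    corrections maps a phase name to (slope, intercept); scans without
    intravenous contrast ("native") are returned unchanged.
    """
    if phase == "native":
        return bmd
    slope, intercept = corrections[phase]
    return slope * bmd + intercept


# Placeholder coefficients for illustration only.
SCANNER = {"slope": 0.9, "intercept": 5.0}
CONTRAST = {"arterial": (0.9, -1.0), "portal_venous": (0.8, -2.0)}

bmd = hu_to_bmd(100.0, SCANNER["slope"], SCANNER["intercept"])
print(correct_for_contrast(bmd, "portal_venous", CONTRAST))
```

Keeping calibration and contrast correction as separate functions mirrors the order stated in the text: conversion first, correction second, and only then any further evaluation.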

Automatic extraction of volumetric bone measures
Volumetric measures were extracted in an automatic multistep procedure, which required minimal user interaction and was implemented in Python. First, vertebrae were automatically segmented in CT scans using a framework of CNNs that identifies the spine, labels each vertebral body, and creates segmentation masks [16]. Second, vertebral bodies were separated from posterior elements in these masks using affine and deformable transformations to fit templates of vertebral subregions to each vertebral level. Third, segmentation masks of vertebral bodies were used to extract integral vBMD and bone mineral content (BMC) or additionally eroded by 5 mm to exclude cortical bone for sampling trabecular vBMD.
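The third step can be illustrated with a small, dependency-free sketch. It assumes that the vertebral body mask is given as a set of voxel indices, that BMD values are already calibrated in mg/cm3, and that erosion uses a cubic neighbourhood; the structuring element actually used is not stated in the text, and all function names here are hypothetical.

```python
from itertools import product


def erode(mask, radius):
    """Keep voxels whose full cubic neighbourhood lies inside the mask.

    mask is a set of (x, y, z) indices; radius is given in voxels, e.g.
    round(5 mm / voxel spacing) for isotropic voxels (an assumption).
    """
    offsets = list(product(range(-radius, radius + 1), repeat=3))
    return {
        (x, y, z)
        for (x, y, z) in mask
        if all((x + dx, y + dy, z + dz) in mask for dx, dy, dz in offsets)
    }


def mean_bmd(bmd, mask):
    """Mean BMD (mg/cm3) over the mask: integral vBMD for the full body
    mask, trabecular vBMD for the eroded mask."""
    return sum(bmd[v] for v in mask) / len(mask)


def bone_mineral_content(bmd, mask, voxel_volume_mm3):
    """BMC in mg: sum of BMD (mg/cm3) times voxel volume (cm3)."""
    return sum(bmd[v] for v in mask) * voxel_volume_mm3 / 1000.0


# Toy vertebral body: 5 x 5 x 5 voxels with a dense shell and a soft core.
body = set(product(range(5), repeat=3))
bmd = {v: 100.0 if all(1 <= c <= 3 for c in v) else 300.0 for v in body}
core = erode(body, 1)
print(mean_bmd(bmd, body), mean_bmd(bmd, core))
```

In the toy volume the eroded mask contains only core voxels, so the trabecular value is lower than the integral value, which also includes the dense shell.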

CT-Based areal BMD
Areal BMD was extracted from virtual DXA-equivalent scans created from CT data (CT-based aBMD) for vertebrae L1 to L4. Only bony tissue inside the vertebral segmentation masks was included in the virtual images created in posterior-anterior projection. We chose this approach to take advantage of the three-dimensional character of CT scans compared to DXA, thus postulating its superior accuracy notwithstanding the fact that it is a monoenergetic technique. Areal BMD was sampled from the posterior-anterior projections in overlay masks corresponding to the contour of vertebral bodies, thus excluding lateral processes (Fig. 1). Good correlation between CT-based and DXA-based aBMD of L2 and L3 (R² = 0.814 and R² = 0.739, respectively) could be shown for a sample group of 29 patients (22 women, mean age 61.5 ± 13.6 years; Supplementary Fig. 1). Bland-Altman plots showed a bias of −0.054 and −0.015 g/cm2 at L2 and L3, respectively, for CT-based aBMD (Supplementary Fig. 1). Thus, CT-based assessment seemed to slightly underestimate aBMD compared to DXA.
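The idea of a virtual posterior-anterior projection can be sketched in a few lines. The sketch assumes a calibrated vBMD volume in mg/cm3 with zeros outside the vertebral segmentation mask, stored as nested lists with the first index running along the projection direction; this data layout and the function names are illustrative assumptions, not the published implementation.

```python
def project_abmd(volume, spacing_mm):
    """Integrate vBMD along the first axis to obtain aBMD in g/cm2.

    volume[k][i][j] holds vBMD in mg/cm3 (0 outside the mask), where k
    runs along the projection direction with voxel spacing spacing_mm.
    """
    step_cm = spacing_mm / 10.0
    rows, cols = len(volume[0]), len(volume[0][0])
    return [
        # mg/cm3 * cm = mg/cm2; divide by 1000 to obtain g/cm2
        [sum(slab[i][j] for slab in volume) * step_cm / 1000.0
         for j in range(cols)]
        for i in range(rows)
    ]


def mean_in_overlay(image, overlay):
    """Average the projected image inside a boolean overlay mask."""
    values = [
        image[i][j]
        for i, row in enumerate(overlay)
        for j, inside in enumerate(row)
        if inside
    ]
    return sum(values) / len(values)


# Toy example: 30 mm of bone at 300 mg/cm3 projects to 0.9 g/cm2.
volume = [[[300.0, 300.0], [300.0, 300.0]] for _ in range(30)]
image = project_abmd(volume, spacing_mm=1.0)
print(mean_in_overlay(image, [[True, False], [True, True]]))
```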

Quality assurance in evaluation survey
Curved planar reconstructions (CPRs) in sagittal and coronal view passing through the centroids of vertebral bodies were generated from CT data and overlaid with segmentation masks at 40% opacity. Additionally, virtual radiographs in lateral projection were calculated from CT data. These image reconstructions served as a survey to identify vertebral levels that had to be excluded from bone mass assessment due to (1) vertebral fractures, (2) degenerative changes, or (3) other abnormalities (e.g., foreign material) that led to alterations in bone mass not specific to osteoporosis (Figs. 2 and 3).
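How excluded levels feed into the per-patient value can be shown with a short sketch. It assumes per-level measurements keyed by vertebral label and a set of levels excluded in the survey; both structures and the function name are hypothetical.

```python
LEVELS = ("T12", "L1", "L2", "L3", "L4")


def average_over_levels(values, excluded):
    """Average a bone measure over evaluable levels T12 to L4.

    values maps level name to measurement; excluded holds levels rejected
    in the survey (fracture, degeneration, other abnormality). Returns
    None if no level can be evaluated.
    """
    usable = [values[lv] for lv in LEVELS if lv in values and lv not in excluded]
    if not usable:
        return None
    return sum(usable) / len(usable)


measurements = {"T12": 90.0, "L1": 60.0, "L2": 100.0, "L3": 110.0}
print(average_over_levels(measurements, excluded={"L1"}))
```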

Clinical thresholds for volumetric BMD measures
For trabecular vBMD, we used the diagnostic thresholds for osteoporosis (BMD < 80 mg/cm3) and for low bone mass (80 ≤ BMD ≤ 120 mg/cm3) proposed by the American College of Radiology (ACR) [24]. For integral vBMD, we developed new diagnostic thresholds in relation to the cut points for trabecular vBMD. To this end, we compared the coordinate points in receiver operating characteristics (ROC) analysis between trabecular and integral vBMD and determined those points for integral vBMD with the smallest geometrical distance to the respective cut points of trabecular vBMD, thus yielding sensitivity and specificity that matched most closely for both measures. Cutoff values in milligrams per cubic centimeter were rounded to the nearest step of 5 mg/cm3.
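The matching of cut points in ROC space can be sketched as follows. The sketch assumes that a patient counts as test-positive when the measure lies below the cutoff, that candidate cutoffs are the observed integral vBMD values, and that "geometrical distance" means Euclidean distance between (sensitivity, specificity) pairs; the toy data and function names are invented for illustration.

```python
import math


def sens_spec(values, fractured, cutoff):
    """Sensitivity and specificity of the rule 'value < cutoff'.

    Assumes both fractured and non-fractured patients are present.
    """
    pairs = list(zip(values, fractured))
    tp = sum(1 for v, f in pairs if f and v < cutoff)
    fn = sum(1 for v, f in pairs if f and v >= cutoff)
    tn = sum(1 for v, f in pairs if not f and v >= cutoff)
    fp = sum(1 for v, f in pairs if not f and v < cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def matched_cutoff(values, fractured, target, step=5):
    """Cutoff whose ROC point lies closest to the target point,
    rounded to the nearest multiple of step (mg/cm3)."""
    best = min(
        sorted(set(values)),
        key=lambda c: math.dist(sens_spec(values, fractured, c), target),
    )
    return step * round(best / step)


# Invented toy data (mg/cm3); True marks a prevalent fracture.
fractured = [True, True, True, True, True, False, False, False]
trabecular = [50, 60, 70, 75, 90, 100, 110, 130]
integral = [120, 140, 150, 158, 170, 185, 200, 210]

target = sens_spec(trabecular, fractured, 80)  # ACR osteoporosis threshold
print(matched_cutoff(integral, fractured, target))
```

Distances computed on (sensitivity, specificity) equal those on the usual ROC axes (1 − specificity, sensitivity), so either representation yields the same matched point.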

Statistical analysis
Study group characteristics were compared between patients with and without prevalent vertebral fractures using a two-sample t test for continuous variables and a chi-squared test of independence for sex. We investigated the association between different bone measures and prevalent fracture status in logistic regression, calculating odds ratios (ORs) and 95% confidence intervals (CIs) for one SD change. Models were additionally adjusted for age and sex. Area under the curve (AUC) was calculated in ROC analysis to test the classification performance of all bone measures to predict prevalent osteoporotic vertebral fractures. ROC curves were compared with DeLong's test for two correlated ROC curves using the pROC package [25]. Statistical analyses were conducted using SPSS (version 26; IBM) and RStudio (version 1.3.1073; RStudio). Statistical significance was set at a level p < 0.05 for all statistical tests.
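Two of the quantities named above can be illustrated without a statistics package. The sketch computes the AUC as the Mann-Whitney probability that a fractured patient has a lower bone measure than a non-fractured one, and rescales a logistic regression coefficient to an OR per one SD. Expressing the OR per SD decrease is an assumption, since the text does not state the direction; the analyses in the study were run in SPSS and R, and DeLong's test is not reproduced here.

```python
import math


def auc_lower_is_positive(values, fractured):
    """AUC for a measure where lower values indicate fracture.

    Equals the fraction of (fractured, non-fractured) pairs in which the
    fractured patient has the lower value; ties count one half.
    """
    pos = [v for v, f in zip(values, fractured) if f]
    neg = [v for v, f in zip(values, fractured) if not f]
    wins = sum(
        1.0 if p < n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def odds_ratio_per_sd_decrease(beta_per_unit, sd):
    """OR for a one-SD decrease, given the log-odds coefficient per unit
    increase of the bone measure (assumed direction, see lead-in)."""
    return math.exp(-beta_per_unit * sd)


# Invented toy data for illustration.
values = [60, 70, 90, 100, 80, 120]
fractured = [True, True, True, False, False, False]
print(auc_lower_is_positive(values, fractured))
print(odds_ratio_per_sd_decrease(-0.05, 40.0))
```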
Prevalent vertebral fractures were significantly associated with all DXA- and CT-based bone measures irrespective of adjustment for age and sex (Table 2). However, there were considerable differences with stronger associations for all CT-based measures (ranging from OR = 2.5, 95% CI 1.7-3.9 for adjusted CT-based aBMD to OR = 7.3, 95% CI 3.8-14.3 for adjusted manual vBMD) compared to DXA-based measures (OR = 1.9, 95% CI 1.3-2.8 each for adjusted DXA-based T-score and aBMD) and for both adjusted and unadjusted ORs (Table 2).
Diagnostic thresholds were determined for integral vBMD that define osteoporosis with BMD < 160 mg/cm3 and low bone mass with 160 ≤ BMD < 190 mg/cm3. Those cut points had almost equal sensitivity and specificity to predict patients with prevalent vertebral fractures compared to trabecular vBMD (84% vs. 86% sensitivity and 78% vs. 78% specificity for the osteoporosis threshold as well as 41% vs. 41% sensitivity and 95% vs. 98% specificity for the low bone mass threshold, respectively).
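Applied to a single patient, the proposed integral vBMD thresholds amount to a simple lookup, sketched below. The function name and category labels are illustrative, and the thresholds still await validation for fracture risk.

```python
OSTEOPOROSIS_BELOW = 160.0   # mg/cm3, proposed integral vBMD threshold
LOW_BONE_MASS_BELOW = 190.0  # mg/cm3, proposed integral vBMD threshold


def classify_integral_vbmd(bmd):
    """Category for integral vBMD in mg/cm3 using the proposed cut points:
    osteoporosis if BMD < 160, low bone mass if 160 <= BMD < 190."""
    if bmd < OSTEOPOROSIS_BELOW:
        return "osteoporosis"
    if bmd < LOW_BONE_MASS_BELOW:
        return "low bone mass"
    return "normal"


for value in (145.0, 160.0, 192.5):
    print(value, classify_integral_vbmd(value))
```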

Discussion
All automatically assessed CT-based bone measures had a highly significant association with the prevalence of osteoporotic vertebral fractures with no significant differences between automatic and manual measurements. Except for BMC, all CT-based bone measures showed significantly better discriminatory power for the prevalence of vertebral fractures compared to DXA-based measures.
We reported on elderly patients who all received DXA scans, implying that osteoporosis was already suspected; thus, the high prevalence of at least mild (Genant grade 1) osteoporotic vertebral fractures (77%) is not surprising. These differences in the study population, paired with [...] [29]. In contrast to a community-dwelling population like the MrOS cohort, our study group has a selection bias toward elderly hospital inpatients, mainly neurosurgical and oncological, who exhibit severe spinal degeneration that renders areal density measures inaccurate [30]. In this context, BMC may become even more inaccurate, as outlined before.
Looking 25 years back in time, the insight that trabecular vBMD (QCT; AUC = 0.81) offers better discriminatory power for the prevalence of vertebral fractures than aBMD (DXA; AUC = 0.65) appears familiar [31]. Here, we were able to reproduce these results on modern scanner hardware. In this regard, a recently presented approach to directly estimate aBMD from CT scans using CNNs trained on DXA and CT data is questionable because it propagates the inaccuracies of DXA to CT measures [32]. Previously, efforts to automatically assess BMD in CT data have been undertaken [33]. Some automatic tools use HU as a proxy for BMD [34], which is a method that is expected to produce inaccuracies due to its lack of scanner-specific calibration to bone [35] as well as high variations due to presence of contrast material [15,36,37]. Of note, automatic assessment of other CT-derived biomarkers such as muscle attenuation has shown potential to predict fragility fractures [38]. In contrast to these previous studies, we report on calibrated bone measures (aBMD, vBMD, or BMC) that were fully automatically extracted using fast and reliable CNNs. Using an earlier version of this automatic framework, we were able to predict screw loosening after lumbar spinal instrumentation in patients with osteoporotic trabecular vBMD [40]. Given that integral vBMD performed almost as well as trabecular vBMD, it would be convenient to have diagnostic thresholds available for integral vBMD that define osteoporosis and low bone mass similar to those defined by the ACR for trabecular vBMD [24], an idea that has been previously proposed [27]. Here, we developed thresholds of integral vBMD for osteoporosis and for low bone mass. These diagnostic thresholds should be validated in follow-up studies investigating fracture risk because we did not report on incident vertebral fractures.
There are limitations to this retrospective study. As mentioned before, there was a selection bias of elderly and mainly neurosurgical and oncological patients because it was required that they had received both multidetector CT and DXA within 1 year. Thus, osteoporosis was already suspected. However, this is exactly the population that could benefit from opportunistic osteoporosis screening because CT scans already exist and DXA scans become prone to inaccuracies due to spinal degeneration. Moreover, oncological patients would particularly benefit from opportunistic screening because osteoporosis may occur as a side effect of cancer treatment [39].
In conclusion, this study showed that opportunistic and fully automatic assessment of areal and volumetric bone measures in clinical routine CT scans is feasible. Trabecular and integral vBMD showed the best performance of these automatic measures to predict vertebral fractures. DXA-based and non-volumetric measures performed relatively worse. Finally, we propose newly developed diagnostic thresholds of integral vBMD for osteoporosis (< 160 mg/cm3) and low bone mass (160 ≤ BMD < 190 mg/cm3) that should be validated in upcoming studies.
Funding Open Access funding enabled and organized by Projekt DEAL. This study has received funding by the European Research Council (ERC; starting grant No. 637164 "iBack" to Jan S. Kirschke) and from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation; project No. 432290010).

Compliance with ethical standards
Guarantor The scientific guarantor of this publication is Jan S. Kirschke.

Conflict of interest
The authors of this manuscript declare no conflict of interest.
Statistics and biometry One of the authors has significant statistical expertise.
Informed consent Written informed consent was waived by the Institutional Review Board.
Ethical approval Institutional Review Board approval was obtained.
Study overlap or cohorts overlap Thirty-three patients were included in a prior study comparing manual BMD measurements in CT with DXA in the prediction of incident vertebral fractures [Löffler et al. 2019]. Moreover, CT scans of 29 patients were previously published as part of a vertebral segmentation dataset with fracture grading.

Methodology
• retrospective
• diagnostic or prognostic study
• performed at one institution
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.