Introduction

CT-based texture analysis (TA) is a growing subfield of radiomics and provides a non-invasive, quantitative method for the assessment of medical images. Texture features (TFs) allow for quantitative characterization of image properties, such as uniformity, heterogeneity, and randomness, as well as repetitive image patterns [1]. Thus, TA enables the extraction of additional diagnostic, predictive, and prognostic information beyond what is visually perceptible [2, 3]. Recently, TA methods have been used in different radiological subfields, such as neuroimaging or musculoskeletal imaging, providing additional information regarding diagnosis or patient outcomes in various diseases [2, 4].

The differentiation between benign (osteoporotic) vertebral fractures (VFs) and malignant VFs due to underlying metastases based on CT imaging alone is a frequent challenge in clinical practice and of growing importance in the aging population. Up to 10% of all cancer patients develop symptomatic bone metastases during the course of the illness [5], and vertebral metastases account for up to 39% of all bone metastases [6]. Both benign and malignant VFs are characterized by microarchitectural deterioration of osseous tissue and decreased bone mineral density (BMD) with associated fracture risk, since the three-dimensional (3D) trabecular bone architecture is impaired in both entities [7,8,9,10,11]. Nevertheless, the differentiation between benign and malignant VFs is crucial because the clinical work-up pathways differ substantially. To make matters more challenging, a history of malignancy does not necessarily imply a tumorous osseous infiltration with subsequent malignant fracture. Malignant entities are frequently associated with either cancer- or treatment-induced BMD reduction; both the primary condition and antitumoral therapies may thus increase bone fragility and, as a consequence, cause benign VFs in patients with a history of cancer [12, 13].

Standard morphological CT is the most suitable imaging technique for clinical routine work-up of bony tissues in general as it provides high spatial resolution and the possibility of reformation in three dimensions [13], while CT-based TA enables a more detailed assessment of the trabecular bone microarchitecture [1]. Thus, the differentiation of benign and malignant VFs represents a promising clinical application of CT-based TA.

In recent years, deep learning (DL) approaches based on convolutional neural networks (CNNs) have been applied in many different settings and have increased both efficiency and accuracy in segmentation tasks. A fully automated framework (https://anduin.bonescreen.de), which enables instant segmentation of vertebrae in any CT data set, has recently been introduced, particularly for opportunistic osteoporosis screening and related applications [9, 14,15,16,17,18].

The aim of this study was to investigate the diagnostic performance of 3D TFs derived from clinical routine CT using a CNN-based segmentation framework in order to differentiate benign and malignant thoracolumbar VFs. We hypothesized that the use of CT-based 3D TFs could improve the differentiation between benign and malignant VFs.

Materials and methods

The utilized workflow of this study is illustrated in Fig. 1.

Fig. 1

Flowchart illustrating the study’s workflow. Clinical routine MDCT scans of the thoracolumbar spine of patients with either malignant (A) or benign (B) VFs were identified and retrospectively included. In a second step, the DICOM data were converted into NIfTI files, and automated labelling and segmentation of the thoracolumbar spine (T1–L6) was performed using a DL-based algorithm (A, B). Thereafter, TA was performed on the postprocessed images by calculating all TFs for the ROI corresponding to each segmented vertebral body (A, B). TF results were analyzed and compared between the patient cohorts with malignant and benign VFs (A, B). DICOM Digital Imaging and Communications in Medicine, DL deep learning, MDCT multidetector CT, NIfTI Neuroimaging Informatics Technology Initiative, TA texture analysis, TF texture feature, and VF vertebral fracture

Subjects

In total, 409 consecutive patients (198 females, 211 males, mean age = 67.5 ± 16.3 years, age range = 17.9–94.3 years) who received a multidetector CT (MDCT) of the thoracolumbar spine with a standard clinical routine protocol between April 2010 and April 2020 at two institutions were retrospectively included. Inclusion criteria were (1) acquisition of MDCT of the spine covering at least T1 to L5 and (2) at least one osteoporotic or metastatic VF of the thoracolumbar spine. Exclusion criteria were (1) age below 18 years, (2) traumatic VFs, (3) motion artifacts in imaging data, (4) previous spine surgery, (5) inflammatory processes with related bone marrow changes such as spondylodiscitis according to MRI findings, and (6) pregnancy or breastfeeding. The present study was approved by the Institutional Review Boards of both institutions. Due to the retrospective study design, the requirement for informed consent was waived.

CT image acquisition

Image acquisition was performed in supine position using MDCT scanners (Brilliance 64, Ingenuity CT, Philips Healthcare; Somatom Definition AS+, Somatom Sensation Cardiac 64, Somatom Drive, Somatom Force, Siemens Healthineers). An initial scout scan was used for planning the field of view, followed by helical scanning with a peak tube voltage of 120 kVp or 130 kVp and adaptive tube load. Some of the scans were performed after the application of oral (Barilux Scan, Sanochemia Diagnostics) and/or intravenous (Imeron 400, Bracco) contrast agent. Sagittal reformations of the spine with a slice thickness ≤ 3 mm were reconstructed with a standard bone kernel and were used for CNN-based segmentation and subsequent TA.

Reference standard to categorize fractures as benign or malignant

Standard of reference for malignant VFs was either the histological analysis if a biopsy of the vertebra was performed or PET-CT, scintigraphy, and MRI confirming metastatic bone disease. VFs were considered benign if fulfilling all of the following criteria: no positive biopsy result (biopsy could be absent or negative) and no malignancy criteria found at baseline and imaging follow-up (≥ 3 months). All images were assessed by three board-certified radiologists (SSG, ASG, and JSK with 4, 11, and 15 years of radiological experience, respectively). In unclear cases, consensus readings were performed.

Automated deep learning-based CT image segmentation

For image segmentation, the anonymized CT data were exported from the local PACS as Neuroimaging Informatics Technology Initiative (NIfTI) data. The vertebrae T1 to L6 were automatically segmented in the MDCT images using a DL-driven framework (https://anduin.bonescreen.de) [17]. The pipeline fully automatically identifies and labels each vertebra and creates corresponding segmentation masks. The generated labels and segmentation masks of all vertebrae were checked visually by two radiology residents (SCF and DS with 4 and 2 years of experience, respectively) in order to verify the CNN-based segmentations. Representative data sets of the thoracolumbar spine with corresponding annotations and segmentation masks are shown in Fig. 2.
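As a minimal sketch of how a label mask can be turned into per-vertebra ROIs, the following Python snippet groups voxel intensities by mask label. It is an illustration only, not the framework's own code: the function name `voxels_by_label`, the nested-list volume layout, and the label values 1 and 2 are assumptions made for this example, and the toy intensities are not study data.

```python
from collections import defaultdict


def voxels_by_label(image, mask):
    """Group image intensities by the integer label at the same voxel.

    `image` and `mask` are nested lists indexed as [slice][row][column].
    Label 0 is treated as background and skipped (an assumption of this sketch).
    """
    rois = defaultdict(list)
    for image_slice, mask_slice in zip(image, mask):
        for image_row, mask_row in zip(image_slice, mask_slice):
            for value, label in zip(image_row, mask_row):
                if label != 0:
                    rois[label].append(value)
    return dict(rois)


# Toy 2 x 2 x 3 volume with two hypothetical vertebra labels (1 and 2).
image = [
    [[100, 120, 30], [110, 130, 20]],
    [[105, 125, 25], [115, 135, 15]],
]
mask = [
    [[1, 1, 0], [2, 2, 0]],
    [[1, 1, 0], [2, 2, 0]],
]

rois = voxels_by_label(image, mask)
for label, values in sorted(rois.items()):
    print(label, len(values), values)
```

Each list of intensities can then be passed to the texture computations separately, so that one feature vector results per vertebra.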

Fig. 2

Exemplary illustration of the labelling and segmentation process. (a, d) Sagittal reformations of the thoracolumbar spine (T1–L5) with a slice thickness of 2 mm and a standard bone kernel in (a) a patient with multiple osteoporotic fractures of T7, T9, T10, T12, and L1 and (d) a patient with a metastatic fracture of L5. (b, e) Annotation of all vertebrae from T1 to L5. (c, f) Automated labelling and segmentation of vertebrae T1 to L5 using the CNN-based framework

Texture analysis

TA was performed on the postprocessed image data sets. All TFs were calculated for the region of interest (ROI) corresponding to each segmented vertebral body. The selection of analyzed TFs was based on previous studies [1, 19, 20]. We extracted a total of eight TFs: two were global features (variance and skewness), also referred to as first-order statistical moments, which are computed by gray-level histogram analysis; two were second-order features (energy and entropy), which are based on gray-level co-occurrence matrix (GLCM) analysis; and four were higher-order features (SRE, LRE, RLN, and RP), which are based on gray-level run-length matrix (GLRLM) analysis, as previously described [1]. Table 1 lists the TFs extracted in this study together with descriptions of each quantified image property. In order to generate isotropic volumes of the image data sets, which are necessary for comparable TA results, cubic interpolation was used. To prevent sparseness, gray-level quantization was performed using the normalized gray levels (scale 0–1) of the ROI corresponding to each vertebral body. All steps of the TA were performed with MATLAB (version R2021a; MathWorks Inc., Natick, MA, USA) using a modified version of a publicly available radiomics toolbox (https://github.com/mvallieres/radiomics) [21].
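The normalization, quantization, histogram-based, and GLCM-based steps described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the MATLAB toolbox used in the study: all function names are hypothetical, the moments use population (divide-by-n) definitions, uniform quantization is assumed, and the GLCM is computed for a single 2D offset rather than over all 3D directions.

```python
import math


def normalize(values):
    """Rescale ROI intensities to the range 0-1 (assumes max > min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def quantize(normalized, n_levels):
    """Map normalized values in [0, 1] to integer gray levels 0..n_levels-1."""
    return [min(int(v * n_levels), n_levels - 1) for v in normalized]


def variance_and_skewness(values):
    """First-order features from the second and third central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m2, m3 / m2 ** 1.5


def glcm_energy_entropy(image, n_levels, offset=(0, 1)):
    """Energy and entropy of a symmetric GLCM for one 2D offset."""
    d_row, d_col = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * n_levels for _ in range(n_levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    probs = [x / total for row in counts for x in row if x > 0]
    energy = sum(p * p for p in probs)
    entropy = -sum(p * math.log2(p) for p in probs)
    return energy, entropy


# Toy values only, not study data.
variance, skewness = variance_and_skewness([0, 0, 0, 4])
levels = quantize(normalize([10, 20, 30]), n_levels=4)
energy, entropy = glcm_energy_entropy([[0, 1], [1, 0]], n_levels=2)
print(variance, skewness, levels, energy, entropy)
```

In this toy case a right-tailed intensity list yields positive skewness, and a perfectly uniform image would give the maximum energy of 1 with zero entropy.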

Table 1 Global (histogram-based), gray-level co-occurrence matrix (GLCM)-based, and gray-level run-length matrix (GLRLM)-based texture features and descriptions
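For the GLRLM-based features, a minimal Python sketch may help to make the quantities concrete. It uses the standard textbook definitions of short-run emphasis (SRE), long-run emphasis (LRE), run-length nonuniformity (RLN), and run percentage (RP) and, as a simplifying assumption, counts runs only along image rows of a 2D image; the study itself worked on 3D volumes with a MATLAB toolbox, and the function names here are hypothetical.

```python
def run_length_matrix(image, n_levels):
    """Row-wise run-length matrix.

    rlm[g][r - 1] is the number of runs of gray level g with length r.
    """
    max_run = len(image[0])
    rlm = [[0] * max_run for _ in range(n_levels)]
    for row in image:
        run_value, run_length = row[0], 1
        for value in row[1:]:
            if value == run_value:
                run_length += 1
            else:
                rlm[run_value][run_length - 1] += 1
                run_value, run_length = value, 1
        rlm[run_value][run_length - 1] += 1  # close the last run of the row
    return rlm


def glrlm_features(rlm, n_pixels):
    """SRE, LRE, RLN, and RP from a run-length matrix."""
    n_runs = sum(map(sum, rlm))
    sre = sum(count / (r + 1) ** 2
              for row in rlm for r, count in enumerate(row)) / n_runs
    lre = sum(count * (r + 1) ** 2
              for row in rlm for r, count in enumerate(row)) / n_runs
    # Number of runs of each length, summed over all gray levels.
    runs_per_length = [sum(row[r] for row in rlm) for r in range(len(rlm[0]))]
    rln = sum(t * t for t in runs_per_length) / n_runs
    rp = n_runs / n_pixels
    return {"SRE": sre, "LRE": lre, "RLN": rln, "RP": rp}


# Toy 2 x 4 image with two gray levels, not study data.
toy_image = [[0, 0, 1, 1],
             [0, 1, 0, 1]]
toy_rlm = run_length_matrix(toy_image, n_levels=2)
features = glrlm_features(toy_rlm, n_pixels=8)
print(toy_rlm)
print(features)
```

Fine, rapidly alternating textures produce many short runs (high SRE and RP), whereas coarse, homogeneous textures produce few long runs (high LRE).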

Statistical analysis

Statistical analysis was performed with SPSS (version 28; IBM SPSS Statistics for macOS, IBM Corp., Armonk, NY, USA) using a two-sided level of significance of 0.05 for all statistical tests. The Shapiro–Wilk test was performed to test for normal distribution of the data. Multivariate regression analysis adjusted for age and sex was performed for comparison of TFs between the benign and malignant fracture cohorts. The fracture status (benign versus malignant VF) was used as the dependent variable, while the different TFs were the independent variables. Due to the large number of parameters, the analyses were split into the following categories (based on previously published data [1, 20]): primary data (global features, energy, entropy) and exploratory data (SRE, LRE, RLN, and RP).
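To illustrate what an age- and sex-adjusted regression with a binary fracture status as the dependent variable can look like, the sketch below fits a logistic model by plain gradient ascent. Several points are assumptions rather than facts from the study: the text does not state which regression model SPSS fitted, so the logistic form is assumed here; the function `fit_logistic` and the variable names are hypothetical; and the eight rows are synthetic toy values, not patient data.

```python
import math


def fit_logistic(X, y, learning_rate=0.1, epochs=2000):
    """Fit a logistic regression by batch gradient ascent.

    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for row, target in zip(X, y):
            z = weights[0] + sum(w * x for w, x in zip(weights[1:], row))
            predicted = 1.0 / (1.0 + math.exp(-z))
            error = target - predicted
            gradient[0] += error
            for j, x in enumerate(row):
                gradient[j + 1] += error * x
        weights = [w + learning_rate * g / len(X)
                   for w, g in zip(weights, gradient)]
    return weights


# Synthetic toy rows: [standardized skewness, standardized age, sex (0/1)].
# Outcome: 0 = benign VF, 1 = malignant VF. Not study data.
X = [
    [1.2, 0.5, 1], [0.8, -0.3, 0], [0.3, 1.0, 1], [0.4, -1.0, 0],
    [-0.7, 0.5, 1], [0.5, -0.3, 0], [-0.4, 1.0, 1], [-1.1, -1.0, 0],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

coefficients = fit_logistic(X, y)
skewness_coefficient = coefficients[1]
odds_ratio = math.exp(skewness_coefficient)
print(skewness_coefficient, odds_ratio)
```

In this toy example the skewness coefficient is negative, meaning that higher skewness is associated with lower odds of a malignant fracture after adjusting for age and sex. A dedicated statistics package would additionally report standard errors and p-values, which this sketch does not compute.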

Results

Comparison of benign and malignant fractured vertebrae T1 to L6

A total of n = 228 fractured vertebrae were analyzed in n = 187 patients with benign VFs (nT1 = 0, nT2 = 0, nT3 = 4, nT4 = 2, nT5 = 3, nT6 = 2, nT7 = 15, nT8 = 4, nT9 = 3, nT10 = 4, nT11 = 19, nT12 = 24, nL1 = 52, nL2 = 36, nL3 = 29, nL4 = 17, nL5 = 13, nL6 = 1), while n = 308 fractured vertebrae were evaluated in a total of n = 222 patients with malignant VFs (nT1 = 11, nT2 = 4, nT3 = 5, nT4 = 3, nT5 = 8, nT6 = 10, nT7 = 14, nT8 = 5, nT9 = 10, nT10 = 9, nT11 = 19, nT12 = 24, nL1 = 34, nL2 = 42, nL3 = 38, nL4 = 36, nL5 = 36). Skewness_global showed a statistically significant difference between the two groups (p = 0.017), while all other TFs (Variance_global, energy, entropy, SRE, LRE, RLN, and RP) showed no statistically significant difference between benign and malignant fractured vertebrae (Table 2).
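The per-level counts reported above can be cross-checked against the stated totals with a short tally. The following Python snippet simply re-enters the numbers from the text; the dictionary names are chosen for this example.

```python
# Fractured vertebrae per level, copied from the counts reported in the text.
benign = {
    "T1": 0, "T2": 0, "T3": 4, "T4": 2, "T5": 3, "T6": 2, "T7": 15,
    "T8": 4, "T9": 3, "T10": 4, "T11": 19, "T12": 24,
    "L1": 52, "L2": 36, "L3": 29, "L4": 17, "L5": 13, "L6": 1,
}
malignant = {
    "T1": 11, "T2": 4, "T3": 5, "T4": 3, "T5": 8, "T6": 10, "T7": 14,
    "T8": 5, "T9": 10, "T10": 9, "T11": 19, "T12": 24,
    "L1": 34, "L2": 42, "L3": 38, "L4": 36, "L5": 36,
}

benign_total = sum(benign.values())
malignant_total = sum(malignant.values())
patients_total = 187 + 222  # benign plus malignant patient cohorts

print("benign fractured vertebrae:", benign_total)
print("malignant fractured vertebrae:", malignant_total)
print("patients:", patients_total)
print("most frequent benign level:", max(benign, key=benign.get))
print("most frequent malignant level:", max(malignant, key=malignant.get))
```

The tallies reproduce the reported 228 benign and 308 malignant fractured vertebrae and the 409 included patients, with L1 being the most frequently fractured level in the benign cohort and L2 in the malignant cohort.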

Table 2 Comparison of TFs between the benign and malignant fracture group analyzing all fractured vertebrae from T1 to L5

Discussion

Automated CNN-based spine segmentation and extraction of TFs of thoracolumbar vertebral bodies in routine MDCT scans was performed, showing statistically significantly higher values in benign VFs than in malignant VFs for the global TF skewness. This finding suggests differences in microstructural bone changes between benign and malignant fractured vertebrae.

Dimension reduction and feature selection are commonly performed steps in TA. During the selection process, it needs to be ensured that the selected TFs fulfill certain criteria and are of high relevance [1]. GLCM- and GLRLM-derived TFs may be restricted to or averaged across directions and distances [22, 23]. The selection of reproducible TFs with high inter- and intra-reader agreement is a further common approach [22, 24]. Furthermore, correlation analysis can be performed in order to identify and subsequently exclude highly correlated TFs for redundancy reduction purposes [22]. With the rise of machine learning capabilities, further feature reduction methods have emerged; for example, in a study using a random forest classifier to identify VFs, the number of TFs was optimized based on Gini importance and classification performance in an exponential search [25]. In another study, a more clinically driven approach was chosen, selecting TFs based on a preceding long-term reproducibility analysis to identify TFs that are particularly suitable for long-term comparisons. With this background, we consciously selected the most promising TFs as well as vertebral levels based on the following relevant findings from three previous studies. In a first study, histogram-based as well as second-order TFs applied to CT scans were able to predict radiation-induced insufficiency fractures in patients undergoing radiation therapy for pelvic malignancies [20]. Analyzing three ROIs (L5, sacrum, both femoral heads), the authors identified L5 energy and femoral head skewness as the significant parameters to stratify the risk of patients to develop radiation-induced insufficiency fractures in logistic regression analysis [20]. A second relevant study identified a total of six TFs of the thoracolumbar spine (Variance_global, entropy, SRE, LRE, RLN, and RP), which were long-term reproducible when characterizing gender-, age-, and region-specific vertebral bone microstructure on routine abdominal MDCT [1]. A third previous study found that level-specific volumetric BMD using opportunistic QCT is a significant classifier of incident VF status for both single vertebral levels from T1 to L5 as well as for all analyzed combinations of four consecutive vertebral bodies [19].

To the best of our knowledge, this is the first study analyzing the diagnostic performance of 3D CT-based TFs using a CNN-based framework to differentiate benign and malignant VFs. A recently published study on CT radiomics developed and validated an automated algorithm for segmentation of fractured vertebrae on CT and evaluated the applicability of this algorithm in a radiomics prediction model to differentiate a total of 341 benign and malignant VFs from 158 patients [26]. The authors validated their algorithm on independent test sets and constructed a radiomics model predicting fracture malignancy on CT. Further, they compared the prediction performance between automated and human segmentations. The automated segmentation achieved performance comparable to human expert segmentations in the CT radiomics model for predicting fracture malignancy [26]. Accordingly, the CNN-based segmentation algorithm that we used in the present study has also been proven to be valid and accurate in several previous studies [9, 14,15,16,17,18]. However, the study by Park et al. and our study have key differences. Firstly, the TFs chosen by the authors [26] are only to a small extent comparable to the TFs we chose in our study. Secondly, we also analyzed non-fractured vertebrae, as the CNN-based segmentation pipeline we used labels and segments both fractured and non-fractured vertebrae, while only fractured vertebrae were analyzed in this previous study [26]. Thirdly, our patient cohort was considerably larger (409 vs. 158 patients) [26].

Based on current literature, TF extraction using a DL-based pipeline may add important value to future routine clinical practice in cases in which VFs need to be classified further into benign or malignant fractures [1, 19, 26, 27]. This becomes particularly important for the individual diagnostic work-up of the patient, potentially resulting in further MRI, PET-CT imaging, or biopsy to initiate the adequate therapy without any delay.

In fractured vertebrae, the global TF skewness, which reflects the shape of the gray-level distribution, showed a significant difference between the benign and malignant fracture groups, with higher values in benign fractures. Interestingly, in a previous study, this TF, measured in the femoral head, was able to stratify patients into those who developed and those who did not develop an insufficiency fracture [20]. In summary, there are indications that the global TF skewness might be useful both for differentiating benign from malignant VFs and for fracture prediction, e.g., the development of insufficiency fractures as discussed above.

This study has several limitations. Firstly, we used a retrospective study design with related potential selection and referral bias. However, we used a high-quality standard of reference for the differentiation between benign and malignant VFs. Secondly, we did not evaluate the potential impact of contrast agent on texture analysis, which might have an influence and needs to be analyzed in larger study cohorts. Thirdly, we did not analyze any bone-related health data such as weight and height, body mass index, corticosteroid therapy, or smoking status in our study, which are known to have an influence on bone quality. Combined radiomics–clinical models might be superior to radiomics models alone, e.g., in predicting malignancy of VFs on CT [27]. Yet, the purpose of this study was to evaluate the diagnostic performance of CT-based 3D TFs using a CNN-based segmentation framework for the differentiation of benign and malignant VFs and not to investigate whether the use of clinical information might improve the algorithm’s performance. Finally, the analyzed set of TFs was limited in order to avoid statistical errors due to multiple testing.

Conclusion

In conclusion, DL-based extraction of global CT-based 3D TFs was feasible, and the global TF skewness showed significant differences between patients with benign and those with malignant VFs of the thoracolumbar spine. This suggests that DL-based TF extraction for the classification of benign versus malignant VFs might add relevant diagnostic information and therefore enhance the diagnostic work-up of VFs.