Differentiation of benign and malignant vertebral fractures using a convolutional neural network to extract CT-based texture features

This study aimed to assess the diagnostic performance of three-dimensional (3D) CT-based texture features (TFs), extracted using a convolutional neural network (CNN)-based framework, for differentiating benign (osteoporotic) from malignant vertebral fractures (VFs). A total of 409 patients who underwent routine thoracolumbar spine CT at two institutions were included. VFs were categorized as benign or malignant using either biopsy or imaging follow-up of at least three months as the standard of reference. Automated detection, labelling, and segmentation of the vertebrae were performed using a CNN-based framework (https://anduin.bonescreen.de). Eight TFs were extracted: global variance, global skewness, energy, entropy, short-run emphasis (SRE), long-run emphasis (LRE), run-length non-uniformity (RLN), and run percentage (RP). Multivariable regression models adjusted for age and sex were used to compare TFs between benign and malignant VFs. Global skewness differed significantly between the two groups when analyzing fractured vertebrae from T1 to L6 (benign fracture group: 0.70 [0.64–0.76]; malignant fracture group: 0.59 [0.56–0.63]; p = 0.017), suggesting higher skewness in benign than in malignant VFs. Global skewness of 3D CT-based texture, assessed using a CNN-based framework, differed significantly between benign and malignant thoracolumbar VFs and may therefore contribute to the clinical diagnostic work-up of patients with VFs.


Introduction
CT-based texture analysis (TA) is a growing and representative subfield of radiomics and provides a non-invasive, quantitative method for the assessment of medical images. Texture features (TFs) quantitatively characterize image properties such as uniformity, heterogeneity, and randomness, as well as repetitive image patterns [1]. TA thus enables the extraction of diagnostic, predictive, and prognostic information beyond what is visually perceptible [2,3]. Recently, TA methods have been applied in various radiological subfields, such as neuroimaging and musculoskeletal imaging, providing additional information on diagnosis or patient outcomes in a range of diseases [2,4].
Michael Dieckmeyer and Alexandra S. Gersing have contributed equally to this work.
Differentiating benign (osteoporotic) VFs from malignant VFs caused by underlying metastases on the basis of CT imaging alone is a frequent challenge in clinical practice and of growing importance in an aging population. Up to 10% of all cancer patients develop symptomatic bone metastases during the course of their illness [5], and vertebral metastases account for up to 39% of all bone metastases [6]. Both benign and malignant VFs are characterized by microarchitectural deterioration of osseous tissue and decreased bone mineral density (BMD) with an associated fracture risk, since the three-dimensional (3D) trabecular bone architecture is impaired in both entities [7–11]. Nevertheless, differentiating benign from malignant VFs is crucial because the two entities require very different clinical work-up pathways. Making this even more challenging, a history of malignancy does not necessarily imply tumorous osseous infiltration with a subsequent malignant fracture. Malignant diseases are frequently associated with cancer- or treatment-induced BMD reduction; both the primary condition and antitumoral therapies may thus increase bone fragility and can consequently cause benign VFs in patients with a history of cancer [12,13].
Standard morphological CT is the most suitable imaging technique for the routine clinical work-up of bone tissue in general, as it provides high spatial resolution and allows reformations in three dimensions [13], while CT-based TA enables a more detailed assessment of the trabecular bone microarchitecture [1]. The differentiation of benign and malignant VFs therefore represents a promising clinical application of CT-based TA.
In recent years, deep learning (DL) approaches based on convolutional neural networks (CNNs) have been applied in many different settings and have increased both efficiency and accuracy in segmentation tasks. A fully automated framework (https://anduin.bonescreen.de) that enables instant segmentation of the vertebrae in any CT data set has recently been introduced, particularly for opportunistic osteoporosis screening and related applications [9,14–18].
The aim of this study was to investigate the diagnostic performance of 3D TFs derived from clinical routine CT using a CNN-based segmentation framework to differentiate benign and malignant thoracolumbar VFs. We hypothesized that CT-based 3D TFs could improve the differentiation between benign and malignant VFs.

Materials and methods
The workflow of this study is illustrated in Fig. 1.

Subjects
In total, 409 consecutive patients (198 females, 211 males; mean age, 67.5 ± 16.3 years; age range, 17.9–94.3 years) who underwent multidetector CT (MDCT) of the thoracolumbar spine with a standard clinical routine protocol between April 2010 and April 2020 at two institutions were retrospectively included. Inclusion criteria were (1) MDCT of the spine covering at least T1 to L5 and (2) at least one osteoporotic or metastatic VF of the thoracolumbar spine. Exclusion criteria were (1) age below 18 years, (2) traumatic VFs, (3) motion artifacts in the imaging data, (4) previous spine surgery, (5) inflammatory processes with related bone marrow changes, such as spondylodiscitis, according to MRI findings, and (6) pregnancy or breastfeeding. The present study was approved by the Institutional Review Boards of both institutions. Owing to the retrospective study design, the requirement for informed consent was waived.

CT image acquisition
Images were acquired in the supine position using MDCT scanners (Brilliance 64, Ingenuity CT, Philips Healthcare; Somatom Definition AS+, Somatom Sensation Cardiac 64, Somatom Drive, Somatom Force, Siemens Healthineers). An initial scout scan was used to plan the field of view, followed by helical scanning with a peak tube voltage of 120 or 130 kVp and adaptive tube load. Some scans were performed after administration of oral (Barilux Scan, Sanochemia Diagnostics) and/or intravenous (Imeron 400, Bracco) contrast agent. Sagittal reformations of the spine with a slice thickness of ≤ 3 mm were reconstructed with a standard bone kernel and used for CNN-based segmentation and subsequent TA.

Reference standard to categorize fractures as benign or malignant
The standard of reference for malignant VFs was either histological analysis, if a biopsy of the vertebra had been performed, or PET-CT, scintigraphy, and/or MRI confirming metastatic bone disease. VFs were considered benign if they fulfilled all of the following criteria: no positive biopsy result (biopsy absent or negative) and no malignancy criteria at baseline or at imaging follow-up (≥ 3 months). All images were assessed by three board-certified radiologists (SSG, ASG, and JSK, with 4, 11, and 15 years of radiological experience, respectively). In unclear cases, consensus readings were performed.
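The reference standard is effectively a decision rule. The Python sketch below encodes it for a single fracture as an illustration only; the field names, the merging of "imaging confirming metastatic disease" and "malignancy criteria on imaging" into one flag, the precedence of positive imaging over a negative biopsy, and the "unclear" outcome (which the study resolved by consensus reading) are all hypothetical assumptions rather than details given in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FractureWorkup:
    # HYPOTHETICAL fields for illustration; not taken from the study's data model.
    biopsy_malignant: Optional[bool]   # None = no biopsy; True/False = positive/negative histology
    imaging_malignant: bool            # PET-CT, scintigraphy, and/or MRI indicating metastatic disease
    follow_up_months: float            # imaging follow-up after baseline, in months


def reference_standard(w: FractureWorkup) -> str:
    """Hypothetical encoding of the benign/malignant reference standard for one VF."""
    if w.biopsy_malignant:             # positive histology
        return "malignant"
    if w.imaging_malignant:            # assumption: imaging confirmation overrides a negative biopsy
        return "malignant"
    if w.follow_up_months >= 3:        # no positive biopsy, no malignancy criteria, >= 3 months follow-up
        return "benign"
    return "unclear"                   # assumption: handled by consensus reading in the study
```

For example, a fracture without biopsy, without malignancy criteria on imaging, and with six months of follow-up is labelled benign, whereas the same case with only one month of follow-up does not yet meet the benign criteria.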

Automated deep learning-based CT image segmentation
For image segmentation, the anonymized CT data were exported from the local PACS and converted to the Neuroimaging Informatics Technology Initiative (NIfTI) format. The vertebrae T1 to L6 were automatically segmented in the MDCT images using a DL-driven framework (https://anduin.bonescreen.de) [17]. The pipeline fully automatically identifies and labels each vertebra and creates corresponding segmentation masks. The generated labels and segmentation masks of all vertebrae were visually checked by two radiology residents (SCF and DS, with 4 and 2 years of experience, respectively) to verify the CNN-based segmentations. Representative data sets of the thoracolumbar spine with corresponding annotations and segmentation masks are shown in Fig. 2.
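Downstream TA needs one ROI per labelled vertebra. The NumPy sketch below splits a multi-label segmentation map into per-level masks and intensity vectors, assuming the CT volume and label map have already been loaded as aligned arrays (for NIfTI files, e.g., via nibabel's `nibabel.load(path).get_fdata()`). The VerSe-style label values (8–19 for T1–T12, 20–25 for L1–L6) are an assumption; the text does not state which integer labels the pipeline writes. Restricting each mask to the vertebral body, as done for TA in this study, is not shown.

```python
import numpy as np

# ASSUMPTION (hypothetical): VerSe-style integer labels, 8-19 = T1-T12 and 20-25 = L1-L6.
# The source does not state which label values the segmentation pipeline writes.
LEVEL_NAMES = {8 + i: f"T{i + 1}" for i in range(12)}
LEVEL_NAMES.update({20 + i: f"L{i + 1}" for i in range(6)})


def vertebra_rois(ct, labels):
    """Map each thoracolumbar label in `labels` to (intensities, boolean mask) from the aligned CT volume."""
    if ct.shape != labels.shape:
        raise ValueError("CT volume and label map must have the same shape")
    rois = {}
    for value in np.unique(labels):
        name = LEVEL_NAMES.get(int(value))
        if name is None:                   # background (0) or a non-thoracolumbar label
            continue
        mask = labels == value
        rois[name] = (ct[mask], mask)
    return rois
```

Keeping the boolean mask alongside the intensities preserves the 3D neighbourhood structure that the second- and higher-order TFs depend on.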

Texture analysis
TA was performed on the postprocessed image data sets. All TFs were calculated for the region of interest (ROI) corresponding to each segmented vertebral body. The selection of TFs was based on previous studies [1,19,20]. We extracted a total of eight TFs: two global features (variance and skewness), also referred to as first-order statistics, computed by gray-level histogram analysis; two second-order features (energy and entropy), based on gray-level co-occurrence matrix (GLCM) analysis; and four higher-order features (SRE, LRE, RLN, and RP), based on gray-level run-length matrix (GLRLM) analysis, as previously described [1]. Table 1 lists the TFs extracted in this study together with a description of the image property each one quantifies. Cubic interpolation was used to generate isotropic volumes, which are necessary for comparable TA results. To prevent sparseness of the texture matrices, gray-level quantization was performed using the normalized gray levels (scale 0–1) of the ROI corresponding to each vertebral body. All TA steps were performed in MATLAB (version R2021a; MathWorks Inc., Natick, MA, USA) using a modified version of a publicly available radiomics toolbox (https://github.com/mvallieres/radiomics) [21].
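To make the three feature families concrete, the following Python sketch computes all eight TFs for a single vertebral-body ROI. It is not the modified MATLAB toolbox used in this study: the number of gray levels (32), the base-2 logarithm for entropy, a single GLCM offset, and run counting along one axis only are illustrative assumptions, and isotropic resampling is assumed to have been done beforehand. Voxels outside the ROI are marked with -1 so that they neither form co-occurring pairs nor extend runs.

```python
import numpy as np


def quantize(values, n_levels=32):
    """Min-max normalise ROI intensities to [0, 1] and bin them into gray levels 0..n_levels-1."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    norm = (v - v.min()) / span if span > 0 else np.zeros_like(v)
    return np.minimum((norm * n_levels).astype(int), n_levels - 1)


def global_features(values):
    """First-order (histogram) variance and skewness of the ROI intensities."""
    v = np.asarray(values, dtype=float)
    mu, var = v.mean(), v.var()
    skew = ((v - mu) ** 3).mean() / var ** 1.5 if var > 0 else 0.0
    return float(var), float(skew)


def _neighbour_pairs(q, offset):
    """Aligned views (a, b) such that b[i] is the voxel at `offset` from a[i]."""
    src = tuple(slice(max(0, -d), n - max(0, d)) for d, n in zip(offset, q.shape))
    dst = tuple(slice(max(0, d), n - max(0, -d)) for d, n in zip(offset, q.shape))
    return q[src], q[dst]


def glcm_features(q, n_levels, offset=(0, 0, 1)):
    """Energy and entropy of a symmetric, normalised GLCM for one 3D offset (-1 = outside ROI)."""
    a, b = _neighbour_pairs(q, offset)
    inside = (a >= 0) & (b >= 0)               # count only pairs with both voxels in the ROI
    P = np.zeros((n_levels, n_levels))
    np.add.at(P, (a[inside], b[inside]), 1)
    P = P + P.T                                # symmetric GLCM
    P /= P.sum()
    nz = P[P > 0]
    return {"energy": float((P ** 2).sum()), "entropy": float(-(nz * np.log2(nz)).sum())}


def glrlm_features(q, n_levels):
    """SRE, LRE, RLN and RP from runs along the last axis; voxels outside the ROI (-1) end runs."""
    max_len = q.shape[-1]
    R = np.zeros((n_levels, max_len))          # R[gray level, run length - 1]
    for line in q.reshape(-1, max_len):
        level, length = line[0], 1
        for v in line[1:]:
            if v == level:
                length += 1
                continue
            if level >= 0:
                R[level, length - 1] += 1
            level, length = v, 1
        if level >= 0:
            R[level, length - 1] += 1
    n_runs = R.sum()
    j = np.arange(1, max_len + 1)              # run lengths 1..max_len
    return {
        "SRE": float((R / j ** 2).sum() / n_runs),
        "LRE": float((R * j ** 2).sum() / n_runs),
        "RLN": float((R.sum(axis=0) ** 2).sum() / n_runs),
        "RP": float(n_runs / (q >= 0).sum()),
    }


def vertebra_texture(volume, mask, n_levels=32):
    """All eight TFs for one vertebral-body ROI in an (already isotropic) volume."""
    q = np.full(volume.shape, -1, dtype=int)
    q[mask] = quantize(volume[mask], n_levels)
    var, skew = global_features(volume[mask])
    feats = {"variance_global": var, "skewness_global": skew}
    feats.update(glcm_features(q, n_levels))
    feats.update(glrlm_features(q, n_levels))
    return feats
```

Published radiomics toolboxes commonly compute GLCM and GLRLM features over the 13 principal directions of a 3D neighbourhood and average the results; extending this sketch would mean looping over those offsets and axes rather than using a single one.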

Statistical analysis
Statistical analysis was performed with SPSS (version 28; IBM SPSS Statistics for macOS, IBM Corp., Armonk, NY, USA) using a two-sided significance level of 0.05 for all statistical tests. The Shapiro-Wilk test was used to test for normal distribution of the data. Multivariable regression analysis adjusted for age and sex was performed to compare TFs between the benign and malignant fracture cohorts, with fracture status (benign versus malignant VF) as the dependent variable and the individual TFs as independent variables. Because of the large number of parameters, the analyses were split into the following categories (based on previously published data [1,20]): primary data (global features, energy, and entropy) and exploratory data (SRE, LRE, RLN, and RP).
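With a binary dependent variable (benign versus malignant) and each TF entered together with age and sex, the model described above would typically be a binary logistic regression fitted once per TF. The NumPy sketch below fits such a model by Newton-Raphson and reports the adjusted TF coefficient with a two-sided Wald p-value. It is an illustration, not a reproduction of the SPSS procedure; coding sex as a 0/1 `male` indicator, standardizing age, and the Wald test are assumptions.

```python
import math

import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information for the logit link
        step = np.linalg.solve(hessian, X.T @ (y - p))    # score divided by information
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, np.sqrt(np.diag(cov))


def adjusted_tf_effect(tf, age, male, malignant):
    """Log-odds of malignancy per unit TF, adjusted for age and sex, with a two-sided Wald p-value."""
    age_z = (age - age.mean()) / age.std()    # standardised only for numerical stability
    X = np.column_stack([np.ones_like(tf, dtype=float), tf, age_z, male])
    beta, se = fit_logistic(X, np.asarray(malignant, dtype=float))
    z = beta[1] / se[1]
    return float(beta[1]), float(se[1]), math.erfc(abs(z) / math.sqrt(2.0))
```

Under this coding, a negative coefficient for global skewness would correspond to the direction reported in the abstract (higher skewness in benign VFs). Standardizing age changes only the scale of the age coefficient, not the adjusted TF coefficient.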

Discussion
An automated CNN-based spine segmentation and extraction of TFs of thoracolumbar vertebral bodies was performed in routine MDCT scans, showing significantly higher values of the global TF skewness in benign VFs than in malignant VFs. This finding suggests differences in microstructural bone changes between benign and malignant fractured vertebrae.
Dimension reduction and feature selection are common steps in TA. During the selection process, it must be ensured that the selected TFs fulfill certain criteria and are of high relevance [1]. GLCM- and GLRLM-derived TFs may be restricted to or averaged across directions and distances [22,23]. Another common approach is to select reproducible TFs with high inter- and intra-reader agreement [22,24]. Furthermore, correlation analysis can be performed to identify and subsequently exclude highly correlated TFs for redundancy reduction [22]. With the rise of machine learning, further feature reduction methods have emerged, such as optimizing the number of TFs used by a random forest classifier for VF identification based on Gini importance and classification performance in an exponential search [25]. In another study, a more clinically driven approach was chosen, selecting TFs based on a preceding long-term reproducibility analysis to identify TFs that are particularly suitable for long-term comparisons. Against this background, we deliberately selected the most promising TFs and vertebral levels based on relevant findings from three previous studies. In the first study, histogram-based and second-order TFs derived from CT scans were able to predict radiation-induced insufficiency fractures in patients undergoing radiation therapy for pelvic malignancies [20]. Analyzing three ROIs (L5, sacrum, and both femoral heads), the authors identified energy at L5 and skewness of the femoral heads as significant parameters for stratifying patients' risk of developing radiation-induced insufficiency fractures in logistic regression analysis [20]. The second study identified six TFs of the thoracolumbar spine (global variance, entropy, SRE, LRE, RLN, and RP) that were long-term reproducible when characterizing gender-, age-, and region-specific vertebral bone microstructure on routine abdominal MDCT [1]. The third study found that level-specific volumetric BMD from opportunistic QCT is a significant classifier of incident VF status, both for single vertebral levels from T1 to L5 and for all analyzed combinations of four consecutive vertebral bodies [19].
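Correlation-based redundancy reduction, listed earlier in this section as one feature-selection strategy [22], can be sketched as a greedy filter: a TF is kept only if its absolute Pearson correlation with every TF already kept does not exceed a threshold. The present study instead selected its TFs based on previous studies; the threshold of 0.9 and the order-dependent greedy rule below are hypothetical choices for illustration.

```python
import numpy as np


def drop_correlated(features, names, threshold=0.9):
    """Greedy redundancy filter over TF columns (rows = vertebrae, columns = TFs).

    A column is kept only if |Pearson r| with every previously kept column is <= threshold;
    earlier columns therefore take priority (HYPOTHETICAL rule and threshold).
    """
    r = np.abs(np.corrcoef(features, rowvar=False))
    kept = []
    for j in range(features.shape[1]):
        if all(r[j, k] <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]
```

Because the result depends on column order, such filters are usually applied after ranking TFs by an external criterion such as reproducibility or clinical relevance.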
To the best of our knowledge, this is the first study to analyze the diagnostic performance of 3D CT-based TFs, extracted using a CNN-based framework, to differentiate benign and malignant VFs. A recently published CT radiomics study developed and validated an automated algorithm for the segmentation of fractured vertebrae on CT and evaluated its applicability in a radiomics prediction model to differentiate a total of 341 benign and malignant VFs from 158 patients [26]. The authors validated their algorithm on independent test sets, constructed a radiomics model predicting fracture malignancy on CT, and compared prediction performance between automated and human segmentations; the automated segmentation performed comparably to human expert segmentation [26]. Likewise, the CNN-based segmentation algorithm used in the present study has been shown to be valid and accurate in several previous studies [9,14–18]. However, the study by Park et al. and our study have key differences. First, the TFs chosen by the authors [26] are only to a small extent comparable to those chosen in our study. Second, we also analyzed non-fractured vertebrae, as the CNN-based segmentation pipeline we used labels and segments both fractured and non-fractured vertebrae, whereas only fractured vertebrae were analyzed in the previous study [26]. Third, our patient cohort was considerably larger (409 vs. 158 patients) [26].
Table 2 Comparison of TFs between the benign and malignant fracture groups, analyzing all fractured vertebrae from T1 to L5. The associations between benign and malignant fractured vertebrae (T1–L5) and TFs were assessed using multivariable regression models adjusted for age and sex. TF, texture feature; VF, vertebral fracture; SRE, short-run emphasis; LRE, long-run emphasis; RLN, run-length non-uniformity; RP, run percentage. *TF results are given as adjusted values for each fractured vertebra in both the benign and malignant fracture cohorts. Results were rounded to two or three decimal places.
Based on the current literature, TF extraction using a DL-based pipeline may add important value to future routine clinical practice in cases in which VFs need to be further classified as benign or malignant [1,19,26,27]. This is particularly important for the individual diagnostic work-up of the patient, potentially prompting further MRI, PET-CT imaging, or biopsy so that adequate therapy can be initiated without delay.
Analyzing fractured vertebrae, the global TF skewness, which reflects the shape of the gray-level distribution, differed significantly between the benign and malignant fracture groups, with higher values in benign fractures. Interestingly, in a previous study, this TF, measured in the femoral head, was able to stratify patients into those who did and those who did not develop an insufficiency fracture [20]. In summary, there are indications that the global TF skewness might be useful both for differentiating benign from malignant VFs and for fracture prediction, e.g., of insufficiency fractures as discussed above.
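For reference, global skewness is commonly defined as the standardized third central moment of the N voxel values x_i within the ROI. The population (uncorrected) form is shown below; whether the toolbox used in this study applies a small-sample bias correction is not stated, so this form is an assumption.

```latex
\gamma_1 = \frac{\frac{1}{N}\sum_{i=1}^{N}\left(x_i-\mu\right)^{3}}
                {\left(\frac{1}{N}\sum_{i=1}^{N}\left(x_i-\mu\right)^{2}\right)^{3/2}},
\qquad
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
```

Because this ratio is unchanged by a positive linear rescaling of the x_i, min-max normalization to the 0–1 scale does not alter it (only the subsequent binning can). Under this definition, higher skewness in benign VFs would correspond to a gray-level distribution with a longer or heavier tail toward values above the ROI mean.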
This study has several limitations. First, we used a retrospective study design with potential selection and referral bias. However, we used a high-quality standard of reference for the differentiation between benign and malignant VFs. Second, we did not evaluate the potential impact of contrast agent on TA, which may influence TF values and needs to be analyzed in larger cohorts. Third, we did not analyze bone-related health data such as weight, height, body mass index, corticosteroid therapy, or smoking status, which are known to influence bone quality. Combined radiomics-clinical models might be superior to radiomics models alone, e.g., in predicting malignancy of VFs on CT [27]. However, the purpose of this study was to evaluate the diagnostic performance of CT-based 3D TFs using a CNN-based segmentation framework for the differentiation of benign and malignant VFs, not to investigate whether clinical information might improve performance. Finally, the analyzed set of TFs was limited to reduce the risk of false-positive findings due to multiple testing.

Conclusion
In conclusion, DL-based extraction of global CT-based 3D TFs was feasible, and the global TF skewness differed significantly between patients with benign and those with malignant VFs of the thoracolumbar spine. This suggests that DL-based TF extraction for the classification of benign versus malignant VFs might add relevant diagnostic information and thereby enhance the clinical diagnostic work-up of VFs.

Fig. 1 Flowchart illustrating the study's workflow. Clinical routine MDCT scans of the thoracolumbar spine of patients with either malignant (A) or benign (B) VFs were identified and retrospectively included. In a second step, the DICOM data were converted into NIfTI files, and automated labelling and segmentation of the thoracolumbar spine (T1–L6) were performed using a DL-based algorithm (A, B). Thereafter, TA was performed on the postprocessed

Fig. 2 Exemplary illustration of the labelling and segmentation process. a, d Sagittal reformations of the thoracolumbar spine (T1–L5) with a slice thickness of 2 mm and a standard bone kernel in (a) a patient with multiple osteoporotic fractures of T7, T9, T10, T12, and