Human subjects and MDCT imaging
Ethics approval was obtained from the local ethics committee (11/5022A1). Due to the retrospective nature of the study, the need for informed consent was waived. We reviewed consecutive patients from our local database who underwent MDCT between February 2007 and February 2008 for cancer staging, restaging, or follow-up after surgical treatment or chemotherapy.
Inclusion criteria for the present study were (1) age above 38 years, (2) a CT scan of the thoracolumbar spine including sagittal reformations, (3) a bone mineral phantom within the scan field, and (4) the absence of any disease affecting the spine, such as bone metastases, hematological disorders, or metabolic bone diseases other than osteoporosis. To definitively exclude spinal metastasis, we included only patients with available follow-up scans of the spine confirming the absence of bone metastases. In total, 154 patients were included in the study (103 males, 51 females). These oncologic patients had histologically proven neoplasms of the gastrointestinal tract (102), lymphatic system (20), urinary tract (8), respiratory tract (6), sarcoma (7), or other solid tumors (11). The majority of patients showed no signs of distant metastasis (92); a minority were lymphoma patients (20); in the remaining cases, non-spinal distant metastases were present (42). Because all subjects were screened for cancer metastasis, intravenous contrast medium (Imeron 400; Bracco, Konstanz, Germany) was administered using a high-pressure injector (Fresenius Pilot C; Fresenius Kabi, Bad Homburg, Germany). Intravenous contrast medium was injected with a delay of 70 s, a flow rate of 3 ml/s, and a body weight–dependent dose (80 ml for body weight up to 80 kg, 90 ml up to 100 kg, and 100 ml over 100 kg). Furthermore, all patients received 1000 ml of oral contrast medium (Barilux Scan; Sanochemia Diagnostics, Neuss, Germany). All images were acquired with a Siemens CT scanner (Somatom 128; Siemens Healthcare AG, Erlangen, Germany) with a two-rod calibration phantom (Osteo Phantom; Siemens Healthcare AG, Erlangen, Germany).
A patient was diagnosed with established osteoporosis (FX) if an osteoporotic vertebral fracture was detected in the image (53 patients). According to the semiquantitative Genant classification, vertebrae with a height loss of more than 20% (at least grade 1) and the typical morphology of osteoporotic fractures were considered fractured [12]. A total of 101 patients had no signs of osteoporotic vertebral fractures (noFX).
Bone mineral density
The calibration phantom values were used for Hounsfield unit (HU) to vBMD conversion. To account for the contrast medium administered to all subjects, a linear conversion factor for the portal-venous (PV) phase was applied (BMDQCT = 1.02 × BMDMDCT − 18.72 mg/ml), as proposed in [13]. The corrected vBMD value for each vertebra of each patient was computed by sampling all voxels within the respective trabecular compartment. Finally, the vBMD values of the thoracic, lumbar, and thoracolumbar spine were determined by averaging the mean vBMD values and standard deviations (SD) of their respective vertebrae. In addition to the global mean for each vertebral level, we also extracted skewness and kurtosis; we refer to these as global density features (BMD) for classification.
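The PV correction above is a single linear map applied to the phantom-calibrated values. A minimal sketch (the voxel values and the helper name are hypothetical, not code from the study):

```python
import numpy as np

def corrected_vbmd(bmd_mdct, slope=1.02, intercept=-18.72):
    """Apply the portal-venous contrast correction from [13]:
    BMD_QCT = 1.02 * BMD_MDCT - 18.72 mg/ml."""
    return slope * np.asarray(bmd_mdct, dtype=float) + intercept

# hypothetical phantom-calibrated trabecular samples (mg/ml)
voxels = np.array([150.0, 160.0, 170.0])
bmd = corrected_vbmd(voxels)
mean_vbmd, sd_vbmd = bmd.mean(), bmd.std()
```

The per-vertebra mean and SD of `bmd` correspond to the global density features described in the text.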
Global and local feature extraction
We extracted features on a global (i.e., vBMD) and a local (i.e., regional) level. Global features were extracted from the complete eroded vertebral body. Both density calculation and texture analysis were performed on the calibrated scans. Because the calibration is a linear conversion, the internal micro-architectures and morphological patterns described by the textural features remain independent of it. To fully exploit texture analysis locally, we defined 27 subregions of each vertebra, as proposed in [14], on our spine template (TLSSM16) generated in [15]. The center of the largest sphere fitting inside the vertebral mask was defined as the center point of the vertebral body. Additionally, we extracted surface points of the vertebral endplates (i.e., superior and inferior endplate points), which we projected onto the center point. This set of 3D points was used to compute the best-fitting plane by minimizing the sum of squared distances (perpendicular to the plane) between the plane and the points. The fit was performed by computing the eigenvectors of the point distribution. Using combinations of two eigenvectors as the orthonormal bases of the planes, we extracted three distinct planes: the superior-inferior plane (i.e., the fitted plane), the anterior-posterior plane, and the medial-lateral plane. We divided the largest fitted sphere into three parts to define superior (S), mid-transverse (T), and inferior (I) regions using the fitted transverse plane, and into lateral (L) and medial (M) regions using the defined sagittal plane. Coronally, the vertebral bodies were divided into thirds to define the anterior (A), mid-coronal (C), and posterior (P) regions using the anterior-posterior plane. The posterior elements were separated from the vertebral body using the anterior-posterior plane fitted to the posterior border of the vertebral body, i.e., the anterior border of the spinal canal.
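The eigenvector-based plane fit described above is equivalent to a principal-component analysis of the centered points: the eigenvector with the smallest eigenvalue is the plane normal, and the other two span the plane. A sketch under that assumption (the endplate-like points are toy data):

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through 3D points: minimizes the sum of
    squared perpendicular distances via the covariance eigenvectors."""
    p = np.asarray(points, dtype=float)
    centroid = p.mean(axis=0)
    cov = np.cov((p - centroid).T)          # 3x3 covariance of the points
    eigval, eigvec = np.linalg.eigh(cov)    # eigenvalues in ascending order
    normal = eigvec[:, 0]                   # least-variance direction = normal
    return centroid, normal, eigvec[:, 1:]  # point on plane, normal, in-plane basis

# toy endplate points scattered in the z = 0 plane
pts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0], [0.5, 0.5, 0]])
centroid, normal, basis = fit_plane(pts)
```

Pairs of eigenvectors then give the three mutually orthogonal cutting planes used to define the subregions.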
This intersection resulted in 27 subregions, which are depicted in Fig. 1. We extracted density (regional volumetric bone mineral density (BMDr)) and texture features for each vertebra for all defined subregions using different texture analysis techniques. We computed simple statistical descriptors for those features using the mean, standard deviation, skewness, and kurtosis.
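The four statistical descriptors can be computed per subregion with plain moment formulas. A sketch (the function name and toy values are illustrative):

```python
import numpy as np

def region_descriptors(values):
    """Mean, SD, skewness, and kurtosis of the voxel values in one
    subregion, using plain (biased) moment definitions."""
    v = np.asarray(values, dtype=float)
    m = v.mean()
    sd = v.std()
    skew = ((v - m) ** 3).mean() / sd ** 3   # third standardized moment
    kurt = ((v - m) ** 4).mean() / sd ** 4   # fourth standardized moment
    return m, sd, skew, kurt

# symmetric toy distribution: skewness must come out as zero
desc = region_descriptors([1.0, 2.0, 2.0, 3.0])
```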
Pre-processing
Each vertebra of the thoracolumbar spine was localized and segmented by an automated algorithm based on shape model matching [16]. The corresponding vertebra of the spine template (TLSSM16) was then aligned to the segmented vertebra to define the vertebral subregions for texture analysis. More specifically, we first estimated a rigid motion (i.e., rotation and translation) which roughly aligned the TLSSM16 to the sample vertebra. Next, we fitted the vertebral body of the TLSSM16 to the vertebral body of the sample via affine transform, which adds anisotropic scaling. Once the registration pipeline was concluded, we could easily warp the defined subregions to the sample vertebra. To exclude the surrounding cortical shell and limit the analysis to the trabecular compartment, we eroded the resulting mask of the vertebral body by a sphere with a radius of 4 voxels.
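The final erosion step can be sketched with a spherical structuring element, assuming a SciPy-style binary erosion (the cube mask is toy data, and a small radius is used for the demo; the study uses a radius of 4 voxels):

```python
import numpy as np
from scipy import ndimage

def spherical_structuring_element(radius):
    """Binary ball of the given voxel radius."""
    r = int(radius)
    ax = np.arange(-r, r + 1)
    x, y, z = np.meshgrid(ax, ax, ax, indexing="ij")
    return x ** 2 + y ** 2 + z ** 2 <= r ** 2

def erode_trabecular_mask(mask, radius=4):
    """Peel off the cortical shell by eroding the vertebral-body mask
    with a sphere (radius of 4 voxels in the text)."""
    return ndimage.binary_erosion(mask, structure=spherical_structuring_element(radius))

# toy mask: a 12^3 cube inside a 20^3 volume
mask = np.zeros((20, 20, 20), dtype=bool)
mask[4:16, 4:16, 4:16] = True
eroded = erode_trabecular_mask(mask, radius=2)
```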
The implementation of the registration procedure was based on the elastix framework [17]. Visual inspection of both the segmentation and registration results was conducted to check the accuracy of the intermediate steps. In total, 11 vertebrae had to be excluded from the procedure due to incorrect segmentation (n = 9) or registration (n = 2). The reasons for these failures appeared to be high-grade fractures (n = 6), severe degeneration in fractured vertebrae (n = 3), and abnormalities of the posterior elements (n = 2).
Three-dimensional texture analysis
Haralick features of the 3D co-occurrence matrix (HAR)
The Haralick features (HAR) are a set of features computed on the gray-level co-occurrence matrix (GLCM), a joint histogram whose elements describe how often two intensity levels occur as neighbors at a certain offset [18]. The GLCM used in this work was computed with the following parameters: 16 bins and an offset of 1 in 13 distinct directions. Thirteen different HAR were used, which are reported in the supplemental material and described in [3, 19]. However, the neighborhood between two voxels is not uniquely defined: an element in 3D space has six direct neighbors, with which it shares one face, and 20 semi-direct neighbors, resulting in 13 unique directions. To address this directional ambiguity, we computed the mean and standard deviation of the Haralick features over all possible directions, called the angular mean and angular standard deviation, respectively [19]. Both vectors were used as descriptors of the textures in a region.
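As an illustration, a GLCM for one offset and the angular statistics over the 13 directions can be sketched as follows (the volume is a toy quantized stripe pattern, and contrast stands in for the full set of 13 Haralick features):

```python
import numpy as np

# the 13 unique neighbor directions in 3D (half of the 26-neighborhood)
DIRECTIONS = [(1, 0, 0), (0, 1, 0), (0, 0, 1),
              (1, 1, 0), (1, -1, 0), (1, 0, 1), (1, 0, -1),
              (0, 1, 1), (0, 1, -1),
              (1, 1, 1), (1, -1, 1), (1, 1, -1), (1, -1, -1)]

def glcm(q, offset, levels):
    """Symmetric, normalized GLCM of a quantized volume q
    (integer values 0..levels-1) for one offset."""
    sl_a, sl_b = [], []
    for n, d in zip(q.shape, offset):
        sl_a.append(slice(max(-d, 0), n - max(d, 0)))
        sl_b.append(slice(max(d, 0), n - max(-d, 0)))
    a = q[tuple(sl_a)].ravel()
    b = q[tuple(sl_b)].ravel()   # b is a shifted by the offset
    m = np.zeros((levels, levels))
    np.add.at(m, (a, b), 1)      # count co-occurring level pairs
    m = m + m.T                  # make the matrix symmetric
    return m / m.sum()           # normalize to joint probabilities

def contrast(p):
    """One of the 13 Haralick features, shown as an example."""
    i, j = np.indices(p.shape)
    return float((p * (i - j) ** 2).sum())

# toy quantized volume with stripes along the z axis
q = np.arange(64).reshape(4, 4, 4) % 2
feats = [contrast(glcm(q, d, levels=2)) for d in DIRECTIONS]
ang_mean, ang_std = float(np.mean(feats)), float(np.std(feats))
```

For this stripe pattern, contrast is 1 along the nine directions that cross stripes and 0 along the four in-plane directions, which is exactly the anisotropy the angular statistics summarize.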
3D histograms of oriented gradients
Histograms of oriented gradients (HOG) [20] describe textural patterns based on gradient information. The gradient of a volume is defined at a voxel v as the change of intensity between the neighbors of v along the axial, sagittal, and coronal directions. The intensity differences in these three directions form the gradient vector, which is computed for each voxel v. To compute HOG features, the gradient vector is projected onto the 20 faces of an icosahedron (i.e., a 20-sided die) built around the voxel v [20]. Each normalized projection generates a vector whose magnitude is binned in a histogram. The textural descriptor was estimated by summing the histograms over a certain region. Additionally, the same procedure can be applied to the gradient itself, obtaining descriptors of second-order gradients.
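A simplified sketch of the gradient-histogram idea, using the six axis-aligned directions in place of the 20 icosahedron face normals from [20] (the ramp volume is toy data):

```python
import numpy as np

def hog3d_region(volume, directions):
    """Accumulate gradient votes per direction bin over a region;
    `directions` are unit vectors (here a simplified axis-aligned set,
    not the icosahedron face normals used in the text)."""
    gx, gy, gz = np.gradient(volume.astype(float))
    g = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)  # one vector per voxel
    proj = g @ directions.T                             # projection onto each bin
    return np.clip(proj, 0, None).sum(axis=0)           # keep positive projections

axes = np.array([[1., 0, 0], [-1, 0, 0], [0, 1, 0],
                 [0, -1, 0], [0, 0, 1], [0, 0, -1]])
vol = np.tile(np.arange(5.0), (5, 5, 1))   # intensity ramp along the z axis
hist = hog3d_region(vol, axes)
```

For the ramp, all gradient vectors point along +z, so only that bin accumulates votes; applying the same procedure to the gradient volume itself yields the second-order descriptors.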
3D local binary patterns
Local binary patterns (LBP) were first introduced in 2D [21] as a way to uniquely identify the specific arrangement of intensities around a pixel, with the main advantage of being invariant to rotations. The original procedure reads out the intensity values around a circle centered on the pixel of interest in a binary fashion: if a surrounding pixel's value is greater than that of the central pixel, it is assigned 1, otherwise 0. The extension to 3D required a more complex procedure to read out values from a sphere surrounding a given voxel and describe them in a compact and unique fashion. This procedure is based on spherical harmonics, a mathematical framework that allows the approximation of functions defined on a sphere [22]. Additionally, to confer rotational invariance on the descriptor, as originally proposed in 2D, the kurtosis was computed on the distribution of sampled voxels. The result is a feature that maps each voxel location to a higher-dimensional vector representing the particular 3D texture surrounding the voxel. Two parameters were set for this descriptor: the radius of the sampling sphere (r = 2, 3, and 4 voxels) and the number of coefficients (f = 3) used by the spherical harmonics. The higher the number of coefficients, the more patterns and textures can be represented.
The most direct way to use LBP for the analysis of textures in a region would be to look for the most common pattern in that region. However, this approach is sensitive to noise, which changes the coefficients of the higher frequencies. By clustering these vectors according to their similarity, we were less sensitive to noise [23]. More specifically, we clustered the extracted 3D LBP features using k-means with k = 2, 3, and 4. Each resulting cluster, represented by its respective mean, was used as a descriptor, along with its cardinality.
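The clustering step can be sketched with plain Lloyd's k-means (the deterministic farthest-point seeding and the toy coefficient vectors are illustrative choices, not the study's implementation):

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Plain Lloyd's k-means; returns cluster means and cardinalities,
    the per-cluster descriptors used for the 3D LBP features."""
    centers = [X[0]]
    for _ in range(k - 1):                  # farthest-point seeding
        d = ((X[:, None] - np.array(centers)[None]) ** 2).sum(-1).min(1)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):                  # Lloyd iterations
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):         # guard against empty clusters
                centers[j] = X[labels == j].mean(0)
    counts = np.bincount(labels, minlength=k)
    return centers, counts

# hypothetical LBP spherical-harmonic coefficient vectors, two clear groups
X = np.vstack([np.zeros((10, 3)), np.full((10, 3), 5.0)])
centers, counts = kmeans(X, k=2)
```

Each cluster mean, together with its cardinality, then serves as a noise-robust regional descriptor, as described in [23].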
3D wavelet decomposition
The term wavelet refers to a signal having a wave-like oscillation with amplitude that increases from zero up to a certain value and then decreases back to zero. Similar to sinusoidal functions in classical Fourier analysis, wavelets can be used as a basis function in the decomposition of a complex signal [24]. Unlike Fourier analysis, however, the limited support of wavelets easily allows the modeling of local frequency variations (or textures, in the case of images).
More specifically, a discrete 3D signal (i.e., the CT image) is decomposed into the weighted sum of a high-frequency signal (H) and a low-frequency signal (L) in each direction. This procedure generates eight sub-bands of one-eighth the size of the original volume (HHH, HHL, HLH, HLL, LHH, LHL, LLH, and LLL), one for each combination of frequency type and dimension. High-frequency coefficients capture signals such as edges and noise, whereas low-frequency coefficients give a smoother representation of the signal. Combinations of high and low frequencies highlight edges and ridges in specific directions as indicators of textures.
In addition, wavelet decomposition implicitly offers a multiresolution approach by recursively applying the decomposition on the LLL sub-band.
We used simple statistical descriptors (i.e., mean, standard deviation, skewness, and kurtosis) on each sub-band for two subsequent resolution levels [25].
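One level of the decomposition can be sketched with a Haar wavelet (an assumption for illustration; the text does not name the wavelet family). Splitting along each axis in turn yields the eight sub-bands:

```python
import numpy as np

def haar_split(x, axis):
    """One Haar step along an axis: low = pairwise mean (L),
    high = pairwise difference (H)."""
    even = np.take(x, range(0, x.shape[axis], 2), axis=axis)
    odd = np.take(x, range(1, x.shape[axis], 2), axis=axis)
    return (even + odd) / 2.0, (even - odd) / 2.0

def haar3d(volume):
    """One level of 3D Haar decomposition into the eight sub-bands
    (LLL ... HHH), each one-eighth the size of the input."""
    bands = {"": volume}
    for axis in range(3):                   # split along x, then y, then z
        bands = {name + tag: sub
                 for name, data in bands.items()
                 for tag, sub in zip("LH", haar_split(data, axis))}
    return bands

vol = np.random.default_rng(0).normal(size=(8, 8, 8))
bands = haar3d(vol)
stats = {name: (b.mean(), b.std()) for name, b in bands.items()}
```

Recursing on `bands["LLL"]` gives the second resolution level used in the text, and `stats` corresponds to the per-sub-band statistical descriptors.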
Classification
Among all classification algorithms presented in the literature, we opted for random forests (RFs) [26]. Random forests are ensembles of decision trees built on random subsets of the input space. A decision tree is a multivariate classifier that splits multidimensional data recursively, one variable at a time, to create homogeneous subsets of the data. A new sample is classified by assigning it the class of the subset it falls into. Assembling multiple decision trees into a random forest offers higher robustness to noise and better generalization than a single decision tree. We used 2001 trees. To avoid overfitting, the decision trees of our RFs were built on random subsets of the input space [27]. Such RFs have been shown to be efficient classifiers, able to handle complex, non-linear classification problems and large, high-dimensional datasets while providing high accuracy [28]. Training is performed using a locally optimal strategy that recursively minimizes the probability of a random sample being misclassified, a.k.a. the Gini index. The reduction of the Gini index attributable to the selection of a certain feature, summed over all decision trees in the forest, a.k.a. the Gini importance (GI), quantifies the importance of each feature for the classification task [26].
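The Gini index and its reduction at a single split, which summed over the forest gives the Gini importance, can be sketched as follows (toy labels for illustration):

```python
import numpy as np

def gini_impurity(labels):
    """Probability that a random sample is misclassified when labeled
    according to the class distribution of the node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - (p ** 2).sum()

def gini_decrease(parent, left, right):
    """Impurity reduction achieved by one split; summing this over all
    splits on a feature across the forest gives its Gini importance."""
    n, nl, nr = len(parent), len(left), len(right)
    return (gini_impurity(parent)
            - (nl / n) * gini_impurity(left)
            - (nr / n) * gini_impurity(right))

parent = np.array([0, 0, 1, 1])                           # mixed node, impurity 0.5
decrease = gini_decrease(parent, parent[:2], parent[2:])  # a perfect split
```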
At this point, we built the input space (i.e., feature vector) used for the classification. Specifically, we extracted textural features according to the section Three-dimensional texture analysis from each vertebral body (global), as well as the BMD mean, standard deviation, skewness, and kurtosis on the global and local level (i.e., the 27 subregions). Subsequently, the feature vectors of all vertebrae in the thoracolumbar spine were concatenated. Seventy-nine vertebrae with existing fractures (as well as 2 non-fractured vertebrae with incorrect segmentations) were excluded from the analysis to avoid bias; the resulting missing values were replaced by the sample mean.
Finally, since textures can be hampered by noise but may also be destroyed by smoothing, we computed each feature at four increasing levels of Gaussian smoothing. Specifically, we applied a Gaussian isotropic kernel with sigma = 0, 1/3, 2/3, and 1 (where 0 means no smoothing), sized at three times the sigma.
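A sketch of the four smoothing levels, assuming a SciPy-style Gaussian filter whose `truncate` parameter fixes the kernel radius at three sigma (the impulse volume is toy data):

```python
import numpy as np
from scipy import ndimage

def smoothing_levels(volume, sigmas=(0.0, 1/3, 2/3, 1.0)):
    """The four smoothing levels from the text: sigma = 0 (none),
    1/3, 2/3, and 1 voxel, with the kernel truncated at 3 * sigma."""
    return [volume if s == 0 else
            ndimage.gaussian_filter(volume, sigma=s, truncate=3.0)
            for s in sigmas]

# toy volume: a single bright voxel in an empty cube
vol = np.zeros((9, 9, 9))
vol[4, 4, 4] = 1.0
levels = smoothing_levels(vol)
```

Each feature is then computed once per level, so the classifier can pick the noise/smoothing trade-off that preserves the texture best.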
Feature selection
Reducing the input space to the most relevant features, a.k.a. feature selection, can improve the results significantly, especially in this case, where the information contained in one vertebra is likely correlated with that of adjacent vertebrae, causing information redundancy. To identify the most important features, we opted for an exponential search: from the training procedure, we extracted the GI and ranked the features accordingly. We then re-ran the training using the first m features, where m = 2, 4, 8, …, 32,768 (i.e., m = 2^n). A quadratic function was used to model the change in performance w.r.t. n, and the vertex of the parabola was used as the optimal cut.
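The parabola-vertex cut can be sketched as follows (the AUC values below are synthetic toy scores for illustration only):

```python
import numpy as np

# hypothetical cross-validation scores measured at m = 2**n features
n = np.arange(1, 16)                    # m = 2, 4, ..., 32768
auc = -0.002 * (n - 9) ** 2 + 0.90      # toy scores peaking at n = 9

a, b, c = np.polyfit(n, auc, deg=2)     # quadratic model of performance vs n
n_opt = -b / (2 * a)                    # vertex of the fitted parabola
m_opt = int(round(2 ** n_opt))          # optimal number of features
```

The vertex of a parabola a·n² + b·n + c lies at n = −b/(2a), so the optimal cut falls where the modeled performance peaks.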
Statistical analysis
A significance level of 0.05 was used in all statistical analyses. Descriptive statistics are given as means and standard deviations (SD) after checking for normal distribution. To compare global density between patients with fractures (FX) and patients without fractures (noFX), we used Student's t test. We used pairwise Pearson correlation coefficients (r) to investigate the relationship between vBMD and age.
The fracture classification performance was computed with fourfold cross-validation, repeated 10 times, using a random forest of 2001 trees to classify whether a patient was in the FX or noFX group (i.e., binary classification). More specifically, the original dataset (i.e., sample) was randomly partitioned into four equal-size subsamples. Of the four subsamples, a single subsample was retained as the validation data for testing the model, and the remaining three subsamples were used as training data. This fourfold cross-validation was repeated 10 times with different randomly chosen subsamples to account for possible differences between subsequent trainings. To assess the diagnostic capability of single features as well as the whole model, receiver operating characteristic (ROC) curve analysis was used. The AUC comparisons were statistically tested using the McNeil method.
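The fold partitioning and the AUC (via its rank-statistic definition) can be sketched as follows (patient indices and scores are toy data):

```python
import numpy as np

def auc(labels, scores):
    """ROC AUC via the Mann-Whitney statistic: the probability that a
    randomly chosen positive case is scored above a random negative."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (pos.size * neg.size)

# fourfold partition of hypothetical patient indices; each fold serves
# once as the validation set while the other three train the model
rng = np.random.default_rng(0)
folds = np.array_split(rng.permutation(12), 4)

# toy FX/noFX example: perfect separation gives AUC = 1
a = auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
```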