1 Introduction

In patients with a malignancy, staging at primary diagnosis and restaging at disease recurrence are of great importance for selecting optimal therapies. Skeletal involvement is common in malignancy; after the lung and liver, bone is the third most common site of distant metastases [1]. Accurate identification of bone metastases is thus essential in patients with advanced cancer. Imaging modalities including conventional radiography, computed tomography (CT), magnetic resonance imaging (MRI), skeletal scintigraphy, and positron emission tomography (PET; usually combined with CT or MRI) can be used to detect bone metastases, and each modality has specific strengths and weaknesses depending on its underlying principles [2]. PET/CT with 18F-fluorodeoxyglucose (18F-FDG) is commonly performed because many malignant tumors demonstrate high 18F-FDG uptake and because it enables whole-body imaging in a single session. However, several benign bone conditions, including trauma or fracture, degeneration or spur, infection, osteonecrosis, bone marrow hyperplasia, benign bone tumors, and bone lesions due to benign systemic diseases, can be 18F-FDG avid and mimic malignancy [3]. Differentiating metastatic from benign bone lesions can therefore be challenging, especially in patients with only a few (oligo-skeletal) lesions on imaging studies, and even experienced physicians may be unable to make a confident diagnosis in this situation.

Tumor heterogeneity denotes that different tumor cells show distinct phenotypic features, including cell morphology; gene expression; metabolism; motility; and angiogenic, proliferative, immunogenic, and metastatic potential [4]. High tumor heterogeneity indicates a high probability of pre-existent clones resistant to therapeutic intervention. It has been hypothesized that metabolic heterogeneity measured through imaging can reflect tumor heterogeneity, and many researchers have attempted various analytic techniques for 18F-FDG PET data to investigate their potential roles in cancer [5]. Two analytic approaches are frequently used: the analysis of the histogram of the voxel values within the volume of interest (VOI), which includes the first-order statistics of standardized uptake value (SUV), and the spatial arrangement of voxel values, which includes many different higher-order texture features [6, 7]. Texture analysis (TA) is a group of computational methods that can extract texture features regarding the relationship between adjacent pixels from a given image [8], with applications in computer vision [9], remote sensing [10], and medical imaging [11]. However, the application of TA is confounded by variations in processing steps and parameters, which has led to controversies [12]. A review of 15 studies highlighted related issues and concluded that the evidence is insufficient to support a relationship between texture features and cancer patient survival [13]. One of the most important issues is that multiple hypothesis testing results in significant inflation of type I errors, which can easily lead to false-positive findings [14]. Whether TA can be clinically useful needs to be evaluated extensively. The best approach for resolving this problem involves simplifying the clinical question, controlling the processing variations, and validating the findings obtained from the initial dataset with an independent dataset.
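To make the scale of the type I error problem concrete, the following Python sketch computes the family-wise error rate, 1 − (1 − α)^m, and the expected number of false positives, mα, for m tests at level α. It assumes independent tests in which every null hypothesis is true; real texture metrics are correlated with one another, so this is an illustration of the inflation, not a correction method.

```python
# Illustrative arithmetic only. Assumes m independent tests, each at level alpha,
# with every null hypothesis true (no real difference between lesion groups).


def family_wise_error_rate(m: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive among m independent tests."""
    return 1.0 - (1.0 - alpha) ** m


def expected_false_positives(m: int, alpha: float = 0.05) -> float:
    """Expected number of false-positive findings among m true-null tests."""
    return m * alpha


if __name__ == "__main__":
    for m in (1, 10, 100, 4464):
        print(f"m={m:5d}  FWER={family_wise_error_rate(m):.3f}  "
              f"expected false positives={expected_false_positives(m):.1f}")
```

Under these assumptions, testing 4464 metrics at α = .05 would be expected to yield about 223 false-positive findings even if no metric truly separated the lesion groups.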

The current study assessed the utility of heterogeneity measurement for the binary classification of metastatic and benign bone lesions in patients with cervical cancer using a specific PET/CT scanner. PET images collected during different periods were used for training and validation. Our hypothesis is that some heterogeneity measures outperform maximum SUV (SUVmax)—a measure of relative glycolytic activity that is commonly used in clinical settings.

2 Materials and Methods

2.1 Patients

This retrospective study on the use of PET/CT for differentiating metastatic and benign bone lesions was approved by the institutional review board (201900947B0), and the requirement for signed informed consent was waived. Data from imaging studies performed on a specific PET/CT scanner (Discovery ST16; GE Health Systems, Milwaukee, WI, USA) at our institution during 2016–2018 in patients with cervical cancer were retrieved. Patients with a history of other malignancies were excluded. Bone lesion images from studies performed in 2016–2017 and in 2018 were used as the training and validation datasets, respectively.

2.2 18F-FDG PET/CT Imaging

Patients were instructed to fast for at least 4 h before examination. The scan was initiated approximately 90 min after intravenous injection of 370 MBq (± 10%) of 18F-FDG. A diluted CT contrast agent (iothalamate meglumine; Mallinckrodt, Missouri, USA) was administered orally during the tracer uptake period. Patients were scanned in the supine position. After CT acquisition from the head to the upper thigh, PET data were acquired in three-dimensional (3D) mode, with an acquisition time of 2.5 min per cradle position. The CT data were used for attenuation correction, and PET images were reconstructed using an iterative ordered subset expectation maximization algorithm with 4 iterations, 10 subsets, and a transaxial image matrix of 128 × 128. The reconstructed voxel size was 5.47 × 5.47 × 3.27 mm³. The SUV of each reconstructed voxel was defined as the measured tissue tracer concentration (MBq/mL) divided by the injected activity per unit body weight (MBq/g).
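The SUV definition above reduces to a one-line ratio. This Python sketch is illustrative only: the function name and example values are hypothetical, and it assumes the tissue concentration and injected activity are decay-corrected to the same reference time and that tissue density is 1 g/mL, so that the ratio (MBq/mL)/(MBq/g) is effectively unitless.

```python
def suv_bw(tissue_conc_mbq_per_ml: float,
           injected_activity_mbq: float,
           body_weight_kg: float) -> float:
    """Body-weight SUV = tissue concentration / (injected activity / body weight).

    Assumes both activities are decay-corrected to the same time point and a
    tissue density of 1 g/mL (hypothetical helper, not from the study software).
    """
    body_weight_g = body_weight_kg * 1000.0                     # kg -> g
    activity_per_gram = injected_activity_mbq / body_weight_g   # MBq/g
    return tissue_conc_mbq_per_ml / activity_per_gram           # (MBq/mL)/(MBq/g)


if __name__ == "__main__":
    # Hypothetical voxel: 0.02 MBq/mL after a 370-MBq injection in a 60-kg patient
    print(round(suv_bw(0.02, 370.0, 60.0), 2))
```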

2.3 Identification of Metastatic Bone Lesions

The clinical records of patients were reviewed. A patient was considered to have confirmed bone metastases if there was pathological proof or if imaging studies performed after the PET/CT study showed progression of the metastatic bone lesions. Metastatic bone lesions on PET/CT studies were identified by a nuclear medicine physician. For each identified lesion, the location and coordinates of the PET voxel with the lesion SUVmax were recorded.

2.4 Selection of Benign Bone Lesions

To match the number of identified metastatic bone lesions, an equal number of benign bone abnormalities with relatively increased 18F-FDG activity were selected from the PET/CT studies of patients without any evidence of bone metastasis in clinical records and follow-up imaging studies. For each selected benign abnormality, the location and coordinates of the PET voxel with the lesion SUVmax were recorded.

2.5 First-Order Metrics and VOI Settings

For lesion classification, we assessed five first-order metrics of voxel SUV in the VOI: lesion SUVmax, mean (SUVmean), standard deviation (SUVsd), skewness (SUVsk), and kurtosis (SUVku). Because of the small size of bone lesions and the limited resolution of reconstructed PET images, we did not attempt to define the exact border of each bone lesion. Instead, a simple cuboid VOI centered on the recorded voxel with the lesion SUVmax was defined. The side length of the VOI was limited to 20 mm or less to avoid including too many background voxels. A VOI of 3 × 3 × 5 voxels (approximately 16.4 × 16.4 × 16.4 mm³) was thus selected to approximate a cube.
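A minimal NumPy sketch of the cuboid VOI and the five first-order metrics follows. It assumes the SUV volume is indexed (x, y, z) with z as the axial direction, so a half-size of (1, 1, 2) yields the 3 × 3 × 5 voxel VOI. The sample standard deviation (ddof = 1) and excess kurtosis are assumptions, because the exact estimators used in the study are not stated; the function name is hypothetical.

```python
import numpy as np


def first_order_metrics(suv, center, half_size=(1, 1, 2)):
    """First-order SUV statistics in a cuboid VOI centered on `center`.

    Assumes `suv` is indexed (x, y, z) with z axial, so the default half_size
    gives a 3 x 3 x 5 voxel VOI around the recorded SUVmax voxel.
    """
    starts = [c - h for c, h in zip(center, half_size)]
    stops = [c + h + 1 for c, h in zip(center, half_size)]
    if any(a < 0 for a in starts) or any(b > s for b, s in zip(stops, suv.shape)):
        raise ValueError("VOI extends beyond the image volume")
    voi = np.asarray(
        suv[starts[0]:stops[0], starts[1]:stops[1], starts[2]:stops[2]], dtype=float
    ).ravel()
    mean = voi.mean()
    dev = voi - mean
    m2 = np.mean(dev ** 2)  # population second central moment
    skew = np.mean(dev ** 3) / m2 ** 1.5 if m2 > 0 else float("nan")
    kurt = np.mean(dev ** 4) / m2 ** 2 - 3.0 if m2 > 0 else float("nan")  # excess
    return {
        "n_voxels": voi.size,
        "SUVmax": float(voi.max()),
        "SUVmean": float(mean),
        "SUVsd": float(voi.std(ddof=1)),  # sample SD (assumed estimator)
        "SUVsk": float(skew),
        "SUVku": float(kurt),
    }
```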

2.6 Voxel Size, SUV Interpolation, and VOI Definition for TA Metrics

Because stable TA requires many voxels in VOI, the direct application of the reconstructed voxel size from PET imaging data would be inadequate [15, 16]. In this study, smaller cubic voxels with side lengths of 1, 2, and 3 mm were thus adopted with the voxel SUV estimated by trilinear interpolation. In the TA of 2D images, a square-shaped region of interest is usually defined. In the current study with 3D data, an isotropic cubic-shaped VOI was adopted. For each recorded lesion, TA metrics were computed from cubic VOIs centered on the recorded voxel location. Three representative VOI sizes, with side lengths of 10, 15, and 20 mm, were adopted for further TA.
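The interpolation step can be sketched with SciPy's `map_coordinates`, where `order=1` performs linear interpolation along each axis, that is, trilinear interpolation in 3D. The function name, the (x, y, z) axis order, the "nearest" edge handling, and the rule for rounding the VOI side length to a whole number of voxels are assumptions for illustration; this is not the CGITA implementation.

```python
import numpy as np
from scipy.ndimage import map_coordinates


def resample_cubic_voi(suv, center_idx, spacing_mm, voxel_mm, side_mm):
    """Trilinearly interpolate a cubic VOI onto isotropic voxels.

    suv        : 3D array of SUVs on the reconstructed grid, indexed (x, y, z)
    center_idx : integer index of the recorded SUVmax voxel
    spacing_mm : reconstructed voxel size per axis, e.g. (5.47, 5.47, 3.27)
    voxel_mm   : side length of the new isotropic voxels (e.g. 1, 2, or 3 mm)
    side_mm    : side length of the cubic VOI (e.g. 10, 15, or 20 mm)
    """
    n = int(round(side_mm / voxel_mm))                   # assumed rounding rule
    offsets = (np.arange(n) - (n - 1) / 2.0) * voxel_mm  # mm offsets centered on 0
    # Convert physical offsets (mm) into fractional indices along each axis.
    axes = [c + offsets / s for c, s in zip(center_idx, spacing_mm)]
    coords = np.stack(np.meshgrid(*axes, indexing="ij"))  # shape (3, n, n, n)
    # order=1: linear along each axis, i.e. trilinear interpolation in 3D
    return map_coordinates(np.asarray(suv, dtype=float), coords, order=1, mode="nearest")
```

With 3-mm voxels, a 20-mm side is not an integer multiple of the voxel size, so the rounding rule above yields 7 voxels (21 mm); how the study handled this case is not specified.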

2.7 Quantization of Voxel SUV Values in VOI

To assess texture features, voxel SUV values in the VOI must be quantized into a specified number of bins. In this study, the bin numbers were set to 2^n (n = 1–8), that is, from 2 to 256 bins. A linear quantization method was adopted, in which the specified number of bins was spaced linearly between the minimum and maximum voxel SUV values in the VOI.
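The linear quantization can be sketched as follows, mapping SUVs to integer levels 1 to `n_bins` spaced linearly between the VOI minimum and maximum. Placing the maximum value in the top bin and assigning a single level to a uniform VOI are assumptions; the exact rule used by the CGITA toolbox may differ.

```python
import numpy as np


def quantize_linear(voi, n_bins):
    """Map SUVs to integer levels 1..n_bins, with bins spaced linearly between
    the VOI minimum and maximum (assumed rule, see lead-in)."""
    voi = np.asarray(voi, dtype=float)
    lo, hi = voi.min(), voi.max()
    if hi == lo:  # uniform VOI: every voxel gets the same level
        return np.ones(voi.shape, dtype=int)
    levels = np.floor((voi - lo) / (hi - lo) * n_bins).astype(int) + 1
    return np.clip(levels, 1, n_bins)  # the maximum itself falls in the top bin


BIN_NUMBERS = [2 ** n for n in range(1, 9)]  # 2, 4, ..., 256
```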

2.8 Texture Features and Metrics

The software used to compute TA metrics was based on the open-source Chang-Gung Image Texture Analysis (CGITA) toolbox, which was developed at our institution and is implemented in MATLAB (MathWorks Inc., Natick, MA, USA) [17]. Eight types of parent texture matrices were included: cooccurrence [8], run length (voxel-alignment) [18], neighborhood difference [19], size zone [20], texture spectrum [21], texture feature coding [22], texture feature coding cooccurrence [22], and neighborhood dependence [23] matrices. Table 1 lists the 62 texture features derived from these parent matrices. For lesion classification, a total of 4464 TA metrics (3 VOI sizes × 3 voxel sizes × 8 bin numbers × 62 texture features) were evaluated.
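The count of 4464 TA metrics follows directly from the parameter grid. This Python sketch enumerates it, using placeholder integers for the 62 features of Table 1 and assuming the bin numbers are 2^n for n = 1–8.

```python
from itertools import product

VOI_SIDES_MM = (10, 15, 20)
VOXEL_SIDES_MM = (1, 2, 3)
BIN_NUMBERS = tuple(2 ** n for n in range(1, 9))  # assumed 2^n, n = 1..8
FEATURE_IDS = tuple(range(62))  # placeholders for the 62 features in Table 1

# Every (VOI size, voxel size, bin number, feature) combination is one TA metric.
grid = list(product(VOI_SIDES_MM, VOXEL_SIDES_MM, BIN_NUMBERS, FEATURE_IDS))

if __name__ == "__main__":
    print(len(grid))  # prints 4464 (3 * 3 * 8 * 62)
```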

Table 1 Parent texture matrices with derived texture features exploited in the study

2.9 Definition of 3D Connectivity and Cooccurrence Matrix Offset

A simple 6-connected neighborhood was adopted for calculating 3D texture features; that is, the six face-adjacent voxels were considered the direct neighbors of the central voxel. TA metrics were calculated in each of these six directions and then averaged. Because the voxel size varied, only a distance offset of one voxel was evaluated in cooccurrence matrix computation.
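A minimal NumPy sketch of a 3D cooccurrence matrix with the six face-adjacent offsets at a distance of one voxel is shown below. It uses contrast as an example feature, computed per direction and then averaged. It assumes the VOI has already been quantized to integer levels 1 to `n_bins`; the function names are hypothetical, and the CGITA toolbox may construct or normalize the matrices differently.

```python
import numpy as np

# The six face-adjacent neighbors of a voxel (6-connectivity), one voxel away.
OFFSETS_6 = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def _paired_slices(size, d):
    """Slices selecting voxels and their neighbors at offset d along one axis."""
    return slice(max(0, -d), size - max(0, d)), slice(max(0, d), size - max(0, -d))


def cooccurrence_3d(q, offset, n_bins):
    """Normalized cooccurrence matrix of quantized levels 1..n_bins for one offset."""
    pairs = [_paired_slices(size, d) for size, d in zip(q.shape, offset)]
    src = q[tuple(p[0] for p in pairs)]  # each voxel ...
    dst = q[tuple(p[1] for p in pairs)]  # ... and its neighbor at +offset
    glcm = np.zeros((n_bins, n_bins))
    np.add.at(glcm, (src.ravel() - 1, dst.ravel() - 1), 1.0)
    return glcm / glcm.sum()


def contrast(p):
    """Cooccurrence contrast: sum of (i - j)^2 * P(i, j)."""
    i, j = np.indices(p.shape)
    return float(np.sum((i - j) ** 2 * p))


def mean_contrast_6(q, n_bins):
    """Contrast computed in each of the six directions, then averaged."""
    return float(np.mean([contrast(cooccurrence_3d(q, o, n_bins)) for o in OFFSETS_6]))
```

Because opposite directions (for example, +x and −x) produce transposed matrices, they give identical values for symmetric features such as contrast, so averaging over all six directions weights each axis equally.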

2.10 Statistical Analysis

The means of first-order metrics for metastatic and benign bone lesions were compared using the independent samples t test. A logistic regression model was used to evaluate lesion classification, with the area under the receiver operating characteristic curve (AUC) as the performance measure. AUCs were compared using a fast implementation of DeLong's algorithm [24]. A two-sided P value of < .05 was considered statistically significant. Correlations between metrics were assessed using the Pearson correlation coefficient; absolute coefficient values of < .2, .2–.4, .4–.6, .6–.8, and .8–1.0 represented very weak, weak, moderate, strong, and very strong correlations, respectively. Statistical analyses were performed using R (version 4.1.0; R Foundation for Statistical Computing, Vienna, Austria). Because of the large number of TA metrics, only the 20 top-performing TA metrics with the highest AUCs in the training dataset were included in further analysis. The first-order metrics and these top-performing TA metrics were then assessed in the validation dataset.
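DeLong's comparison of two correlated AUCs measured on the same lesions can be sketched in NumPy as below. This is the straightforward O(mn) form of the method rather than the fast algorithm cited above, and the function names are hypothetical. Because a univariable logistic regression model yields predicted probabilities that are a monotone function of the metric, passing the raw metric gives the same AUC as the fitted probabilities when the fitted coefficient is positive.

```python
import math

import numpy as np


def _structural_components(pos, neg):
    """DeLong placement values: V10 (one per positive), V01 (one per negative)."""
    diff = pos[:, None] - neg[None, :]
    psi = (diff > 0) + 0.5 * (diff == 0)  # 1 if pos > neg, 0.5 if tied, else 0
    return psi.mean(axis=1), psi.mean(axis=0)


def delong_test(y, score_a, score_b):
    """Two-sided DeLong test for the difference between two paired AUCs.

    y holds 1 for metastatic and 0 for benign lesions; returns (auc_a, auc_b, z, p).
    """
    y = np.asarray(y).astype(bool)
    comps = [
        _structural_components(np.asarray(s, float)[y], np.asarray(s, float)[~y])
        for s in (score_a, score_b)
    ]
    v10 = np.vstack([c[0] for c in comps])  # shape (2, m)
    v01 = np.vstack([c[1] for c in comps])  # shape (2, n)
    auc_a, auc_b = v10.mean(axis=1)
    m, n = v10.shape[1], v01.shape[1]
    cov = np.cov(v10) / m + np.cov(v01) / n  # 2 x 2 covariance of the AUC estimates
    var_diff = cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1]
    if var_diff <= 0:  # degenerate variance: the test statistic is undefined
        return float(auc_a), float(auc_b), float("nan"), float("nan")
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return float(auc_a), float(auc_b), float(z), p
```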

3 Results

3.1 Patients and 18F-FDG PET/CT Studies

For the training dataset, data from 187 18F-FDG PET/CT studies of 152 patients with cervical cancer were retrieved. Eight (5.3%) patients were confirmed to have metastatic bone lesions, with a total of 98 metastatic bone lesions identified from 9 studies. An equal number of benign bone lesions were identified from 14 studies of 12 patients without evidence of bone metastasis. Because other benign bone lesions were scarce, these mainly corresponded to sites of bone marrow hyperplasia.

For the validation dataset, data from 84 18F-FDG PET/CT studies of 80 patients with cervical cancer were retrieved. Five (6.3%) patients were confirmed to have metastatic bone lesions, with a total of 42 metastatic bone lesions identified from 6 studies. An equal number of benign bone lesions were identified from 12 studies of 12 patients without evidence of bone metastasis.

Representative PET/CT images of metastatic and benign bone lesions are shown in Figs. 1 and 2, respectively, along with 20 × 20-mm² transaxial slices through the VOI center at four voxel sizes: the reconstructed voxel with a side length of 5.47 mm and interpolated voxels with side lengths of 3, 2, and 1 mm. These figures illustrate the limited resolution of the reconstructed PET voxels.

Fig. 1

A metastatic bone lesion with an SUVmax of 13.0 was identified over the left iliac bone on the transaxial fused PET/CT image (a, white arrow) and PET image (b, black arrow) in a 50-year-old woman with poorly differentiated cervical cancer with multiple metastases at primary staging. The 20 × 20-mm² transaxial slices centered on the voxel with the SUVmax are displayed in scaled gray levels at four voxel sizes: the reconstructed PET voxel with a transaxial side length of 5.47 mm (c) and interpolated voxels with side lengths of 3 (d), 2 (e), and 1 (f) mm. The patient received palliative chemoradiotherapy, but follow-up imaging studies showed disease progression. The patient died 11 months after primary diagnosis.

Fig. 2

A benign postsurgical bone lesion with an SUVmax of 4.8 was identified near the screw placed in the lower lumbar spine on the transaxial fused PET/CT image (a, white arrow) and PET image (b, black arrow) at a follow-up PET/CT study in a 60-year-old woman with moderately differentiated cervical squamous cell carcinoma, FIGO stage IB, diagnosed 2 years earlier and treated with curative chemoradiotherapy. The 20 × 20-mm² transaxial slices centered on the voxel with the SUVmax are displayed in scaled gray levels at four voxel sizes: the reconstructed PET voxel with a transaxial side length of 5.47 mm (c) and interpolated voxels with side lengths of 3 (d), 2 (e), and 1 (f) mm. The patient remained disease-free up to the most recent follow-up.

3.2 First-Order Metrics in the Training Dataset

The results of first-order metrics for lesion classification in the training dataset are presented in Table 2. The mean SUVmax, SUVmean, and SUVsd differed significantly between metastatic and benign bone lesions, but the mean SUVsk and SUVku did not. SUVsd performed better than SUVmax, with a significantly higher AUC (.798 vs .732, P = .001).

Table 2 First-order metrics for lesion classification in the training dataset

3.3 Top-Performing TA Metrics for Lesion Classification

The 20 top-performing TA metrics for lesion classification are listed in Table 3. These metrics yielded significantly higher AUCs (up to .89) than SUVsd. Notably, all of them were derived from the VOI with a side length of 20 mm and from the same parent texture matrix (the texture feature coding cooccurrence matrix), and all were moderately correlated with SUVsd.

Table 3 Twenty top-performing TA metrics selected for lesion classification from the training dataset

3.4 Performance of First-Order Metrics in the Validation Dataset

Table 4 presents the results of first-order metrics in the validation dataset. The mean SUVmax, SUVmean, and SUVsd remained significantly different between metastatic and benign bone lesions, but the mean SUVsk and SUVku did not. SUVsd again performed better than SUVmax, with a significantly higher AUC (.786 vs .684, P < .001).

Table 4 First-order metrics for lesion classification in the validation dataset

3.5 Performance of Selected TA Metrics in the Validation Dataset

Table 5 presents the performance of the selected TA metrics in the validation dataset. Although these TA metrics had exhibited significantly higher AUCs than SUVsd in the training dataset, none performed significantly differently from SUVsd in the validation dataset; only one metric had an AUC higher than that of SUVsd (.798 vs .786, P = .239). As in the training dataset, all of these TA metrics were significantly correlated with SUVsd.

Table 5 Selected TA metrics for lesion classification in the validation dataset

4 Discussion

Our previous study in patients with advanced cervical cancer demonstrated the superiority of 18F-FDG PET to CT and MRI for detecting hematogenous bone metastasis [25]. However, benign bone lesions can mimic malignancy on 18F-FDG imaging. Adams et al. retrospectively included 102 patients who underwent both 18F-FDG PET/CT and CT-guided bone biopsy under the suspicion of malignancy [26]. Histological examination revealed malignancy in 91 (89%) patients. Older age, bone marrow replacement and expansion of the lesion on CT, and presence of multifocal lesions on 18F-FDG PET/CT were significantly more frequent in patients with malignant lesions. Cortical destruction and surrounding soft tissue mass in patients with malignant bone lesions may also aid in interpretation. Therefore, we investigated whether heterogeneity measurement can aid in the classification of metastatic and benign bone lesions.

Lesion SUVmax, which can be easily measured clinically, was useful for classifying metastatic and benign bone lesions in the current study, with AUCs of .732 and .684 in the training and validation datasets, respectively. This result is compatible with our clinical experience that bone lesions with a higher SUVmax tend to be malignant when CT shows no obvious evidence of benign bone changes. A simple first-order measure of heterogeneity, SUVsd, was significantly superior to SUVmax, exhibiting significantly higher AUCs in both the training and validation datasets. Although some TA metrics achieved significantly higher AUCs than SUVsd in the training dataset, they failed to retain this advantage in the validation dataset. Because all the top-performing TA metrics showed significant and similar correlations with SUVsd in both datasets, this discrepancy probably reflects multiple hypothesis testing in the TA process, which involved many features and parameters. Without an independent validation dataset, these TA metrics could have been mistaken for better classifiers of bone lesions.
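The selection effect described above can be illustrated with a small simulation in which every candidate metric is pure noise. The top 20 of 4464 metrics ranked by training AUC are re-evaluated on an independent validation set with the same lesion counts as in this study (98 + 98 and 42 + 42). This is a hypothetical sketch, not a reanalysis of the study data; unlike these noise features, the real TA metrics were correlated with SUVsd.

```python
import numpy as np


def auc(scores, y):
    """Mann-Whitney AUC, counting ties as 1/2."""
    pos, neg = scores[y == 1], scores[y == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())


def simulate_selection(n_features=4464, n_train=98, n_valid=42, top_k=20, seed=0):
    """Rank pure-noise features by training AUC, then re-check the top_k.

    Returns (mean training AUC of all features, mean training AUC of the
    selected features, mean validation AUC of the same selected features).
    """
    rng = np.random.default_rng(seed)
    y_tr = np.r_[np.ones(n_train), np.zeros(n_train)]
    y_va = np.r_[np.ones(n_valid), np.zeros(n_valid)]
    # Noise "metrics": no true difference between metastatic and benign groups.
    x_tr = rng.normal(size=(n_features, y_tr.size))
    x_va = rng.normal(size=(n_features, y_va.size))
    auc_tr = np.array([auc(f, y_tr) for f in x_tr])
    auc_va = np.array([auc(f, y_va) for f in x_va])
    top = np.argsort(auc_tr)[-top_k:]  # indices of the highest training AUCs
    return float(auc_tr.mean()), float(auc_tr[top].mean()), float(auc_va[top].mean())
```

With these settings, the mean training AUC of the selected metrics is well above .5, whereas their mean validation AUC falls back to about .5, showing how selection alone can inflate apparent training performance.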

Other issues that should be considered in texture-related studies include image resolution, the choice of quantization method, and the number of bins in the quantized images, all of which significantly influence most texture features [27]. In the current study, PET images were acquired on a single PET/CT scanner and reconstructed with a fixed method. A simple linear quantization method was adopted with eight representative bin numbers. These confounding factors were thus relatively well controlled in the current study. Nevertheless, multiple false-positive findings were noted, probably because many TA metrics were evaluated.

The definition of the VOI is also an important issue. Although various methods have been used for tumor segmentation, the effects of tumor size and shape on TA metrics should be elucidated before the metrics can be universally applied to segmented tumor volumes. In the current study, a cubic VOI was adopted to avoid these uncertain effects and to streamline the definition of the bone lesion volume. This approach also enabled operator-independent, reproducible analyses, at the cost of including some background voxels in the VOI.

Xu et al. studied 59 malignant bone or soft tissue tumors (including 18 metastatic bone tumors) and 44 benign bone or soft tissue lesions > 25 mm in diameter [28]. For differentiating malignant from benign lesions, the authors suggested that a combination of PET and CT texture parameters may perform better than SUV parameters. The classification power of TA metrics derived from CT images was not evaluated in the current study and thus warrants further exploration.

The current study has several limitations. First, the resolution of PET images acquired on this PET/CT scanner was lower than that of images acquired on newer-generation PET/CT systems, and images with lower resolution may demonstrate lower heterogeneity. Although we used interpolation to reduce the voxel size, interpolation cannot add texture detail beyond the native image resolution. Further research using newer high-resolution PET/CT scanners is warranted. Second, the number of benign bone lesions with high SUVmax was quite limited in this study. Including malignant and benign bone lesions with similar SUVmax would be desirable to eliminate the confounding effect of SUVmax. Benign bone lesions with high SUVmax from other patients may be considered in this type of study, as the texture characteristics of benign bone lesions are possibly independent of patient cancer status.

5 Conclusion

We found that a simple first-order measure of heterogeneity, SUVsd, was superior to SUVmax for classifying metastatic and benign bone lesions. Multiple hypothesis testing can lead to false-positive findings in TA with multiple features and parameters; thus, careful validation is needed.