Measuring Heterogeneity in 18F-Fluorodeoxyglucose Positron Emission Tomography Images for Classifying Metastatic and Benign Bone Lesions in Patients with Cervical Cancer

Heterogeneity assessment can be applied to medical image analysis. Here, we evaluated first-order and texture analysis (TA) metrics in 18F-fluorodeoxyglucose (18F-FDG) positron emission tomography (PET) imaging for the classification of metastatic and benign bone lesions in patients with cervical cancer. Data from 18F-FDG PET studies performed on a specific PET/CT system from 2016 to 2018 in patients with cervical cancer were retrieved. Bone lesions extracted from studies performed in 2016–2017 and in 2018 were used as the training and validation datasets, respectively. Metastatic bone lesions were identified in each dataset, and an equal number of benign bone lesions were selected. A cuboid volume of interest (VOI) consisting of 3 × 3 × 5 reconstructed voxels was applied for first-order metrics, and a cubic VOI consisting of smaller voxels, with standardized uptake values (SUVs) estimated by trilinear interpolation, was adopted for TA metrics. First-order metrics included the maximum SUV (SUVmax) of lesions and the mean, standard deviation (SUVsd), skewness, and kurtosis of voxel SUVs in the VOI. In total, 4464 TA metrics based on 62 texture features were evaluated. Logistic regression was used for classification, with the area under the receiver operating characteristic curve (AUC) as the performance measure. From the training and validation datasets, 98 and 42 metastatic bone lesions were identified, respectively. SUVsd demonstrated higher performance than did SUVmax in both the training (AUC .798 vs .732, P = .001) and validation (AUC .786 vs .684, P < .001) datasets. Top-performing TA metrics demonstrated significantly higher performance than SUVsd in the training dataset, but not in the validation dataset. A simple first-order measure of heterogeneity, SUVsd, was found to be superior to SUVmax for the classification of metastatic and benign bone lesions. Multiple hypothesis testing can result in false-positive findings in TA with multiple features and parameters; careful validation is required.


Introduction
In patients with a malignancy, staging at primary diagnosis and restaging at disease recurrence are of great importance for selecting optimal therapies. After the lung and liver, bone is the third most common site of distant metastasis [1]. Accurate identification of bone metastases is thus essential in patients with advanced cancer. Imaging modalities including conventional radiography, computed tomography (CT), magnetic resonance imaging (MRI), skeletal scintigraphy, and positron emission tomography (PET; usually combined with CT or MRI) can be used to detect bone metastases, and each modality has specific strengths and weaknesses that depend on its underlying principles [2]. PET/CT with 18F-fluorodeoxyglucose (18F-FDG) is commonly performed because many malignant tumors demonstrate high 18F-FDG uptake and because it allows whole-body imaging in a single session. However, several benign bone conditions, including trauma or fracture, degeneration or spur, infection, osteonecrosis, bone marrow hyperplasia, benign bone tumors, and bone lesions due to benign systemic diseases, can be 18F-FDG avid and mimic malignancy [3]. Differentiating between metastatic and benign bone lesions can be challenging, especially in patients with oligo-skeletal lesions on imaging studies. Even experienced physicians may be unable to make a confident diagnosis in this situation.
Tumor heterogeneity refers to the presence of tumor cells with distinct phenotypic features, including cell morphology; gene expression; metabolism; motility; and angiogenic, proliferative, immunogenic, and metastatic potential [4]. High tumor heterogeneity indicates a high probability of pre-existing clones resistant to therapeutic intervention. It has been hypothesized that metabolic heterogeneity measured through imaging can reflect tumor heterogeneity, and many researchers have applied various analytic techniques to 18F-FDG PET data to investigate their potential roles in cancer [5]. Two analytic approaches are frequently used: analysis of the histogram of voxel values within the volume of interest (VOI), which yields first-order statistics of the standardized uptake value (SUV), and analysis of the spatial arrangement of voxel values, which yields many different higher-order texture features [6,7]. Texture analysis (TA) is a group of computational methods that extract features describing the relationships between neighboring pixels in an image [8], with applications in computer vision [9], remote sensing [10], and medical imaging [11]. However, the application of TA is confounded by variations in processing steps and parameters, which has led to controversies [12]. A review of 15 studies highlighted related issues and concluded that the evidence is insufficient to support a relationship between texture features and survival in patients with cancer [13]. One of the most important issues is that multiple hypothesis testing substantially inflates the type I error rate, which can easily lead to false-positive findings [14]. The clinical usefulness of TA therefore needs to be evaluated extensively. The best approach to resolving this problem involves simplifying the clinical question, controlling processing variations, and validating the findings from the initial dataset in an independent dataset.
The current study assessed the utility of heterogeneity measurement for the binary classification of metastatic and benign bone lesions in patients with cervical cancer, using images from a specific PET/CT scanner. PET images acquired during different periods were used for training and validation. We hypothesized that some heterogeneity measures would outperform the maximum SUV (SUVmax), a measure of relative glycolytic activity that is commonly used in clinical settings.

Patients
This retrospective study on the use of PET/CT for differentiating metastatic and benign bone lesions was approved by the Institutional Review Board (201900947B0), and the requirement for signed informed consent was waived. Data from imaging studies performed on a specific PET/CT scanner (Discovery ST16; GE Health Systems, Milwaukee, WI, USA) at our institution during 2016–2018 in patients with cervical cancer were retrieved. Patients with a history of other malignancies were excluded. Bone lesion images from studies performed in 2016–2017 and in 2018 were used as the training and validation datasets, respectively.

18F-FDG PET/CT Imaging
Patients were instructed to fast for at least 4 h before the examination. Scanning began approximately 90 min after intravenous injection of 370 MBq ± 10% of 18F-FDG. A diluted CT contrast agent (iothalamate meglumine; Mallinckrodt, Missouri, USA) was administered orally during the tracer uptake period. Patients were scanned in the supine position. After CT acquisition from the head to the upper thigh, PET data were acquired in three-dimensional (3D) mode, with an acquisition time of 2.5 min per cradle position. The CT data were used for attenuation correction, and the PET images were reconstructed with an iterative ordered subset expectation maximization algorithm (4 iterations, 10 subsets) and a transaxial image matrix of 128 × 128. The reconstructed voxel size was 5.47 × 5.47 × 3.27 mm³. The SUV of each reconstructed voxel was defined as the measured tracer concentration in tissue (MBq/mL) divided by the injected activity per unit body weight (MBq/g).
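As a worked illustration of the SUV definition, the following minimal Python sketch divides a voxel's tracer concentration by the injected activity per gram of body weight. The function name `suv` and the example numbers are hypothetical. The sketch implements only the ratio stated above; treating tissue density as about 1 g/mL (so that the result is effectively unitless) is a conventional assumption, not something the text specifies.

```python
def suv(tissue_mbq_per_ml: float, injected_mbq: float, body_weight_kg: float) -> float:
    """SUV of one voxel: tissue concentration over injected activity per gram.

    Implements only the ratio stated in the text. With tissue density taken
    as ~1 g/mL (a conventional assumption), the result is effectively unitless.
    """
    if injected_mbq <= 0 or body_weight_kg <= 0:
        raise ValueError("injected activity and body weight must be positive")
    activity_per_gram = injected_mbq / (body_weight_kg * 1000.0)  # MBq/g
    return tissue_mbq_per_ml / activity_per_gram


# Hypothetical example: 370 MBq injected, 60-kg patient, 0.02 MBq/mL in the voxel.
print(round(suv(0.02, 370.0, 60.0), 2))  # 3.24
```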

Identification of Metastatic Bone Lesions
The clinical records of patients were reviewed. A patient was considered to have confirmed bone metastases if there was pathological proof or if imaging studies performed after the PET/CT study showed progression of the metastatic bone lesions. Metastatic bone lesions on PET/CT were identified by a nuclear medicine physician. For each identified lesion, the location and coordinates of the PET voxel with the lesion SUVmax were recorded.

Selection of Benign Bone Lesions
To match the number of identified metastatic bone lesions, an equal number of benign bone abnormalities were selected from PET/CT studies of patients without any evidence of bone metastasis in clinical records and follow-up imaging studies. These were bone abnormalities with relatively increased 18F-FDG uptake. For each selected benign abnormality, the location and coordinates of the PET voxel with the lesion SUVmax were recorded.

First-Order Metrics and VOI Settings
For lesion classification, we assessed five first-order metrics: the lesion SUVmax and the mean (SUVmean), standard deviation (SUVsd), skewness (SUVsk), and kurtosis (SUVku) of voxel SUVs in the VOI. Because of the small size of bone lesions and the limited resolution of the reconstructed PET images, we did not attempt to define the exact border of each bone lesion. Instead, a simple cuboid VOI centered on the recorded voxel with the lesion SUVmax was defined. The side length of the VOI was limited to 20 mm to avoid including too many background voxels. A VOI of 3 × 3 × 5 voxels (approximately 16.4 × 16.4 × 16.4 mm³, since 3 × 5.47 mm ≈ 5 × 3.27 mm ≈ 16.4 mm) was therefore selected to approximate a regular cube.
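The VOI extraction and the five first-order metrics can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' software: the volume is a nested list indexed [z][y][x], the helper names `cuboid_voi` and `first_order_metrics` are hypothetical, and because the text does not say which moment estimators were used, population moments and non-excess kurtosis are assumed.

```python
import math


def cuboid_voi(volume, center, half_size=(2, 1, 1)):
    """Return voxel values in a cuboid centred on `center` = (z, y, x).

    half_size=(2, 1, 1) gives 5 slices x 3 rows x 3 columns, i.e. the
    3 x 3 x 5 VOI with 5 voxels along the 3.27-mm axial direction.
    Assumes the cuboid lies fully inside the volume.
    """
    cz, cy, cx = center
    hz, hy, hx = half_size
    return [volume[z][y][x]
            for z in range(cz - hz, cz + hz + 1)
            for y in range(cy - hy, cy + hy + 1)
            for x in range(cx - hx, cx + hx + 1)]


def first_order_metrics(values):
    """SUVmax, SUVmean, SUVsd, SUVsk and SUVku of the VOI voxels.

    Assumption: population (biased) moments; kurtosis is m4 / m2**2,
    not excess kurtosis.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    sd = math.sqrt(m2)
    return {
        "SUVmax": max(values),
        "SUVmean": mean,
        "SUVsd": sd,
        "SUVsk": m3 / sd ** 3 if sd > 0 else 0.0,
        "SUVku": m4 / m2 ** 2 if m2 > 0 else 0.0,
    }


# Hypothetical 7 x 5 x 5 volume of background SUV 1.0 with one hot voxel.
volume = [[[1.0] * 5 for _ in range(5)] for _ in range(7)]
volume[3][2][2] = 4.0
voi = cuboid_voi(volume, (3, 2, 2))
print(len(voi), first_order_metrics(voi))
```

Because the VOI is centered on the lesion-maximum voxel, the VOI maximum should coincide with the lesion SUVmax in this sketch.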

Voxel Size, SUV Interpolation, and VOI Definition for TA Metrics
Because stable TA requires many voxels in the VOI, directly using the reconstructed voxel size of the PET images would be inadequate [15,16]. In this study, smaller cubic voxels with side lengths of 1, 2, and 3 mm were therefore adopted, with voxel SUVs estimated by trilinear interpolation. In the TA of 2D images, a square region of interest is usually defined; for the 3D data in the current study, an isotropic cubic VOI was adopted. For each recorded lesion, TA metrics were computed from cubic VOIs centered on the recorded voxel location, using three representative side lengths of 10, 15, and 20 mm.
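Trilinear interpolation estimates each small-voxel SUV as a distance-weighted average of the eight surrounding reconstructed voxels. The minimal sketch below resamples a cubic VOI onto an isotropic grid. The helper names are hypothetical, reconstructed voxel centers are assumed to sit at integer index coordinates, and since the text does not say how a non-integer ratio such as 20 mm / 3 mm was handled, rounding the number of samples per side is an assumption.

```python
import math


def trilinear(volume, z, y, x):
    """Trilinearly interpolate a nested-list volume at continuous index (z, y, x).

    Voxel (i, j, k) is assumed to sit at index coordinate (i, j, k);
    points outside the volume are clamped to its border.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    z = min(max(z, 0.0), nz - 1.0)
    y = min(max(y, 0.0), ny - 1.0)
    x = min(max(x, 0.0), nx - 1.0)
    z0, y0, x0 = int(math.floor(z)), int(math.floor(y)), int(math.floor(x))
    z1, y1, x1 = min(z0 + 1, nz - 1), min(y0 + 1, ny - 1), min(x0 + 1, nx - 1)
    fz, fy, fx = z - z0, y - y0, x - x0
    value = 0.0
    # Weighted sum over the 8 surrounding reconstructed voxels.
    for zi, wz in ((z0, 1.0 - fz), (z1, fz)):
        for yi, wy in ((y0, 1.0 - fy), (y1, fy)):
            for xi, wx in ((x0, 1.0 - fx), (x1, fx)):
                value += wz * wy * wx * volume[zi][yi][xi]
    return value


def resample_cubic_voi(volume, center, spacing_mm, side_mm, voxel_mm):
    """Cubic VOI of side `side_mm` sampled on an isotropic `voxel_mm` grid.

    center: (z, y, x) index of the recorded lesion voxel.
    spacing_mm: reconstructed voxel size (dz, dy, dx) in mm.
    Assumption: round(side_mm / voxel_mm) samples per side, symmetric
    about the centre.
    """
    n = max(1, round(side_mm / voxel_mm))
    offsets = [(k - (n - 1) / 2.0) * voxel_mm for k in range(n)]  # in mm
    cz, cy, cx = center
    dz, dy, dx = spacing_mm
    return [[[trilinear(volume, cz + oz / dz, cy + oy / dy, cx + ox / dx)
              for ox in offsets]
             for oy in offsets]
            for oz in offsets]


# Hypothetical 9 x 7 x 7 volume with a linear SUV gradient along x.
volume = [[[1.0 + 0.1 * x for x in range(7)] for _ in range(7)] for _ in range(9)]
cube = resample_cubic_voi(volume, (4, 3, 3), (3.27, 5.47, 5.47), side_mm=20, voxel_mm=3)
print(len(cube))  # 7 samples per side under the rounding assumption
```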

Quantization of Voxel SUV Values in VOI
To assess texture features, voxel SUVs in the VOI must be quantized into a specified number of bins. In this study, the bin number was set to 2^n (n = 1–8, i.e., 2 to 256 bins). A linear quantization method was adopted, in which the specified number of equal-width bins spans the range between the minimum and maximum voxel SUVs in the VOI.
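The linear quantization step amounts to equal-width binning between the VOI minimum and maximum. In this minimal sketch the function name is hypothetical, and two edge cases that the text does not address are handled by assumption: the maximum SUV is placed in the top bin, and a constant-valued VOI maps entirely to level 1.

```python
import math

BIN_NUMBERS = [2 ** n for n in range(1, 9)]  # 2, 4, ..., 256


def quantize_linear(values, n_bins):
    """Map SUVs to integer levels 1..n_bins with equal-width bins over [min, max].

    Assumptions: the maximum value goes into the top bin, and a VOI with
    constant SUV is mapped entirely to level 1.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1] * len(values)
    width = (hi - lo) / n_bins
    return [min(int(math.floor((v - lo) / width)) + 1, n_bins) for v in values]


print(quantize_linear([0.8, 1.1, 2.5, 3.9, 4.2], 4))
```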

Texture Features and Metrics
TA metrics were computed with software based on the open-source Chang-Gung Image Texture Analysis (CGITA) toolbox, which was developed at our institution and is implemented in MATLAB (MathWorks Inc., Natick, MA, USA) [17]. Eight types of parent texture matrixes were included: cooccurrence [8], run length (voxel-alignment) [18], neighborhood difference [19], size zone [20], texture spectrum [21], texture feature coding [22], texture feature coding cooccurrence [22], and neighborhood dependence [23] matrixes. Table 1 lists the 62 texture features derived from these parent matrixes. For lesion classification, a total of 4464 TA metrics were evaluated, covering all combinations of 3 VOI sizes, 3 voxel sizes, 8 bin numbers, and 62 texture features (3 × 3 × 8 × 62 = 4464).

Table 1. Texture features derived from the parent texture matrixes
Run length (voxel-alignment): Run percentage, short-run emphasis, long-run emphasis, low-intensity run emphasis, high-intensity run emphasis, gray-level nonuniformity, run length nonuniformity, low-intensity short-run emphasis, high-intensity short-run emphasis, low-intensity long-run emphasis, high-intensity long-run emphasis
Neighborhood difference: Coarseness, contrast, busyness, complexity, strength
Size zone: Run percentage, short-run emphasis, long-run emphasis, low-intensity run emphasis, high-intensity run emphasis, gray-level nonuniformity, run length nonuniformity, low-intensity short-run emphasis, high-intensity short-run emphasis, low-intensity long-run emphasis, high-intensity long-run emphasis
Texture spectrum: Maximum, variance
Texture feature coding: Coarseness, homogeneity, mean convergence, variance
Texture feature coding cooccurrence: Energy, entropy, variance, correlation, contrast, homogeneity, sum mean, dissimilarity, inverse difference moment (IDM)
Neighborhood dependence: Run percentage, short-run emphasis, long-run emphasis, low-intensity run emphasis, high-intensity run emphasis, gray-level nonuniformity, run length nonuniformity, low-intensity short-run emphasis, high-intensity short-run emphasis, low-intensity long-run emphasis, high-intensity long-run emphasis
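The total of 4464 TA metrics follows from a full factorial grid over the four processing choices. A short Python check, in which feature names are replaced by indices 0 to 61 purely for illustration:

```python
from itertools import product

VOI_SIDES_MM = (10, 15, 20)
VOXEL_SIDES_MM = (1, 2, 3)
BIN_NUMBERS = tuple(2 ** n for n in range(1, 9))  # 2 ... 256
N_FEATURES = 62  # texture features; indexed 0..61 here for illustration only

# Each TA metric is one (VOI size, voxel size, bin number, feature) combination.
ta_metrics = list(product(VOI_SIDES_MM, VOXEL_SIDES_MM, BIN_NUMBERS, range(N_FEATURES)))
print(len(ta_metrics))  # 4464
```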

Definition of 3D Connectivity and Cooccurrence Matrix Offset
A simple 6-connected neighborhood was adopted for calculating 3D texture features; that is, the six face-adjacent voxels were considered the direct neighbors of the central voxel. TA metrics were calculated in these six neighboring directions and then averaged. Because the voxel size differed among settings, only a distance offset of one voxel was evaluated during cooccurrence matrix computation.
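For the cooccurrence matrix, the six-direction scheme can be sketched as follows: one matrix per face-adjacent offset of one voxel, a feature computed from each matrix, and the six feature values averaged. This is an illustrative sketch, not CGITA's implementation; the per-direction matrices here are not symmetrized (CGITA's choice is not stated in the text), and the helper names are hypothetical. Contrast and energy are shown as two standard cooccurrence features.

```python
# The six face-adjacent one-voxel offsets (dz, dy, dx) of a 6-connected neighborhood.
DIRECTIONS_6 = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def cooccurrence(q, n_levels, offset):
    """Normalized cooccurrence matrix of a quantized volume q[z][y][x]
    (integer levels 1..n_levels) for one offset (dz, dy, dx)."""
    dz, dy, dx = offset
    nz, ny, nx = len(q), len(q[0]), len(q[0][0])
    counts = [[0] * n_levels for _ in range(n_levels)]
    total = 0
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                z2, y2, x2 = z + dz, y + dy, x + dx
                if 0 <= z2 < nz and 0 <= y2 < ny and 0 <= x2 < nx:
                    counts[q[z][y][x] - 1][q[z2][y2][x2] - 1] += 1
                    total += 1
    if total == 0:
        return [[0.0] * n_levels for _ in range(n_levels)]
    return [[c / total for c in row] for row in counts]


def contrast(p):
    """Sum of (i - j)^2 * p(i, j)."""
    return sum((i - j) ** 2 * p[i][j] for i in range(len(p)) for j in range(len(p)))


def energy(p):
    """Sum of p(i, j)^2."""
    return sum(v * v for row in p for v in row)


def mean_over_directions(q, n_levels, feature):
    """Compute `feature` in each of the six directions, then average."""
    values = [feature(cooccurrence(q, n_levels, d)) for d in DIRECTIONS_6]
    return sum(values) / len(values)


# Hypothetical 2-level volume alternating along x only.
q = [[[1 + (x % 2) for x in range(4)] for _ in range(4)] for _ in range(4)]
print(mean_over_directions(q, 2, contrast))
```

In this example only the two x-directions see neighboring voxels with different levels, so the averaged contrast is one third of the single-direction value.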

Statistical Analysis
The means of first-order metrics for metastatic and benign bone lesions were compared using the independent samples t test. A logistic regression model was used for evaluating lesion classification results, with area under the receiver operating characteristic curve (AUC) as the performance measure. AUC comparisons were made using a fast implementation of DeLong's algorithm [24]. A two-sided P value of < .05 was considered statistically significant.
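The AUC of a classifier equals the probability that a randomly chosen metastatic lesion scores higher than a randomly chosen benign lesion, with ties counting one half. If each metric entered the logistic regression on its own (an assumption; the text does not state the model's covariates), the fitted probability is a monotone function of that metric, so the model's AUC equals the AUC of the raw metric, or of the negated metric when the fitted coefficient is negative. The sketch below computes AUC in this way and compares two correlated AUCs with DeLong's test in its plain O(mn) form; the study used a fast implementation [24], which should yield the same statistic more efficiently. Function names are hypothetical, and at least two lesions per class are required.

```python
import math


def _psi(x, y):
    """Kernel: 1 if the metastatic score exceeds the benign score, 0.5 for a tie."""
    return 1.0 if x > y else (0.5 if x == y else 0.0)


def auc(pos, neg):
    """ROC AUC as the Mann-Whitney probability P(score_pos > score_neg)."""
    return sum(_psi(x, y) for x in pos for y in neg) / (len(pos) * len(neg))


def delong_test(pos_a, neg_a, pos_b, neg_b):
    """Two-sided DeLong test for two correlated AUCs on the same lesions.

    pos_* / neg_*: scores of the same metastatic / benign lesions (same order)
    under predictors A and B. Returns (AUC_A, AUC_B, P value).
    """
    m, n = len(pos_a), len(neg_a)
    pairs = ((pos_a, neg_a), (pos_b, neg_b))
    # Structural components: per-positive and per-negative placement values.
    v10 = [[sum(_psi(x, y) for y in neg) / n for x in pos] for pos, neg in pairs]
    v01 = [[sum(_psi(x, y) for x in pos) / m for y in neg] for pos, neg in pairs]
    theta = [sum(v) / m for v in v10]

    def cov(u, v, mu_u, mu_v):
        return sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v)) / (len(u) - 1)

    s = [[cov(v10[r], v10[t], theta[r], theta[t]) / m
          + cov(v01[r], v01[t], theta[r], theta[t]) / n
          for t in range(2)] for r in range(2)]
    var = s[0][0] + s[1][1] - 2 * s[0][1]
    if var <= 0:  # identical predictors: no detectable difference
        return theta[0], theta[1], 1.0
    z = (theta[0] - theta[1]) / math.sqrt(var)
    return theta[0], theta[1], math.erfc(abs(z) / math.sqrt(2))


# Hypothetical scores for 4 metastatic and 4 benign lesions under two metrics.
print(delong_test([0.9, 0.8, 0.7, 0.4], [0.3, 0.5, 0.2, 0.6],
                  [0.6, 0.4, 0.7, 0.3], [0.5, 0.2, 0.6, 0.4]))
```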

Patients and 18F-FDG PET/CT Studies
For the training dataset, data from 187 18F-FDG PET/CT studies of 152 patients with cervical cancer were retrieved. Eight (5.3%) patients were confirmed to have metastatic bone lesions, and a total of 98 metastatic bone lesions were identified from 9 studies. An equal number of benign bone lesions were identified from 14 studies of 12 patients without evidence of bone metastasis; these mainly corresponded to sites of bone marrow hyperplasia because other benign bone lesions were scarce. For the validation dataset, data from 84 18F-FDG PET/CT studies of 80 patients with cervical cancer were retrieved. Five (6.3%) patients were confirmed to have metastatic bone lesions, and a total of 42 metastatic bone lesions were identified from 6 studies. An equal number of benign bone lesions were identified from 12 studies of 12 patients without evidence of bone metastasis.
Representative PET/CT images of metastatic and benign bone lesions are shown in Figs. 1 and 2, respectively, together with 20 × 20-mm² transaxial slices through the VOI center at four voxel sizes: the reconstructed voxel (transaxial side length of 5.47 mm) and interpolated voxels with side lengths of 3, 2, and 1 mm. These figures illustrate the limited resolution of the reconstructed PET voxels.

First-Order Metrics in the Training Dataset
The results of first-order metrics for lesion classification in the training dataset are presented in Table 2. The mean SUVmax, SUVmean, and SUVsd differed significantly between metastatic and benign bone lesions, whereas the mean SUVsk and SUVku did not. SUVsd achieved superior performance, with an AUC significantly higher than that of SUVmax (.798 vs .732, P = .001).

Top-Performing TA Metrics for Lesion Classification
The 20 top-performing TA metrics for lesion classification in the training dataset are listed in Table 3. All yielded significantly higher AUC values (up to .89) than SUVsd. Notably, all of these metrics were derived from VOIs with a side length of 20 mm and from the same parent texture matrix (i.e., the texture feature coding cooccurrence matrix), and they demonstrated a moderate correlation with SUVsd.
Table 4 presents the results of first-order metrics in the validation dataset. The mean SUVmax, SUVmean, and SUVsd remained significantly different between metastatic and benign bone lesions, whereas the mean SUVsk and SUVku did not. SUVsd again achieved superior performance, with an AUC significantly higher than that of SUVmax (.786 vs .684, P < .001).
Table 5 presents the performance of the selected TA metrics in the validation dataset. Although these TA metrics had exhibited significantly higher AUCs than SUVsd in the training dataset, none performed significantly differently from SUVsd in the validation dataset; only one metric had an AUC higher than that of SUVsd (.798 vs .786, P = .239). As in the training dataset, all of these TA metrics demonstrated a significant correlation with SUVsd.

Discussion
Our previous study in patients with advanced cervical cancer demonstrated the superiority of 18F-FDG PET over CT and MRI for detecting hematogenous bone metastasis [25]. However, benign bone lesions can mimic malignancy on 18F-FDG imaging. Adams et al. retrospectively included 102 patients who underwent both 18F-FDG PET/CT and CT-guided bone biopsy for suspected malignancy [26]. Histological examination revealed malignancy in 91 (89%) patients. Older age, bone marrow replacement and lesion expansion on CT, and multifocal lesions on 18F-FDG PET/CT were significantly more frequent in patients with malignant lesions. Cortical destruction and a surrounding soft tissue mass in patients with malignant bone lesions may also aid interpretation. We therefore investigated whether heterogeneity measurement can aid in the classification of metastatic and benign bone lesions. In the current study, lesion SUVmax, which is easily measured in clinical practice, was useful for classifying metastatic and benign bone lesions, with AUCs of .732 and .684 in the training and validation datasets, respectively. This result is consistent with our clinical experience that bone lesions with a higher SUVmax tend to be malignant when CT shows no obvious evidence of benign bone changes. A simple first-order measure of heterogeneity, SUVsd, was significantly superior to SUVmax, exhibiting significantly higher AUCs in both the training and validation datasets. Although some TA metrics achieved significantly higher AUC values than SUVsd in the training dataset, they failed to retain this advantage in the validation dataset.
This phenomenon probably reflects the effect of multiple hypothesis testing in the TA process, given the many features and parameters involved and given that all the top-performing TA metrics showed significant and similar correlations with SUVsd in the training and validation datasets. Without an independent validation dataset, these TA metrics might have been mistaken for better classifiers of bone lesions. Other issues that should be considered in texture-related studies include image resolution, the choice of quantization method, and the number of bins in the quantized images, all of which significantly influence most texture features [27]. In the current study, PET images were acquired on a single PET/CT scanner and reconstructed with a fixed method, and a simple linear quantization method was adopted with eight representative bin numbers. These confounding factors were thus relatively well controlled. Nevertheless, multiple false-positive findings were noted, probably because many TA metrics were evaluated.
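To give a sense of scale for the multiple-testing problem, the sketch below computes the family-wise error rate, 1 - (1 - α)^m, for m tests at α = .05, together with the Bonferroni per-test threshold α/m. It assumes independent tests, which the 4464 correlated TA metrics are not, so the exact figure for this study would be lower. The Bonferroni threshold is shown only for scale and is not a procedure described in the text; the function names are hypothetical.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) among n independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test P-value threshold that bounds the family-wise error rate at alpha."""
    return alpha / n_tests


for m in (1, 20, 4464):
    print(m, round(familywise_error_rate(m), 4), bonferroni_threshold(m))
```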
The definition of the VOI is also an important issue. Although various methods have been used for tumor segmentation, the effects of tumor size and shape on TA metrics should be elucidated before the metrics can be universally applied to segmented tumor volumes. In the current study, a cubic VOI was adopted to avoid these uncertain effects and to streamline the definition of the bone lesion volume. This approach also enabled operator-independent, reproducible analyses, although some background voxels are inevitably included in the VOI. Xu et al. studied 59 malignant bone or soft tissue tumors (including 18 metastatic bone tumors) and 44 benign bone or soft tissue lesions > 25 mm in diameter [28]. For the differential diagnosis of malignant and benign lesions, the authors suggested that a combination of PET and CT texture parameters may perform better than SUV parameters. The classification power of TA metrics derived from CT images was not evaluated in the current study and warrants further exploration.
The current study has several limitations. First, the resolution of PET images from this PET/CT scanner was lower than that of images acquired on newer-generation PET/CT systems, and images with lower resolution may demonstrate lower heterogeneity. Although interpolation was used to reduce the voxel size, it cannot add texture detail that the reconstructed images lack. Further research using newer high-resolution PET/CT scanners is warranted. Second, the number of benign bone lesions with high SUVmax was quite limited in this study. Including malignant and benign bone lesions with similar SUVmax would be desirable to eliminate the confounding effect of SUVmax. Benign bone lesions with high SUVmax from other patients could be considered in this type of study, as the texture characteristics of benign bone lesions are possibly independent of the patient's cancer status.

Conclusion
Here, we found that a simple first-order measure of heterogeneity, SUVsd, was superior to SUVmax for the classification of metastatic and benign bone lesions. Multiple hypothesis testing can lead to false-positive findings in TA with multiple features and parameters; careful validation is therefore needed.

Conflict of interest
The authors have no conflict of interest to declare.

Ethical Approval
The study was approved by the Institutional Review Board of Chang Gung Memorial Hospital (201900947B0), with a waiver for the requirement of signed informed consent. All procedures involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.