1 Introduction

Pleomorphic adenomas (PA) are the most common type of benign tumors that occur in the parotid gland, accounting for 50–60% of all parotid tumors [1,2,3]. These slow-growing tumors are generally considered low-risk. Despite their benign nature, however, PAs have a relatively high risk of recurrence and malignant transformation [4]. Previous studies reported that capsular characteristics and surgical factors are the most likely causes for recurrence [5, 6].

Capsular characteristics refer to the appearance of the outer layer, or capsule, surrounding a parotid pleomorphic adenoma. A well-defined capsule is desirable because it facilitates complete tumor removal during surgery, reducing the risk of recurrence. However, capsules vary in appearance. Some are thin and delicate; others are irregular or indistinct, making it challenging to determine the true extent of the tumor and increasing the risk of incomplete removal [7,8,9]. Pseudopodia and satellite nodules can extend into the surrounding parotid tissue and are therefore likely to be left behind when the surgical margin is insufficient [10]. Capsular characteristics have led to an evolution of surgical methods for parotid PA [11, 12]. The goal is to remove the tumor completely while preserving the surrounding normal tissue and minimizing the risk of recurrence.

Currently, there is no clinically applicable, non-invasive method for preoperative evaluation of capsular characteristics. Fine-needle aspiration cytology cannot evaluate the capsular characteristics. Moreover, visual inspection of computed tomography (CT) images to evaluate the capsular characteristics of PA, even by experienced radiologists, showed poor consistency with pathology [13]. Radiomics is a medical imaging field that extracts and analyzes quantitative data from images [14]. Intratumoral radiomics is a promising tool for identifying different pathological types of parotid tumors, with the number of patients included in previous studies ranging from 101 to 127 [15,16,17]. Intratumoral radiomics focuses on characterizing the heterogeneity and complexity of the tumor itself and can provide information on the molecular and biological characteristics of the tumor. Peritumoral radiomics, on the other hand, focuses on the changes in the surrounding tissue and can be used to better understand the interactions between the tumor and its surrounding tissue [18].

In this study, we aimed to investigate the use of radiomics for preoperative evaluation of capsular characteristics to guide surgical planning and patient management. We hypothesized that peritumoral radiomics features, which reflect changes in the tissue around the tumor, might provide valuable information about possible infiltration of the tumor into normal tissue.

2 Material and methods

2.1 Patient cohort

We searched the medical data of patients with PA at two institutions between January 2012 and February 2022. The data were gathered through an electronic search of the Picture Archiving and Communication System (PACS) and Hospital Information System (HIS). Our institutional review board approved the study and waived the requirement for obtaining informed consent from patients. Images were de-identified before analysis to protect the privacy of the patients.

The study included patients who met the following criteria: (1) they had a confirmed diagnosis of PA through surgical pathology; (2) they underwent a contrast-enhanced CT scan before receiving any treatment. The study excluded patients who: (1) were diagnosed with carcinoma ex pleomorphic adenoma; (2) had a history of recurrent PA or previous parotid gland surgery; (3) had images that were compromised by significant metallic artifacts. The selection process for the patients is depicted in the flow diagram in Additional file 1: Figure S1. In the end, a total of 260 patients were included in the study, with 166 patients from our institution serving as the training cohort (92 with a complete capsule and 74 without a complete capsule), and 94 patients from another institution serving as the test cohort (49 with a complete capsule and 45 without a complete capsule).

2.2 The reference standard for capsular characteristics

In this study, we used the following nomenclature, which is consistent with previous research [8]:

  1. “Incomplete capsule” refers to a partial absence of encapsulation.

  2. “Capsule penetration” refers to tumor tissue that has infiltrated into the tumor capsule but is not separated from the main mass by fibrous fibers.

  3. “Pseudopodia” refers to tumor nodules that are separated from the main tumor mass by fibrous tissue but remain confined to or in contact with the main tumor capsule.

  4. “Satellite nodules” refers to nodules located adjacent to the main tumor but not connected to it, and are separated from the main tumor by salivary glands or fat tissue.

Patients with incomplete capsule, capsule penetration, pseudopodia, and satellite nodules were grouped together, and patients with a complete capsule were grouped separately.

All patients in this study had undergone complete resection. The diagnosis of PA with a complete capsule was based on both the surgical and the pathological reports. Patients were assigned to this group when the surgeon confirmed that the lesion was well-defined and the pathologist confirmed capsular completeness based on histological certainty over the entire circumference of the lesion. The diagnosis of PA without a complete capsule was based on the pathological reports, which included the assessment of capsular completeness as well as other capsular characteristics such as capsule penetration, pseudopodia, and satellite nodules.

2.3 CT image acquisition

All patients underwent CT scans within a week prior to their surgery. The CT images were saved in the Digital Imaging and Communications in Medicine (DICOM) format. Details of the CT acquisition can be found in Additional file 1.

2.4 Tumor segmentation and definition of peritumor size

We used ITK-SNAP software (http://www.itksnap.org) to segment the tumors on preoperative contrast-enhanced CT images. The software includes semi-automatic delineation algorithms to aid in the process. Nevertheless, the final volume of interest (VOI) was confirmed by experienced radiologists, who reviewed all the CT image slices to ensure accuracy. The VOI encompassed the entire tumor while excluding adjacent bone regions and blood vessels. Two radiologists, with 3 and 5 years of clinical experience, performed the segmentation of the VOIs without knowledge of the patient’s clinical and pathological details. To evaluate the reliability of radiomics feature extraction, the segmentation was repeated by the same radiologist a month later, and we computed both intra- and inter-observer correlation coefficients. We only considered features with intra- and inter-observer correlation coefficients of at least 0.80 for further analysis.

To generate the VOIintra-plus peritumor, we followed these steps:

We first identified the tumor’s VOI (VOItumor), which included the entire tumor while excluding adjacent bone regions and blood vessels. Next, we defined a 2 mm peritumoral region around the tumor based on its histological characteristics. This region captures the capsular characteristics typically found at the tumor’s margin and the satellite nodules predominantly located within 2 mm of the central mass [19]. Using the RIAS software package (version 0.1.2), we applied a morphological dilation operator to expand the intratumoral region by 2 mm in radial distance according to pixel size. This resulted in a new VOI that encompassed both the tumor and the peritumoral region, which we named “VOIintra-plus peritumor”. Finally, we subtracted the VOItumor from the VOIintra-plus peritumor to obtain a ring-shaped VOI (VOIperitumor) using the same RIAS software package. Figure 1 depicts the process of dilation and subtraction.

Fig. 1

The process of obtaining three different regions

Axial view of an enhanced CT scan of a 63-year-old woman with an incompletely encapsulated pleomorphic adenoma of the parotid gland. The intratumoral mask is obtained first. The intra-plus peritumoral region is then obtained by dilating the intratumoral mask. Finally, the ring of parotid parenchyma around the tumor is obtained by subtracting the original intratumoral mask from the dilated mask.
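The dilation and subtraction steps can be illustrated with a minimal sketch. It is not the RIAS implementation: it assumes isotropic 1 mm voxels, represents a mask as a set of voxel indices, and uses a ball-shaped structuring element; the helper names `ball_offsets` and `dilate` and the toy cube are hypothetical.

```python
from itertools import product


def ball_offsets(radius_mm, spacing_mm=(1.0, 1.0, 1.0)):
    """Voxel offsets whose physical distance from the origin is <= radius_mm."""
    reach = [int(radius_mm // s) for s in spacing_mm]
    offsets = []
    for dz, dy, dx in product(*(range(-r, r + 1) for r in reach)):
        dist2 = ((dz * spacing_mm[0]) ** 2
                 + (dy * spacing_mm[1]) ** 2
                 + (dx * spacing_mm[2]) ** 2)
        if dist2 <= radius_mm ** 2:
            offsets.append((dz, dy, dx))
    return offsets


def dilate(mask, radius_mm, spacing_mm=(1.0, 1.0, 1.0)):
    """Morphological dilation of a voxel set by a ball of the given radius.

    No clipping to image bounds is done here; a real pipeline would also
    discard voxels that fall outside the image.
    """
    offsets = ball_offsets(radius_mm, spacing_mm)
    return {(z + dz, y + dy, x + dx)
            for (z, y, x) in mask
            for (dz, dy, dx) in offsets}


# Toy "tumor": a 3 x 3 x 3 cube of voxels (hypothetical data).
voi_tumor = set(product(range(3), repeat=3))

# Expand by 2 mm, then subtract the tumor to leave the peritumoral ring.
voi_intra_plus_peritumor = dilate(voi_tumor, radius_mm=2.0)
voi_peritumor = voi_intra_plus_peritumor - voi_tumor
```

With anisotropic voxels, the `spacing_mm` argument converts the 2 mm radius into a different number of voxels along each axis, which is one reading of "according to pixel size".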

2.5 Feature extraction

Before feature extraction, all CT images were resampled to a voxel size of 1 × 1 × 1 mm³ and normalized using the Z-score normalization method implemented in the RIAS software package (version 0.1.2). We utilized the PyRadiomics library (version 2.0.0) to extract features from segmented VOIs [20]. For each region, 18 first-order features, 22 gray-level co-occurrence matrix (GLCM) features, 16 gray-level run length matrix (GLRLM) features, 16 gray-level size zone matrix (GLSZM) features, 14 gray-level dependence matrix (GLDM) features, and 5 neighboring gray-tone difference matrix (NGTDM) features were obtained. A Laplacian of Gaussian (LoG) convolution kernel filter (σ from 1 to 5 mm, in incremental steps of 1 mm) was applied to minimize noise and enhance features at different spatial scales [21]. To ensure that each region has the same number of features, the tumor shape features were added to each region. Each region has 1288 radiomics features: 14 shape features extracted from the tumor + 91 first-order and textural features × (1 original image + 5 LoG-filtered images + 8 wavelet-filtered images). A detailed list of extracted features is provided in Additional file 1: Table S1, and the parameters used in CT image pre-processing and feature extraction are provided in Additional file 1: Table S2.
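As a quick arithmetic check, the per-region feature count can be reproduced from the numbers stated above. The variable names are illustrative only.

```python
# Feature classes computed on every image type (counts as stated in the text).
per_image_counts = {
    "first-order": 18,
    "GLCM": 22,
    "GLRLM": 16,
    "GLSZM": 16,
    "GLDM": 14,
    "NGTDM": 5,
}
features_per_image = sum(per_image_counts.values())

# LoG sigma values from 1 to 5 mm in steps of 1 mm.
log_sigmas_mm = list(range(1, 6))

# 1 original image + 5 LoG-filtered images + 8 wavelet-filtered images.
n_image_types = 1 + len(log_sigmas_mm) + 8

# Shape features are computed once, from the tumor mask.
n_shape_features = 14

total_features = n_shape_features + features_per_image * n_image_types
print(features_per_image, n_image_types, total_features)
```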

2.6 Radiomics features pre-processing and selection

The dataset was normalized through Z-score normalization, which involved subtracting the mean value and dividing by the standard deviation for each feature. We reduced the dimensionality of the feature space by comparing the similarity of each feature pair and removing one of the two features if their Pearson correlation coefficient (PCC) was higher than 0.99. We applied the Kruskal–Wallis (KW) test to identify significant features related to the labels and calculated the F-value to assess the relationship between the features and the label. The number of selected features ranged from 1 to 20.
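A minimal sketch of the normalization and correlation-based pruning steps follows. It is not the FAE implementation. It makes three assumptions that the text does not specify: the population standard deviation is used, the absolute value of the PCC is compared with the threshold, and the feature encountered first in each correlated pair is kept. The feature names and values are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def zscore(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]  # assumes the feature is not constant


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def prune_correlated(features, threshold=0.99):
    """Keep a feature only if |PCC| with every already-kept feature <= threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature table: one list of values per feature, one value per patient.
features = {
    "f1": [1.0, 2.0, 3.0, 4.0],
    "f2": [2.0, 4.0, 6.0, 8.1],  # nearly a rescaled copy of f1
    "f3": [4.0, 1.0, 3.0, 2.0],
}

normalized = {name: zscore(values) for name, values in features.items()}
kept = prune_correlated(normalized, threshold=0.99)
print(kept)
```

Because the PCC is unchanged by Z-scoring, the pruning result does not depend on whether it is run before or after normalization.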

2.7 Model building and validation

We built and validated models using nine machine learning algorithms: Ada-boost, Auto-encoder, Decision tree, Gaussian process, Linear discriminant analysis, Logistic Regression, Logistic Regression via Lasso, Naive Bayes, and Support Vector Machine, which were implemented using Python code and the scikit-learn library. The parameters of the algorithms are listed in Additional file 1: Table S3. For each VOI, 180 models were built by combining each of the nine machine learning algorithms with 20 different feature sets. The hyperparameters of each model were determined using tenfold cross-validation on the training dataset. To address the imbalance of the training dataset, we employed two methods: oversampling and the Synthetic Minority Over-sampling Technique (SMOTE) [22]. Both methods were used to balance the dataset, ensuring a more accurate representation of the samples during the analysis process. We evaluated the performance of each model using receiver operating characteristic (ROC) curves, calculating sensitivity, specificity, accuracy, and area under the curve (AUC). The best model was selected by comparing accuracy metrics, and we estimated the 95% confidence interval (CI) using bootstrap with 1000 samples. We conducted feature preprocessing and model exploration with FeAture Explorer Pro (FAE, version 0.5.2) on Python (3.7.6). The overall workflow is summarized in Fig. 2.
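The AUC and its bootstrap confidence interval can be sketched as follows. This is not the FAE code. The percentile method, the fixed random seed, and the rule of redrawing resamples that contain only one class are assumptions; the text specifies only that 1000 bootstrap samples were used. The labels and scores are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC, resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # AUC needs both classes in the resample
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Hypothetical predictions: 1 = without a complete capsule, 0 = complete capsule.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.45, 0.70, 0.90, 0.60]

point_estimate = auc(labels, scores)
ci_low, ci_high = bootstrap_ci(labels, scores, n_boot=1000)
print(point_estimate, ci_low, ci_high)
```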

Fig. 2

An overview of the current study. A All patients were divided into two groups based on capsular characteristics. B The workflow of this study

2.8 Statistical analysis

The statistical analysis was performed using R software (version 3.6). Continuous variables were compared using the Mann–Whitney U-test (clinical duration and maximum diameter) or Student’s t-test (age, expressed as mean ± standard deviation). Categorical variables were compared using the Pearson Chi-square test or Fisher exact test. A P-value < 0.05 was considered statistically significant.

3 Results

3.1 Patient demographics and clinical parameters

The demographic characteristics showed no significant differences between patients with and without a complete capsule (Table 1). In institution 1, PAs without a complete capsule had a larger diameter (2.38 cm ± 0.70 cm vs. 1.79 cm ± 0.54 cm, P < 0.001), but this difference was not significant in institution 2.

Table 1 The demographics and clinical parameters of patients with parotid pleomorphic adenoma

3.2 Comparison of different data balancing techniques

We used SMOTE and oversampling to balance the samples. The two methods performed similarly in the training and validation sets. However, the AUC in the test set was lower when SMOTE was applied. The selection process for the patients is depicted in the flow diagram in Additional file 1: Figure S2.

3.3 Performance of different regions

We compared the AUCs of all the pipelines on the validation dataset. The model using features from the VOIintra-plus peritumor yielded the highest AUC.

Among the nine different classifiers, Linear discriminant analysis (LDA) had the best performance (Table 2). LDA is a linear classifier that fits class-conditional densities to the data and applies Bayes’ rule. The model based on 15 features achieved the highest AUC on the validation dataset, with an AUC of 0.860 and an accuracy of 0.819. With this model, the AUC and accuracy on the test dataset were 0.869 and 0.787, respectively (Fig. 3).

Table 2 Performance of all algorithm classifications using features from VOIintra-plus peritumor
Fig. 3

Model performance generated using features from VOIintra-plus peritumor. A ROC curves of the top-performing model in different datasets. B The model based on 15 features achieved the highest AUC on the validation dataset. C The selected features in the model. AUC area under the curve, VOI volume of interest, ROC receiver operating characteristic

The model using features from the VOIperitumor also performed well. Out of the nine different classifiers, Logistic Regression (LR) performed the best (Table 3). LR is a linear classifier that combines all the features. The model based on 11 features achieved the highest AUC on the validation dataset, with an AUC of 0.853 and an accuracy of 0.819. With this model, the AUC and accuracy on the test dataset were 0.790 and 0.723, respectively (Fig. 4).

Table 3 The metrics of all algorithm classifications in VOIperitumor
Fig. 4

Model performance generated using features from VOIperitumor. A ROC curves of the best-performing model in different datasets. B The model based on 11 features achieved the highest AUC on the validation dataset. C The selected features in the model. AUC area under the curve, VOI volume of interest, ROC receiver operating characteristic

For the model using VOItumor features, the pipeline using the Logistic Regression via Lasso (LRLasso) classifier achieved the highest AUC (Table 4). LRLasso is a linear classifier based on Logistic Regression in which the L1 norm of the weights is added to the loss function, constraining the weights and resulting in sparse features. The model based on three features achieved the highest AUC on the validation dataset, with an AUC of 0.690 and an accuracy of 0.693. With this model, the AUC and accuracy on the test dataset were 0.755 and 0.723, respectively (Fig. 5).
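For reference, the L1-penalized logistic loss minimized by a Lasso logistic regression can be written as below. The notation is introduced here for illustration and is not taken from the study: feature vectors x_i, labels y_i ∈ {0, 1}, weights w, intercept b, and penalty strength λ. Larger values of λ drive more weights to exactly zero, which is the source of the sparsity.

```latex
\min_{w,\,b}\;
-\frac{1}{n}\sum_{i=1}^{n}
\Big[\, y_i \log \sigma\!\left(w^\top x_i + b\right)
      + (1 - y_i)\log\!\big(1 - \sigma\!\left(w^\top x_i + b\right)\big) \Big]
+ \lambda \lVert w \rVert_1,
\qquad
\sigma(t) = \frac{1}{1 + e^{-t}}
```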

Table 4 Performance of all algorithm classifications using features from VOItumor
Fig. 5

Model performance generated using features from VOItumor. A ROC curves of the top-performing model in different datasets. B The model based on three features achieved the highest AUC on the validation dataset. C The selected features in the model. AUC area under the curve, VOI volume of interest, ROC receiver operating characteristic

3.4 Comparison of AUCs

In the validation set, the DeLong test showed that the AUCs differed significantly between VOIintra-plus peritumor and VOItumor (AUC: 0.86 vs. 0.69; difference between areas = 0.17, P = 0.0002) and between VOIperitumor and VOItumor (AUC: 0.853 vs. 0.69; difference between areas = 0.16, P = 0.0007). In the test set, the AUCs differed significantly between VOIintra-plus peritumor and VOItumor (AUC: 0.869 vs. 0.755; difference between areas = 0.11, P = 0.018), as depicted in Fig. 6.

Fig. 6

Comparison of the performance of models from VOItumor, VOIperitumor, and VOIintra-plus peritumor. A ROC curves of the top-performing models in the tenfold cross-validation. B AUC values of the nine different machine learning algorithm-based models in the tenfold cross-validation. C ROC curves of the top-performing models in the test set. AUC area under the curve, VOI volume of interest, ROC receiver operating characteristic, LDA Linear discriminant analysis, SVM Support vector machine, LRLasso Logistic Regression via Lasso, LR Logistic Regression, AE Auto-encoder, GP Gaussian process, NB Naive Bayes, AB Ada-boost, DT Decision tree

4 Discussion

In this study, we aimed to develop and validate radiomics models based on CT images to differentiate between PA with and without complete capsule. Features were extracted from three VOIs: VOItumor, VOIintra-plus peritumor, and VOIperitumor. The latter two VOIs contained the peritumoral region. Our results indicate that the use of intra-plus peritumoral radiomics features may provide a non-invasive method for determining the complete capsule status, which is important for surgical planning and treatment decisions.

The peritumoral region is the area surrounding the primary tumor and can provide important information about the tumor’s microenvironment, including its interactions with surrounding tissues and its potential for malignant behavior. The value of peritumoral radiomics features has been demonstrated in previous studies, showcasing their potential in characterizing tumor behavior, predicting treatment response, and assessing the risk of recurrence and metastasis [23,24,25,26,27]. However, their application in parotid PA has not yet been explored. Previous studies have used different definitions of the peritumoral region, with distances ranging from 2 to 12 mm for gastrointestinal stromal tumor, 3 to 15 mm for breast cancer, and 5 to 30 mm for lung nodules. The results of these studies suggest that the peritumoral region closest to the lesion typically yields the best predictive performance [23, 25, 27]. In addition, a fixed distance of 3 mm has been used to define the peritumoral region for lung cancer [24].

In the current study, we defined the 2 mm around the tumor as the peritumoral region and did not investigate larger distances, for the following reasons. First, the capsular characteristics occur at the margin of the tumor, and even satellite nodules are mostly found within 2 mm of the central mass [19], so 2 mm around the tumor is sufficient for the evaluation of capsular characteristics. Additionally, Wu et al. [27] found that the radiomics model for laryngeal carcinoma collapsed when the distance exceeded 4.5 mm, which indicates that a larger peritumoral region does not necessarily lead to better performance and that the optimal distance should be set based on pathological characteristics. Second, the parotid gland is smaller than the stomach, lung, or breast, and therefore a shorter extension distance is needed to avoid going beyond the border of the parotid.

Our study found that the radiomics features from VOIintra-plus peritumor showed the best performance in predicting capsular characteristics of parotid PA. The most important features were those related to heterogeneity and complexity of the texture, such as Dependence Non-Uniformity, Run Length Non-Uniformity, and Gray Level Non-Uniformity. These features measure non-uniformity in image intensity [28,29,30]. Differences in capsular characteristics may lead to variations in the micro-structures of the peritumoral regions. In patients with a complete capsule, the peritumoral region is composed exclusively of parotid tissue. In contrast, those without a complete capsule have a peritumoral region containing both parotid and tumor tissues, resulting in increased heterogeneity. Furthermore, shape-based features, such as maximum 2D diameter, maximum 3D diameter, and major axis length, were also significant in the developed radiomics model. This is consistent with prior studies that have identified a relationship between tumor size and capsular integrity, with larger tumors being more likely to have an incomplete encapsulation [31].

Our study has some limitations. First, a significant number of patients were lost to follow-up, so the actual recurrence rate could not be determined. Second, MRI may provide more valuable information with fewer artifacts than CT, and it may be beneficial to explore the potential of MRI-based radiomics features in future studies. Third, our study population was limited to patients from two institutions and the sample size was not large, so the generalizability of the results to other populations remains to be confirmed. Finally, alternative advanced methods and semi-automatic workflows, such as the one offered by matRadiomics [32], could potentially offer better performance and further improve the efficiency of the radiomics workflow in a routine clinical setting. This would allow for a more comprehensive assessment of the tumor’s microenvironment and its interactions with surrounding tissues, leading to more accurate predictive models.

In conclusion, our study demonstrated the potential of using peritumoral radiomics features for accurately detecting capsular morphological features of parotid PA. Despite some limitations, the results provide promising evidence for the application of this method in advancing precise treatment for patients with parotid PA. However, further systematic studies, including comparisons with MRI-based radiomics features, are necessary to fully validate the use of peritumoral radiomics in clinical practice.