Background

Gastric cancer (GC) is the third deadliest malignancy, with over 1 million newly diagnosed cases and approximately 76,900 deaths in 2020, causing a massive burden worldwide [1, 2]. Distant metastasis is one of the major factors contributing to high mortality in GC. Ovarian metastasis, a unique kind of distant metastasis in females, is not rare in gastric cancer, especially in Asian countries [3]. For example, in Japan, metastases account for almost 20% of all ovarian malignancies, and the most common primary site is the stomach [4]. In clinical studies, the reported incidence of ovarian metastasis ranged from 0.3% to 6.7% in female GC patients. However, according to autopsy studies, an incidence of 33–41% was found for ovarian metastasis [5]. The inconsistency between these clinical and autopsy studies indicated that the incidence of ovarian metastasis might be underestimated in real-world practice.

Ovarian metastasis (OM) in GC patients is correlated with poor prognosis, showing a median overall survival of 11 months [6, 7]. Moreover, synchronous ovarian metastasis (SOM), defined as OM observed within 6 months of the first GC diagnosis, exhibits shorter overall survival compared with metachronous ovarian metastasis [4, 8]. In female GC patients, preoperative detection of SOM is essential for tumor staging and treatment decision making, providing baseline information for individual treatment. When SOM is detected preoperatively, the role of aggressive surgical approach remains controversial, and multidisciplinary team (MDT) discussion is required for individual treatment strategies. Improving outcomes in patients affected by metastatic GC represents an urgent clinical need. Several novel therapies are under investigation, including margetuximab, HER2-targeted therapies, and immunotherapy [9,10,11,12].

Precise preoperative diagnosis of SOM is challenging in clinical practice. Indeed, the symptoms of ovarian metastasis are variable and nonspecific. Although ultrasound, CT and MRI are useful for detecting ovarian masses, the imaging features of ovarian metastases sometimes may be atypical and misleading [7]. When occurring unilaterally, ovary metastasis could be hardly differentiated from primary ovarian tumors. Moreover, in some cases, an ovarian mass constitutes the initial sign of cancer. About 7% of ovarian tumors that present as primary ovarian neoplasms are known to be ovarian metastases [13]. In case the possibility of metastatic carcinoma is not considered, treatment options may be adversely affected. On account of management differences and prognostic implications, invasive histopathological approaches such as biopsy and exploratory laparotomy are often required for a confident diagnosis [13].

In the past decades, researchers have focused on the clinical characteristics, treatment methods and prognostic analysis of ovarian metastasis [8,9,10,11,12,13,14,15,16]. Studies have determined the risk factors for ovarian metastasis in GC patients; however, there have been no reliable models applied to clinical practice so far. Gao et al.reported premenopausal status, tumor invasion depth, number of positive lymph nodes, and no ERβ expression as factors independently predicting metachronous ovarian metastasis [17]. Li et al.constructed a nomogram including age, N stage, Lauren type, signet-ring cell component, estrogen receptor expression, neutrophil/lymphocyte ratio, and serum CA125 for predicting ovarian metastasis in GC, with an area under the curve (AUC) for the model of 0.819 [18]. However, both synchronous and metachronous ovarian metastases were involved in this study, which may not mimic the real-world setting. SOMs need to be detected preoperatively, and a reliable and noninvasive method is required to detect SOM in GC patients.

Recently, several reports have demonstrated radiomics may help radiologists solve tough clinical tasks. By extracting numerous quantitative features from medical images via high-throughput analysis, radiomics could enable radiologists to improve diagnostic accuracy, which eventually would benefit patients. Radiomics-based models have shown a promising value in the detection of occult peritoneal metastasis and lymph node metastasis in GC [19,20,21,22,23,24].

To our knowledge, a radiomics nomogram for the detection of SOM in GC patients has not been developed. Therefore, the aim of the present work was to build a CT-based radiomics nomogram model for preoperative detection of SOM and to evaluate its clinical application in female GC patients.

Methods

Participants

The trial followed the Declaration of Helsinki and had approval from the Ethics Committees of Changhai hospital and Ruijin Hospital Luwan Branch. Due to a retrospective design, signed informed consent was not required.

From January 2019 to March 2022, 174 females with GC detected pathologically at Changhai Hospital were enrolled in this retrospective trial. Inclusion criteria were: (1) gastric adenocarcinoma diagnosed by biopsy or postoperative pathological examination; (2) gastric adenocarcinoma as single focus; (3) both abdominal contrast-enhanced CT examination and pelvis contrast-enhanced MR examination performed at the time of diagnosis; (4) pathological examinations performed for any suspicious ovarian lesions detected by MRI. Exclusion criteria were: (1) local or systemic treatment prior to baseline CT scan (n = 46); (2) previously diagnosed or concurrent cancers other than GC (n = 3); (3) poor image quality (n = 7); (4) metachronous ovarian metastasis (n = 12); (5) clinically suspected SOM but not pathologically confirmed (n = 5). Therefore, 101 cases were finally included for final analysis as cohort 1. Then, 46 patients meeting the eligible criteria in Ruijin Hospital Luwan Branch between January 2021 and March 2022 were also enrolled as cohort 2.

Clinicopathologic data

Patient information and clinical findings were retrospectively retrieved from the clinicopathological databases, e.g., age, tumor location (including the upper third, middle third and lower third of the stomach), carcinoembryonic antigen (CEA), carbohydrate antigen 19–9 (CA19-9), carbohydrate antigen 125 (CA125) and carbohydrate antigen 72–4 (CA 724), were recorded at the same time as CT scans (time interval < 2 weeks). All cases were pathologically confirmed as GC, then categorized into 2 groups, including the SOM and no-SOM groups. All pathological SOMs extracted from surgical specimens were confirmed by pathological findings.

Image acquisition and analysis

Routine contrast-enhanced abdominal CT was performed on a multidetector row CT (MDCT) system (Aquilion, TOSHIBA, Japan; iCT256, PHILIPS, Netherlands) after 4 h of fasting. Supine patients were intravenously injected an iodinated contrast agent at 80–95 ml (Optiray, Liebel-Flarsheim Canada, Canada) at 3.0 or 3.5 ml/s. This was followed by arterial and portal venous phase contrast-enhanced CT after delays of 28 s and 50 s, respectively. CT images were acquired at 120 kV, 100 to 150 mA and 0.5 s rotation time. Contrast-enhanced CT images were reconstructed with the following parameters: field of view, 350 × 350 mm; data matrix, 512 × 512; in-plane spatial resolution, 0.6 mm; axial slice thickness, 5.0 mm; spiral pitch, 1.

Subjective evaluation for SOM was performed by 3 radiologists with systematic training, including QW. Z., PP. Y. and F. S. with 8, 9 and 12 years of experience in CT diagnosis, respectively, who had no knowledge of pathological data. Any discrepancy among them was discussed until an agreement was reached by at least 2 of these experts. The Kappa statistic was used to evaluate interobserver correlation between two given radiologists. Intraclass correlation coefficient (ICC) was determined for evaluating consistency among the three radiologists.

Image segmentation

The acquired DICOM data (portal venous phase CT scans) were preprocessed with the Artificial Intelligence Kit software (AK, GE Healthcare, China). Images were resampled (using Bspline as the default interpolator) and normalized for subsequent radiomics analysis (using default values). Then, the preprocessed images were imported into the ITK-SNAP software (www.itksnap.org) to manually segment the entire GC tissue layer by layer to obtain the volume of interest (VOI), which reflects the border best fitting the lesion’s area for each GC case. Two radiologists (QW. Z. and PP. Y.) independently repeated the segmentation process in 30 randomly selected patients a week later to analyze observer’s agreement. Finally, all VOIs were imported into the AK software for feature extraction.

Radiomics feature extraction and reduction

According to the obtained VOIs, 4 categories of features were determined, including first-order feature (voxel intensity distribution on CT images), shape feature (3D properties of the VOIs), texture feature (quantitation of region heterogeneity differences, e.g., gray-level co-occurrence, run length, size zone and neighborhood gray-tone difference matrices) and higher-order feature (transformed first-order data and texture features, including logarithm, exponential, gradient, square, square root, local binary pattern [LBP] and wavelet transformations) groups. Totally 1218 radiomics features were obtained in every patient.

Inter- and intra-observer correlation coefficients (ICCs) were obtained for the assessment of feature robustness. Features with both inter- and intra-observer correlation coefficients above 0.9 were employed to build the model, with outstanding feature reproducibility. To select optimal features associated with SOM, the least absolute shrinkage and selection operator (LASSO) algorithm was utilized (ten-fold cross-validation). The selected features were employed to develop a radscore.

Nomogram model building and validation

The predictive values of clinical features and the radscore in the detection of SOM were evaluated by univariable logistic regression analysis in the training set (cohort 1). Factors showing p < 0.05 were then used to generate a visual nomogram model by multiple factor logistic regression with the stepwise selection method (p < 0.05). The bootstrap method (1000 cross-validation) was performed to validate the precision of detection [25] as an internal validation tool in cohort 1. Receiver operating characteristic (ROC) curve analysis was carried out to assess the performances of the radscore, nomogram and subjective evaluation model. Then, the external validation data set (cohort 2) was used for verification. Finally, the models were compared by the DeLong test, and the nomogram’s goodness-of-fit was determined using the Hosmer–Lemeshow test. Decision curve analysis (DCA) and the confusion matrix were utilized to validate clinical benefits. Figure 1 shows the study’s workflow.

Fig. 1
figure 1

Study flowchart and modeling methods. Study flowchart (A) and modeling methods (B). Cohort 1, Changhai Hospital; Cohort 2, Ruijin Hospital Luwan

Statistical Analysis

MedCalc 15.2.2 and Python 3.5 were utilized for data analysis. Categorical data were compared by the Pearson chi-square test or Fisher’s exact test, whereas continuous data (mean ± standard deviation) were compared by the Student’s t-test or Mann–Whitney U test. Two-sided P < 0.05 was deemed to be statistically significant.

Results

Patient features

Totally 101 cases were finally enrolled in cohort 1 and 46 patients were included in cohort 2. The two datasets had no marked differences in demographic characteristics (all P > 0.05), as shown in Table 1. According to pathological reports, 38/101 (37.6%) cases were SOM in cohort 1, versus 18/46 (39.1%) in cohort 2. Interobserver agreement for the subjective evaluation in both cohorts is shown in Supplemental Table 1.

Table 1 Clinicopathological parameters of the examined cohorts

Radiomics features and selection

After inter- and intra-observer agreement analyses, 921/1218 (75.6%) features showed inter- and intra-observer ICCs ≥ 0.9, and were used for radiomics analysis. Eventually, two features were determined by the LASSO algorithm to be optimal (Supplemental Fig. 1), and a radscore was built as follows:

$$\mathrm{Radscore}=0.0122029783\ast(\mathrm{original\_shape\_Maximum}3\mathrm{DDiameter})+0.0002702978\ast(\mathrm{original\_glrlm\_ShortRunHighGrayLevelEmphasis}).$$

Nomogram model construction and evaluation

In cohort 1, univariable analysis showed age, menstrual status, tumor location, CA125, CA125/CEA, CA724, CA19-9 and radscore were significantly associated with SOM. Next, a nomogram model was developed by multivariable logistic regression analysis of select risk factors (Age, OR = 0.884, p < 0.01; radscore, OR = 17.222, p = 0.001, Table 2 and Fig. 2A-B). The generated nomogram, depicted in Fig. 2C, had a higher AUC than the radscore and subjective evaluation (0.910 vs 0.827 vs 0.773) in the training set. The DeLong test revealed the differences were statistically significant (p = 0.007, p = 0.026). The bootstrap algorithm also showed a satisfactory performance for SOM, with an AUC of 0.907 in 1000 cross-validation, as an internal validation tool.

Table 2 Regression analysis for model building
Fig. 2
figure 2

Model building. Fitting curves for the radscore (A) and age (B) are shown. In the visual nomogram (C), first, a vertical line was drawn according to the values of the most influential factors to determine the corresponding numbers of points. The total points were the sum of the above points. Then, a vertical line was drawn according to the value of total points to determine the probability of risk

In cohort 2 (external validation set), the nomogram model in combination with the radscore and age (details listed in Supplemental Table 2) had a higher AUC than the radscore and subjective evaluation (0.850 vs 0.790 vs 0.675). The detailed ROC analyses and comparisons are shown in Table 3 and Fig. 3. Calibration curves for the nomogram in both datasets suggested no significant deviation (Hosmer–Lemeshow test, all P > 0.05) from an ideal fitting (Fig. 4).

Table 3 ROC curve analysis and comparison of predictive values
Fig. 3
figure 3

ROC curves in both data sets. A Training set. B External validation set

Fig. 4
figure 4

The calibration curves of the nomogram in both data sets. A Training set. B External validation set

In the external validation set, DCA showed that employing the novel nomogram to predict the probability of SOM conferred a positive net benefit compared to the radscore and the all-or-none scheme at a threshold probability from 10 to 75% (Fig. 5A). The confusion matrix confirmed the nomogram’s superiority over the radscore model (Fig. 5B-C).

Fig. 5
figure 5

Model validation. A DCA in the external validation set. Light- and dark-grey lines represent the assumptions that all and no cases have a high risk, respectively. Red and blue curves showed that with a large probability range, utilizing the developed nomogram to predict the odds of SOM conferred a positive net benefit versus the radscore and the all-or-none scheme. The confusion matrix showed that using the nomogram model (C) would be more beneficial than applying the radscore alone (B)

Discussion

This work developed a radiomics nomogram using CT and clinical data, which showed markedly enhanced power versus the radscore and subjective evaluation in detecting SOM in female GC patients. Clinicians can use this effective, noninvasive radiomics approach to improve the screening accuracy of SOM preoperatively and to design individuated treatments.

Detection of SOM in GC patients is important for clinical decision-making. Ovarian metastasis (OM) reduces prognosis in female GC cases and often results in failed treatment [26]. Despite recent advances in diagnostic and therapeutic tools for GC, GC cases with OM still show unsatisfactory prognosis, with median survival time of less than 15 months [7]. When SOM is diagnosed, the GC case has stage IV disease (cM1), with poor prognosis. Such patients are not eligible for curative surgery, and the treatment strategy may mainly include systemic therapy and chemoradiation, with the treatment goals being symptom relief and delayed progression. If a GC patient is diagnosed with primary ovary neoplasms, GC treatment is not affected, and treatment of ovary neoplasms may be evaluated by an obstetrician-gynecologist.

Timely detection and precise evaluation of SOM is fundamental to optimal therapeutic decision-making. According to the NCCN guidelines for gastric cancer (version 2.2022) [2], pelvis imaging evaluation should be performed for newly diagnosed patients for a thorough assessment. However, the imaging appearances of ovarian metastasis and primary ovarian neoplasms may overlap, and the correct diagnosis can be challenging for radiologists [27]. Typically, ovarian metastases mostly occur in premenopausal women and present as bilateral masses. Peritoneal metastasis could also be observed in multiple patients simultaneously. These features as well as the medical history are key clues for the diagnosis of ovarian metastasis. However, in a retrospective study, Lin et al. [8] reported that 29.2% of ovarian metastases were unilateral and 31.5% showed no peritoneal metastasis. When presented atypically, OM might be misinterpreted as a primary ovarian neoplasm or even physiologic ovarian enlargement. In a study by de Waal et al., about 25% of ovarian metastases mimicked a primary ovarian tumor [16]. Feng et al. [15] reported that 6 of 63 patients had erroneous diagnosis as physiologic ovarian enlargement by imaging modalities and received no timely treatment.

Many studies have assessed the clinicopathological characteristics and prognostic factors of OM; however, imaging diagnosis of SOM sometimes remains challenging for radiologists. Routine imaging modalities, including CT, ultrasound and MRI, are unsatisfactory in discriminating between primary and secondary ovary tumors. Because the traditional imaging features of SOM are non-specific and may lead to misdiagnosis, such evaluation is obviously a subjective process that lacks reliability. Therefore, the above facts highlight the urgent need to develop reliable tools to correctly identify GC cases with SOM and to improve prognosis.

Radiomics constitutes a new strategy using routine imaging findings to perform a high-throughput quantitative evaluation. This quantitative method provides a noninvasive tool for a more comprehensive assessment of the biological properties and heterogeneity of GC compared with morphological visual representation. It is widely admitted radiomics can be applied in GC evaluation [19, 21, 23, 28,29,30]. However, radiomics is scarcely applied for detecting SOM in GC.

Importantly, this study relied on the application of radiomics features derived from primary lesions. Though the routine use of pelvic CT/MRI in clinic could help detect ovarian masses, the imaging features of ovarian metastases sometimes can be atypical and misleading. This problem is aggravated by the lack of consensus on appropriate morphological criteria to assess OM involvement accurately. Compared to routine approaches, the radiomics approach is convenient, inexpensive, and free from risk of secondary inspection. The radiomics features extracted from primary tumor could directly predict the presence of SOM, no matter whether the ovarian region has benign lesions. Therefore, it could be utilized easily for identifying low-risk patients who may not benefit from further radical imaging examination, such as PET/CT, which may reduce the radiation exposure and save expenses.

It is noteworthy that we built a nomogram combining age and CT-based radscore, which constitutes a visualization tool with improved discriminatory ability for SOM detection. A nomogram was generated to help radiologists and clinicians assess SOM more easily. This model showed favorable performance and better diagnostic efficiency than subjective evaluation and the radscore alone. The AUC improved from 0.675 to 0.850, with a pretty higher sensitivity and accuracy of 0.833 and 0.826 in the validation cohort. Thus, the proposed nomogram could be clinically applied to promote risk stratification in patients with GC.

Another valuable aspect of the present work that an actual external validation cohort was examined. The external validation set in the current study revealed improved diagnostic performance and better clinical benefit with the use of the novel nomogram. This indicates utilizing an external set may help overcome the shortcoming of overfitting for a newly built model. Therefore, the new model may assist radiologists in improving diagnostic confidence and provide clinicians with more useful and objective understanding of overall prognostic factors beforehand in the clinical setting.

This study had limitations. First, it had a small sample and a retrospective design, indicating potential selection bias and reduced data generalizability. Therefore, larger multicenter studies with external validation cohorts are needed to address these shortcomings. In addition, imaging segmentation was performed manually rather than semi-automatically or automatically, which may result in subjective errors not suitable for large data processing. Compared to routine manual approach which often insufficiently systematic and cumbersome procedure, the deep learning-based automatic segmentation may thus help alleviate this burden and can effectively improve research reliability [31]. Additionally, imaging segmentation was performed driven from primary GC. Although most methodologies advocate the use of volume of the whole primary tumor, only the radiomics features of the primary tumor were extracted and analyzed, and the features of the OMs themselves were not explored, which may result in incomplete observation data. This point attracted a great attention both in theoretical and application fields. Furthermore, deep learning tools were not developed and validated for the detection of SOM or even peritoneal metastasis [32]. This application of artificial intelligence method could be used to guide personalized treatment plans with the help of computerized tumor-level characterization [33]. Finally, this study did not include relevant molecular biological indicators. “Radiogenomics” includes the radiomics and genomics features represents an emerging prognostic approach [34], which should be addressed in future studies.

Conclusions

In conclusion, using preoperative CT images, a quantitative radscore was built to determine the risk of SOM in GC patients. Then, a nomogram model combining the radscore and clinical characteristics could be applied to improve the clinical benefit versus the subjective evaluation and the radscore alone. This visual noninvasive nomogram approach could be clinically applied to promote risk stratification in GC.