Ultrasound-based deep learning radiomics nomogram for risk stratification of testicular masses: a two-center study

Objective To develop an ultrasound-driven clinical deep learning radiomics (CDLR) model for stratifying the risk of testicular masses, aiming to guide individualized treatment and minimize unnecessary procedures. Methods We retrospectively analyzed 275 patients with confirmed testicular lesions (January 2018 to April 2023) from two hospitals, split into training (158 cases), validation (68 cases), and external test (49 cases) cohorts. Radiomics and deep learning (DL) features were extracted from preoperative ultrasound images. Following feature selection, we used logistic regression (LR) to establish a deep learning radiomics (DLR) model and derived its signature. Clinical data underwent univariate and multivariate LR analyses, forming the "clinic signature." By integrating the DLR and clinic signatures using multivariable LR, we formulated the CDLR nomogram for testicular mass risk stratification. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), and clinical utility was appraised with decision curve analysis (DCA). Additionally, we compared these models with the assessments of two radiologists (5–8 years of practice). Results The CDLR nomogram discriminated well between testicular tumors and non-tumorous lesions, with AUCs of 0.909 (internal validation) and 0.835 (external validation). It also distinguished malignant from benign testicular masses, with AUCs of 0.851 (internal validation) and 0.834 (external validation). Notably, CDLR surpassed the clinical model, standalone DLR, and the evaluations of the two radiologists. Conclusion The CDLR nomogram offers a reliable tool for stratifying the risk of testicular masses. It can augment radiological diagnosis, facilitate personalized treatment, and reduce unwarranted medical procedures.
Supplementary Information The online version contains supplementary material available at 10.1007/s00432-023-05549-6.


Introduction
The incidence of testicular tumors, which account for approximately 1% of all male tumors and 5% of urinary system tumors, has increased in recent decades, particularly among young and middle-aged men (Park et al. 2018; Znaor et al. 2020; Gurney et al. 2019). The primary symptom is painless testicular enlargement. However, these tumors sometimes present with symptoms or imaging findings resembling orchitis, tuberculosis, or other tumor-like conditions, complicating clinical differential diagnosis (Belfield and Findlay-Line 2022; Tandstad et al. 2016). For non-neoplastic testicular lesions, conservative treatment is typically the first approach, whereas testicular malignancies often require radical orchiectomy. Studies have shown that unilateral orchiectomy can result in infertility and reduced sexual function (Henriques et al. 2022; Kerie et al. 2021). Recent studies suggest that benign testicular tumors smaller than 2-3 cm in diameter can have a favorable prognosis with partial orchiectomy and adjuvant radiotherapy (Fankhauser et al. 2021; Paffenholz et al. 2018; Gentile et al. 2020; Sm et al. 2023). Thus, preoperative risk assessment of testicular masses is crucial. Accurately differentiating between malignant tumors, benign tumors, and non-neoplastic lesions before treatment allows the most appropriate treatment plan to be chosen, preventing over-treatment and unnecessary complete resection and prioritizing the preservation of organ function. Ultrasound is essential in evaluating testicular lesions because of its cost-effectiveness, convenience, high reproducibility, and lack of radiation exposure (Minhas et al. 2021). It offers detailed information about a tumor's location, size, shape, and blood supply (Lai et al. 2023). However, the varied ultrasound characteristics of testicular masses can make diagnosis challenging (Marko et al. 2017).
Radiomics, a recent advancement in clinical methods, is proving valuable for diagnosis, treatment selection, and prognosis assessment in patients with tumors (Zhang et al. 2023). It uses quantitative analysis techniques to extract extensive lesion information from conventional medical images, revealing details that are not apparent on visual inspection (Lafata et al. 2022). Earlier studies have investigated its use in predicting testicular and other urinary system diseases (Santi et al. 2022; Fan et al. 2022; Xue et al. 2023; Baessler et al. 2020). More recently, deep learning (DL) algorithms have gained widespread recognition and adoption in medical image analysis (Beuque et al. 2023; Tong et al. 2022). DL employs neural networks for feature extraction, enabling automated image analysis after training, which is a significant advantage over radiomics. Scholars have proposed merging DL network output with radiomics features, potentially enhancing the accuracy and reliability of image-based radiomics, especially with limited training datasets (Zhang et al. 2022). Among DL algorithms, convolutional neural networks, with their inherent data-driven modeling capabilities, can directly extract task-related features from medical images, thereby improving model accuracy and diagnostic efficiency (Yu et al. 2023; Dominique et al. 2022). Yet no research to date has merged DL with ultrasound radiomics to predict the risk stratification of testicular masses.
Hence, we introduced two clinical deep learning radiomics (CDLR) nomograms to evaluate their capability in distinguishing between tumors and non-neoplastic lesions, and in differentiating malignant tumors from benign lesions.
Patients had to meet certain inclusion criteria. They must have undergone ultrasound examination within 1 week before surgery, had complete preoperative ultrasound images and clinical records, and received a definitive postoperative pathological diagnosis. Exclusion criteria were poor ultrasound image quality, no evident lesion, a concurrent primary tumor elsewhere, and neoadjuvant treatment before the ultrasound examination. The 226 patients from Center 1 were randomly divided into training (n = 158) and validation (n = 68) cohorts at a 7:3 ratio. The remaining 49 patients from Center 2 formed an external test cohort. The distribution of lesion pathology types is presented in Supplementary Table 1. An overview of our research process is depicted in Fig. 1.
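The random 7:3 division of the Center 1 patients described above can be sketched as follows. This is a minimal illustration only: the patient identifiers, the seed, and the function name `split_cohort` are hypothetical and are not taken from the study.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=42):
    """Randomly split patient IDs into training and validation lists.

    The seed and the rounding rule are assumptions made for this sketch.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# Hypothetical identifiers for the 226 Center 1 patients.
center1_ids = [f"C1-{i:03d}" for i in range(1, 227)]
train_ids, validation_ids = split_cohort(center1_ids)
print(len(train_ids), len(validation_ids))  # 158 68
```

Rounding 226 x 0.7 gives 158, which matches the cohort sizes reported in the text; the Center 2 patients are kept entirely separate as the external test cohort.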

Clinical data
The clinical information collected included age, body mass index (BMI), symptom (scrotal pain), existing medical conditions (e.g., hypertension, diabetes, coronary heart disease), complete blood count, serum alpha-fetoprotein (AFP) level, and serum beta-human chorionic gonadotropin (β-HCG) level, among others. Radiological evaluations were conducted by radiologists with 5-8 years of experience. They analyzed the ultrasound images and graded lesion blood flow distribution using the Adler grading system. Blood flow was then categorized as either sparse (grades 0-1) or abundant (grades 2-3) based on color Doppler ultrasound (Adler et al. 1990; Ma et al. 2015). All clinical data were retrospectively retrieved from the hospital information system (HIS).
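The dichotomization of Adler grades described above amounts to a simple mapping. The sketch below is illustrative; the function name `adler_category` is hypothetical.

```python
def adler_category(grade):
    """Map an Adler blood-flow grade (0-3) to the binary category used in the text."""
    if grade not in (0, 1, 2, 3):
        raise ValueError("Adler grade must be 0, 1, 2, or 3")
    # Grades 0-1 are treated as sparse flow, grades 2-3 as abundant flow.
    return "sparse" if grade <= 1 else "abundant"


print([adler_category(g) for g in range(4)])
```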

Image acquisition
The equipment differed between the two centers. Center 1 used the ESAOTE-PLUS color Doppler ultrasound diagnostic system from Parkson Medical Company, with a 12-MHz high-frequency linear array probe. By contrast, Center 2 used the Siemens Acuson Sequoia 512 color Doppler ultrasound diagnostic system, with a 10L4 linear array probe covering frequencies in the range of 2.9-9.9 MHz. Radiologists with more than 5 years of experience captured the ultrasound images at both institutions. For uniformity, the largest cross-sectional view of each lesion was selected, yielding 275 images in total. All images were obtained from the hospital's picture archiving and communication system (PACS) and stored in Digital Imaging and Communications in Medicine (DICOM) format.
Fig. 1 Patient selection process for this study depicted in a flowchart

Image segmentation and feature extraction
We imported all images into ITK-SNAP software (version 3.8; http://www.itksnap.org). The region of interest (ROI) for each lesion was manually outlined along the lesion edge within the software by a radiologist with 5 years of experience. To ensure reliability, we evaluated the reproducibility of the extracted features using intra- and inter-observer intraclass correlation coefficients (ICCs). To do this, 30 images were selected at random. A radiologist with 8 years of experience outlined the ROI on these images and, after a week, repeated the process to assess intra-observer consistency. Both radiologists were blinded to the patients' clinical information and pathology results.
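As an illustration of the reproducibility check, the sketch below computes an ICC for one feature measured on the same lesions by two readers (or by one reader on two occasions). The text does not state which ICC form was used; the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1), is assumed here, and the function name and toy values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for an n_subjects x k_raters table of one feature's values.

    Assumed form: two-way random effects, absolute agreement, single rater.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Toy values: each row is one lesion, each column one delineation.
perfect = icc_2_1([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
noisy = icc_2_1([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
print(perfect, noisy)
```

A feature whose ICC falls at or below a chosen cutoff would then be dropped before modeling.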

Feature selection
For the training cohort, we employed a sequential approach to feature screening and dimensionality reduction. First, we retained radiomics features with an ICC exceeding 0.75, integrated them with the DL features, and regularized all selected features. Second, we applied the minimum redundancy maximum relevance (mRMR) algorithm to further refine feature selection. Finally, using Least Absolute Shrinkage and Selection Operator (LASSO) regression with tenfold cross-validation, we identified and retained features with non-zero coefficients. LASSO's shrinkage property and its handling of multicollinearity help improve model accuracy (Liu et al. 2023).
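The final LASSO step keeps only features whose coefficients are not shrunk to zero. The sketch below illustrates that mechanism with a plain coordinate-descent solver for least-squares LASSO at a single fixed penalty. It is an assumption-laden illustration: the study's software, loss function, and penalty are not specified here, the tenfold cross-validation used to choose the penalty is omitted, and the toy data and function names are hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink a value toward zero by `penalty`; values inside the band become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=100):
    """Minimize (1/2n)*||y - Xb||^2 + penalty*||b||_1 (no intercept).

    Assumes the columns of X and the target y are already centered.
    """
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho, z = 0.0, 0.0
            for i in range(n):
                # Prediction from all features except feature j.
                partial = sum(X[i][m] * coef[m] for m in range(p) if m != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            coef[j] = soft_threshold(rho / n, penalty) / (z / n) if z > 0 else 0.0
    return coef


# Toy data: the target depends on feature 0 only; feature 1 is uninformative.
X = [[-2.0, 1.0], [-1.0, -2.0], [0.0, 0.0], [1.0, 2.0], [2.0, -1.0]]
y = [3.0 * row[0] for row in X]
coef = lasso_coordinate_descent(X, y, penalty=0.5)
selected = [j for j, c in enumerate(coef) if c != 0.0]
print(coef, selected)
```

In practice the penalty would be chosen by cross-validation on the training cohort, and the retained feature indices would feed the subsequent LR model.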

Establishment of DLR and clinical models
We used LR to construct our models. After feature screening and dimensionality reduction, we used the remaining features to build a DLR model, from which the DLR signature was generated. In addition, univariate LR analysis was conducted on each clinical characteristic in the training cohort. Variables meeting the significance threshold of p < 0.05 were entered into multivariate LR analysis. This process identified the key predictive variables, which were used to construct a clinical model. From this model, we derived odds ratios (ORs) and their 95% confidence intervals, as well as the clinical signature.
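For a logistic regression coefficient, the odds ratio and its Wald 95% confidence interval follow directly from the coefficient and its standard error. The sketch below shows that conversion; the function name and input values are hypothetical and do not come from the study.

```python
import math


def odds_ratio_ci(beta, standard_error, z=1.96):
    """Return (OR, lower, upper) for a logistic regression coefficient.

    OR = exp(beta); the Wald interval is exp(beta -/+ z * SE), z = 1.96 for 95%.
    """
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * standard_error)
    upper = math.exp(beta + z * standard_error)
    return odds_ratio, lower, upper


# Hypothetical coefficient of 0 (no association) with standard error 0.5.
or_value, ci_low, ci_high = odds_ratio_ci(0.0, 0.5)
print(or_value, ci_low, ci_high)
```

An interval that excludes 1 corresponds to a coefficient that is significant at the 0.05 level under the Wald test.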

Establishment of CDLR
To fuse clinical and imaging data into a precise, objective, and reliable decision-support model, we combined the clinical and DLR signatures. Using multivariable LR analysis, the combined CDLR was formulated. For comparison, two radiologists, with 5 and 8 years of ultrasound diagnostic experience, reviewed patient ultrasound images from the validation and test cohorts without knowledge of the pathology. They developed two separate ultrasound feature models, termed Model A and Model B. To gauge the performance of these models, receiver operating characteristic (ROC) curves were generated for the training, validation, and test cohorts. From these curves, we determined metrics including AUC, accuracy, sensitivity, specificity, positive predictive value, and negative predictive value. The DeLong test was used to compare AUCs between models, with a significance level of p < 0.05.
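The evaluation metrics listed above can be computed from predicted probabilities and true labels as sketched below. The AUC is computed as the proportion of positive-negative pairs ranked correctly (ties count half), which is equivalent to the area under the ROC curve. The function names, the 0.5 cutoff, and the toy data are assumptions for illustration only.

```python
def auc_score(labels, scores):
    """AUC as the probability that a positive case outscores a negative case."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def diagnostic_metrics(labels, scores, threshold=0.5):
    """Accuracy, sensitivity, specificity, PPV, and NPV at a probability cutoff.

    Assumes every denominator is non-zero (both classes present and predicted).
    """
    tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= threshold)
    fn = sum(1 for l, s in zip(labels, scores) if l == 1 and s < threshold)
    fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= threshold)
    tn = sum(1 for l, s in zip(labels, scores) if l == 0 and s < threshold)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Toy example: 1 = tumor, 0 = non-neoplastic lesion.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
auc = auc_score(labels, scores)
metrics = diagnostic_metrics(labels, scores)
print(auc, metrics)
```

The DeLong comparison of two correlated AUCs is not reproduced here, as it requires a variance estimate beyond this sketch.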

Statistical analysis
Statistical analyses were performed using SPSS (version 26.0), R (version 3.6.3; https://www.r-project.org), and Python (version 3.5.6; http://www.python.org). Normally distributed continuous data were expressed as mean ± standard deviation and compared between cohorts using independent-sample t-tests. Skewed data were expressed as median (Q1, Q3) and compared using the Mann-Whitney U test. Categorical variables were compared using the chi-square or Fisher's exact test, while skewed count data were subjected to rank sum tests.
Both univariate and multivariate LR analyses were performed, with a statistical significance threshold of p < 0.05.

Clinical characteristics
The clinical characteristics of the cohorts are presented in Supplementary Table 2. No significant differences were found among the training, validation, and external test cohorts (p > 0.05). The training, validation, and test cohorts consisted of 158, 68, and 49 patients, respectively. Within the training cohort (Supplementary Table 3), significant differences were observed in age, lymphocyte count (LYMPH), neutrophil-to-lymphocyte ratio (NLR), platelet-to-lymphocyte ratio (PLR), symptom, serum β-HCG, and AFP between patients with testicular neoplastic lesions and those with non-neoplastic lesions (p < 0.05). Additionally, benign and malignant testicular lesions differed significantly in symptom, serum AFP and β-HCG levels, and color Doppler blood flow signals (p < 0.05; Supplementary Table 4).

Construction and validation of DLR
To differentiate testicular tumors from non-tumor lesions, we used 7 radiomics features and 19 DL features to construct the DLR model, from which we derived the DLR signature (Fig. 3a and c, Fig. 4a, Supplementary Table 5). The DLR model's AUC values for the training, validation, and test cohorts were 0.954, 0.850, and 0.803, respectively (Fig. 5a, b and c). To distinguish between benign and malignant testicular lesions, we employed the same feature selection method, identifying 4 radiomics features and 20 DL features (Fig. 3b and d, Fig. 4b, Supplementary Table 5). This DLR model yielded AUCs of 0.894, 0.823, and 0.799 for the training, validation, and test cohorts, respectively (Fig. 5d, e and f).

Discussion
Our study indicates that the CDLR surpasses the clinical model, DLR, and radiologists with 5-8 years of experience in diagnosing testicular tumors and malignancies. CDLR can be a valuable tool to support radiologists in imaging diagnosis. Correctly diagnosing testicular masses is vital, as treatments range from conservative measures to radical surgery. Missing a testicular malignancy can cause treatment delays and poorer outcomes. For patients with benign testicular tumors, partial orchiectomy can conserve testicular function (Fankhauser et al. 2021; Paffenholz et al. 2018; Gentile et al. 2020; Sm et al. 2023). Conversely, unnecessary surgical resection in patients with non-neoplastic testicular lesions can adversely affect androgen levels, sexual function, and fertility, among others (Henriques et al. 2022; Kerie et al. 2021). Hence, it is paramount to ascertain the nature of testicular masses to minimize unnecessary surgeries and reduce missed diagnoses of malignancies. To our knowledge, this is the first study to develop and validate CDLR nomograms for testicular mass risk stratification, targeting the identification of neoplastic lesions and malignancies.
Radiomics is an emerging non-invasive diagnostic method in medical imaging that extracts large numbers of quantitative features from medical image data and uses them for diagnosis and prediction. The approach is valued for its objectivity, non-invasiveness, and data-mining capabilities, which underlie its potential in tumor diagnosis and treatment (Lambin et al. 2012; Guiot et al. 2022). DL is a powerful method in image analysis that can derive high-level representations from image datasets. In our research, we employed deep transfer learning to extract DL features and merged them with radiomics features to determine the nature of the masses. When predicting neoplastic lesions and malignant tumors, DL features predominated among the selected features in both number and weight. This observation suggests that DL can capture key quantitative information reflecting the nature of the masses, making it a valuable tool for precise diagnosis.
Prior research has identified a link between pain and non-neoplastic lesions, with angiogenesis detected via color Doppler ultrasound emerging as an important independent risk factor for malignancy (Liu et al. 2023). Tumor markers such as AFP and β-HCG are instrumental in identifying testicular tumors (Esen et al. 2018). Our results concur with these findings: we identified asymptomatic scrotal conditions and elevated serum AFP or β-HCG levels as independent predictors of testicular tumors. Moreover, we identified asymptomatic scrotal conditions, increased serum AFP or β-HCG levels, and distinct blood flow signals on color Doppler ultrasound as independent predictors of testicular malignancy. Nevertheless, the accuracy of conventional ultrasound diagnosis for testicular tumors needs improvement, being reported at around 76.9% (Lung et al. 2020; Andipa et al. 2004). While contrast-enhanced ultrasound (CEUS) is a newer imaging modality, standard ultrasonography remains the first choice for diagnosing testicular masses (Schröder et al. 2016). The constraints of CEUS, such as the need for specialized expertise, higher costs, limited access, and potential contraindications linked to ultrasound contrast agents, have curbed its broad clinical adoption (Liu et al. 2017). In our research, the CDLR attained an accuracy of 88.2%. Past studies emphasized the difficulty of differentiating benign from malignant testicular masses using only conventional ultrasound (Andipa et al. 2004). Fan et al. used magnetic resonance imaging (MRI) volumetric apparent diffusion coefficient histogram analysis, attaining an AUC of 0.822 (Fan et al. 2020). They then integrated MRI with machine learning, producing a prediction model for testicular masses with an AUC of 0.868 (Fan et al. 2022). The enhanced performance in these studies might stem from the extraction of richer features in the radiomics model. Yet MRI comes with challenges: it is less sensitive to calcifications, has patient contraindications, is costlier, and requires longer examination times. Our CDLR, with an AUC of 0.851 and an accuracy of 79.4%, shows its advantages and potential in this setting.
In this study, the standalone DLR outperformed the clinical model, highlighting the importance of using deep image information from radiomics and DL to characterize testicular masses. When combined with clinical data, the CDLR displayed better predictive capability, surpassing even radiologists with 5-8 years of experience. This approach can assist radiologists in identifying testicular tumors and malignancies more precisely.
However, this study has some limitations. First, as a retrospective study, selection bias and errors are unavoidable. For instance, if ultrasound examinations are conducted by different doctors, subjective errors might arise when selecting the maximum diameter section of the tumor. Second, while our study combines clinical characteristics, imaging, and modeling of radiomics and DL features, it does not include other imaging techniques such as contrast-enhanced ultrasound and elastography for comparison or multi-modal fusion. Lastly, defining the ROI boundary might introduce researcher subjectivity. We anticipate using DL for automatic identification and delineation of the ROI in the future, and we plan to conduct prospective, multicenter studies to further validate our proposed model.

Conclusion
The clinical deep learning ultrasound radiomics nomogram introduced in this study produced encouraging results in predicting testicular tumors and malignancies. It even outperformed radiologists with 5 and 8 years of professional experience. This is important for early patient diagnosis, treatment planning, and choice of surgical method. It can help prevent unnecessary testicle removal or damage to testicular function from excessive medical intervention, supporting the goal of precise tumor treatment.

Fig. 2 Study workflow of the clinical, DLR, and CDLR models for the risk stratification of testicular masses. DLR deep learning radiomics; CDLR clinical deep learning radiomics

Fig. 3 LASSO, paired with ten-fold cross-validation, was employed to screen both radiomics features and DL features for predicting testicular tumors and malignancies. a, b Show the coefficients of radiomics and DL features obtained from LASSO with ten-fold cross-

Fig. 4 a, b Coefficients of the filtered radiomics features and DL features. DL deep learning

Fig. 5 ROC curves comparing different models. a-c ROC curves comparing the clinical, DLR, and CDLR models for predicting testicular tumors across the training, validation, and test cohorts. d-f ROC curves comparing the clinical, DLR, and CDLR models for

Fig. 8 Convolutional neural network activation maps used for identifying testicular non-neoplastic lesions, benign tumors, and malignant tumors. The red regions on these maps highlight areas that correlate with the nature of the mass

Table 1
Comparison of diagnostic performances of different models for discriminating testicular neoplasm from non-neoplasm in the training, validation, and test cohorts. CI confidence interval; DLR deep learning radiomics model; CDLR clinical deep learning radiomics nomogram; PPV positive predictive value; NPV negative predictive value. * p value for the difference in AUC between the model and CDLR in the training cohort; # p value for the difference in AUC between the model and CDLR in the validation cohort; ^ p value for the difference in AUC between the model and CDLR in the test cohort

Table 2
Comparison of diagnostic performances of different models for discriminating testicular malignant tumor from benign lesions in the training, validation, and test cohorts. CI confidence interval; DLR deep learning radiomics model; CDLR clinical deep learning radiomics nomogram; PPV positive predictive value; NPV negative predictive value. * p value for the difference in AUC between the model and CDLR in the training cohort; # p value for the difference in AUC between the model and CDLR in the validation cohort; ^ p value for the difference in AUC between the model and CDLR in the test cohort