Introduction

Pancreatic ductal adenocarcinoma (PDAC) is a malignancy with poor prognosis, frequently presenting at an advanced stage, with a 5-year survival rate of only 11% [1, 2]. Despite advancements in diagnostic and staging technologies such as CT and MR, gains in detection and outcomes have been minimal over the last decades, highlighting the need for more effective and optimised PDAC management.

Radiomics, a novel and promising computational method, involves extracting so-called radiomic features (RFs), which are not discernible by the human eye, from medical images. The discovery of non-invasive RF-based imaging biomarkers could potentially enable better staging and lead to improved treatment response and overall survival as part of precision/personalised medicine [3,4,5].

The extraction of RFs from regions of interest (ROIs) in two (2D) or three dimensions (3D) is common practice in image analysis. RFs can be broadly classified into first-, second-, and higher-order features. First-order features include shape/sphericity, voxel grey intensity, and coarse voxel distribution, which can be represented in a histogram that demonstrates skewness, kurtosis, uniformity, and entropy. Second-order features describe the intensity relationships between neighbouring voxels and include characteristics such as the grey-level co-occurrence matrix (GLCM) and grey-level run length matrix (GLRLM) [3, 6]. Lastly, higher-order features are extracted through mathematical modulation, or filtering techniques, with the goal of suppressing noise or highlighting details and patterns [7].
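To make the distinction between first- and second-order features concrete, the following is a minimal sketch in Python using NumPy. It is not an IBSI-compliant implementation: the function names, the number of histogram bins, and the single horizontal GLCM offset are illustrative assumptions, and conventions (for example, whether 3 is subtracted from kurtosis) vary between software packages.

```python
import numpy as np

def first_order_features(roi, n_bins=16):
    """Histogram-based first-order statistics of the intensities in an ROI.

    Assumes the ROI is not constant (standard deviation > 0).
    """
    x = np.asarray(roi, dtype=float).ravel()
    mean, std = x.mean(), x.std()
    counts, _ = np.histogram(x, bins=n_bins)
    p = counts[counts > 0] / counts.sum()  # probabilities of occupied bins
    return {
        "mean": float(mean),
        "skewness": float(np.mean((x - mean) ** 3) / std ** 3),
        "kurtosis": float(np.mean((x - mean) ** 4) / std ** 4),
        "uniformity": float(np.sum(p ** 2)),
        "entropy": float(-np.sum(p * np.log2(p))),
    }

def glcm(level_image, n_levels, offset=(0, 1)):
    """Symmetric, normalised grey-level co-occurrence matrix for one 2D offset.

    level_image must already be discretised to integers 0 .. n_levels - 1.
    """
    img = np.asarray(level_image)
    d_row, d_col = offset
    rows, cols = img.shape
    m = np.zeros((n_levels, n_levels), dtype=float)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                m[img[r, c], img[r2, c2]] += 1
    m = m + m.T  # count every neighbour pair in both directions
    return m / m.sum()

def glcm_contrast(p):
    """GLCM contrast: co-occurrence probability weighted by squared level difference."""
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))

# Toy 4 x 4 image with four grey levels (hypothetical data)
toy = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
print(first_order_features(toy, n_bins=4))
print(glcm_contrast(glcm(toy, n_levels=4)))
```

The first-order values depend only on the intensity histogram, whereas the GLCM contrast changes if the same intensities are spatially rearranged, which is why the two feature classes are reported separately.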

Despite this vast scope, the implementation of PDAC radiomics into clinical practice has been limited compared to cancers of other solid organs, such as the liver and lungs [8,9,10]. This is due to a number of challenges, including difficulties in developing a reliable study design. As shown by Yamashita et al [11], the reproducibility of radiomic models is highly impacted by variations in scanning parameters, such as the scanner model used, pixel spacing, and the contrast administration rate, as well as by the manual segmentation of ROIs.

Additionally, there are shortcomings in research methodologies, such as data harmonisation and the use of software that adheres to the Image Biomarker Standardisation Initiative (IBSI) guidelines, within an already-complex clinical environment [7, 12, 13].

This review aims to provide an overview of the current state of primary research in PDAC diagnosis, treatment and prognosis, with a particular focus on radiomics and applied methodology. It highlights areas of strength and weakness in the field, with an emphasis on reproducibility, and offers guidance to help radiomic researchers generate more robust results.

Material and methods

Database search (MEDLINE, PubMed, and Scopus)

A literature search of the online databases MEDLINE, PubMed, and Scopus was conducted between June and August 2022. The search formula contained [pancreas OR pancreatic] AND [radiomic OR [quantitative AND imaging] OR [texture AND analysis]].

Titles and abstracts of articles were initially screened by two raters: H.S.K (radiology subspecialist with 16 years of abdominal specialty experience) and J.A.M (3rd year medical graduate student). Inclusion criteria were primary human research articles on CT radiomics in PDAC diagnosis, treatment, and/or prognosis published in English between 2017 and August 2022. Studies using non-IBSI-compliant software were excluded. The final selection of articles was then analysed by three raters (H.S.K, J.A.M, and M.T. (biostatistician)) (Table 1).

Table 1 Inclusion and exclusion assessment criteria performed after initial MEDLINE, PubMed, and Scopus literature search (June–August 2022)

Data extraction

Articles were evaluated and categorised according to CT slice thickness (≤ 1 mm, > 1 to 3 mm, > 3 to 5 mm) so as to align with prevalent standards and enable comparison with future radiomic investigations. Studies were further subcategorised into various RF clinical applications to address the following research questions: (1) Are there commonly identified RFs across PDAC studies that suggest trends in the development of a validated imaging biomarker? (2) What are the reported cohort sizes, CT technical factors, and described radiomics methodology steps? Are there factors in the methods that could impede reproducibility?

Our analysis did not apply the radiomics quality score (RQS) by Lambin et al [14], given its complex structure (16 components, six key domains, score of 0–36 points) and because a recent extensive systematic review of 77 articles in high-ranking medical journals demonstrated a low overall RQS adherence rate of 38.7%, with many RQS components receiving a score of 0 points [15].

Results

Figure 1 illustrates the PRISMA flow diagram outlining the literature search. A total of 1112 articles were found (MEDLINE n = 584, PubMed n = 144, Scopus n = 384); removing duplicates left 650 articles. The initial screening of titles and abstracts identified 49 articles that met the eligibility criteria for full-text assessment. After further exclusions, a total of 12 articles were included in this review (Table 2).

Fig. 1 PRISMA flow diagram of MEDLINE, PubMed, and Scopus literature search. Abbreviations: IBSI, Image Biomarker Standardisation Initiative; PDAC, pancreatic ductal adenocarcinoma

Table 2 Included full-text articles on radiomics in PDAC in this review (in alphabetical order)

Full-text analysis revealed a lack of common RFs, highly variable methodologies, and a lack of sufficient information to ensure reproducibility (Table 3).

Table 3 Missing detailed methodology steps impeding reproducibility

Cohort size and CT technique

Patient cohort sizes ranged from 37 to 352 (median = 106, mean = 155.8). Three of the 12 studies selected for review had a small cohort size, ranging from 37 to 54 [16,17,18].

Four studies used a CT slice thickness of ≤ 1 mm [16, 17, 19, 20], five a thickness of > 1 to 3 mm [18, 21,22,23,24], and two a thickness of > 3 to 5 mm [25, 26], while one did not specify slice thickness [27] (Table 3). Nine of the 12 studies used single-centre data [16,17,18, 20,21,22,23,24,25], two studies used data from two different centres (one centre for the training cohort and the other for the validation cohort) [26, 27], and one study used data from five centres [19].

A CT pancreas protocol was utilised in 5 of the 12 studies [17, 18, 22, 26, 27], no pancreas protocol was applied in 5 studies [19,20,21, 24, 25], and 2 studies did not state whether patients underwent a CT pancreas protocol [16, 23]. The CT contrast phase used for segmentation varied among studies, with five studies using the portal venous phase [16, 17, 19, 20, 26], two studies using the late arterial phase [18, 24], four studies using multiple phases [22, 23, 25, 27], and one study using no contrast agent [21].

Radiomic feature extraction and selection

RF selection and analysis methodologies were reproducible, albeit highly variable. In three studies, RFs were chosen a priori [18, 20, 23], while the remaining nine studies extracted features from the respective software libraries. PyRadiomics [19, 20, 26] and LIFEx [17, 21, 23] were each used in three studies, IBEX [18, 24] and MATLAB [22, 27] in two studies each, and AnalysisKit [16] and Artificial Intelligent Kit [25] in one study each.

Statistical analysis and model development

Outcome variables were analysed either as time-to-event endpoints modelled using Cox regression techniques [17,18,19,20,21,22, 25, 26] or with logistic regression models for the assessment of PDAC grades [23, 24], local response [18, 21], or superior mesenteric-portal vein invasion [16].

When selecting RFs for their models, two studies included RFs in the multivariable analysis using a significance threshold of p < 0.1 [17, 22], while the remaining studies did not explicitly state a different threshold, so p < 0.05 was assumed. Two studies did not disclose details of the specific RFs chosen [22, 24]. Additionally, one of these studies did not disclose the CT phase associated with the chosen features [22].

Many studies provided details of correlation analysis to reduce the number of features considered in univariable or multivariable modelling [16, 17, 20, 25, 27] and to assess for collinearity between variables [18, 28] in multivariable modelling. Of the 12 studies, half used least absolute shrinkage and selection operator (LASSO) techniques [19, 23,24,25,26,27]. A stepwise backward selection method for statistically significant variables was used by one study [16], while elastic net regularisation was considered in another [21]. A few studies did not explicitly describe the variable selection methods used in the multivariable prediction model analysis [17, 18, 20, 22]. Only one study indicated the application of a two-way interaction calculation [19].
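As an illustration of how LASSO performs variable selection, the sketch below implements it by coordinate descent with NumPy. It is a simplification under stated assumptions: it uses an ordinary least-squares loss rather than the Cox or logistic losses of the reviewed studies, and the function names, penalty value, and synthetic data are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z towards zero by t; values within [-t, t] become exactly zero."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1 / 2n) * ||y - X b||^2 + lam * ||b||_1.

    Features are standardised and the outcome centred internally, so the
    returned coefficients are on the standardised scale.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    y = y - y.mean()
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Residual with the contribution of feature j added back
            partial_residual = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial_residual / n
            # After standardisation (1/n) * x_j . x_j = 1, so no rescaling is needed
            beta[j] = soft_threshold(rho, lam)
    return beta

# Synthetic example: 10 candidate features, only the first two carry signal
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.normal(size=100)
beta = lasso_coordinate_descent(X, y, lam=0.3)
print("selected feature indices:", np.flatnonzero(beta))
```

The penalty sets the coefficients of uninformative features to exactly zero, which is what makes LASSO attractive when the number of candidate RFs is large relative to the cohort size. In practice the penalty strength is usually chosen by cross-validation rather than fixed in advance.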

Receiver operating characteristic (ROC) curve analysis, Harrell's concordance index (C-index), or both were applied in all but two studies [20, 26]. Validation and calibration statistics such as the Akaike information criterion and calibration curves and/or Hosmer–Lemeshow tests were utilised to evaluate model performance and fit in most studies, with three exceptions [18, 20, 26]. Several papers specified the handling of categorical feature variables via dichotomisation based on medians [19], the Youden index from ROC analysis [24], or other criteria [21].
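For readers less familiar with the concordance index, the following plain-Python sketch shows the idea behind Harrell's C-index for censored survival data. It is a simplified illustration (pairs with tied survival times are ignored), the function name and example values are hypothetical, and it is not the implementation used by any of the reviewed studies.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs whose predicted risks are correctly ordered.

    times       : follow-up time per patient
    events      : 1 if the event was observed, 0 if censored
    risk_scores : model output, higher meaning higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before the follow-up time of patient j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical cohort of four patients, one of them censored
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
print(harrell_c_index(times, events, [0.9, 0.7, 0.4, 0.1]))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. Because only pairs anchored on an observed event contribute, small cohorts with heavy censoring yield few comparable pairs and therefore unstable estimates.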

Half of the reviewed studies [16, 19, 23, 25,26,27] reported conducting an assessment of inter-observer reliability on the selected RFs for analysis prior to consideration of multivariable modelling.
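The reviewed text does not state which reliability statistic each study used. One common choice for continuous RFs is the intraclass correlation coefficient (ICC). The sketch below computes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1), with NumPy; the function name and example ratings are hypothetical.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1) for an (n_subjects, k_raters) table of one feature's values."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Sums of squares from a two-way ANOVA without replication
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)   # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)   # between raters
    ss_error = np.sum((x - grand) ** 2) - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return float((ms_rows - ms_error)
                 / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n))

# One RF measured on three lesions by two readers (hypothetical values);
# the second reader is systematically one unit higher than the first.
print(icc_2_1([[1, 2], [2, 3], [3, 4]]))
```

Because this form measures absolute agreement, a constant offset between readers lowers the ICC even when their rankings of the lesions are identical, which is relevant when segmentations differ systematically between observers.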

Two studies detailed the handling of missing data in their modelling analysis [19, 20], and commented on the reasons why this was undertaken.

Three studies acknowledged that internal validation cohorts were not considered due to insufficient sample size [16, 18, 23].

R statistical software was used for all selected studies except for one which utilised SPSS [24].

Discussion

Radiomics is a computational method of extracting features (RFs) from medical images that has the potential to yield non-invasive imaging biomarkers that aid in improved PDAC delineation and treatment. To date, there is limited PDAC CT radiomics primary research data available.

Our analysis revealed several challenges associated with the use of retrospective data from heterogeneous small- to moderately sized cohorts. Furthermore, there was a lack of agreement on which RFs were deemed significant. Additionally, variations in CT techniques and a lack of consistency in RF segmentation and selection further hindered the reproducibility of findings. However, it is noteworthy that GLCM-associated RFs were observed in 6 of the 12 studies reviewed, albeit without any consistent subcategories.

The median cohort size was 106 patients, and 7 of the 12 studies had cohorts so small that a validation process was impossible. The training-to-validation ratio of cohorts is an important factor in ensuring a robust prognostic model. Training-to-validation ratios in medical studies usually range from 67:33 to 80:20 [29]. The study by Khalvati et al [26] used a training:validation ratio of 30:68, which necessitates caution when interpreting the results. Further statistical considerations include the overfitting of a complex radiomic model and the limitation of Bonferroni corrections, which are not applicable when the sample size is too small [18]. Overfitting can be somewhat mitigated by pre-selecting validated RFs; however, studies that demonstrate sufficient power to develop a prognostic model and that are properly validated are scarce, as shown in this review.

Another important factor to consider in radiomic studies is variation in CT image acquisition. CT scanning and scanner parameters play a significant role in PDAC staging and influence the performance of radiomic models [11]. Utilising a standardised CT protocol, particularly one that includes a late arterial and a portal venous phase as described by the NCCN criteria, would render studies more comparable [30,31,32]. Five of the 12 studies noted the use of a CT pancreas protocol, while 6 of the 7 remaining studies did not specify the reasoning or acknowledge this limitation. Healy et al [19] did not capture CT scanning protocol variances, aiming to develop and validate their radiomics model under “real-world circumstances”. This bears the risk of RF extraction from varied, heterogeneous datasets, which may compromise the interpretation of results. As proposed by Noda et al [33], utilising a single portal venous dual-energy computed tomography image in lieu of a specific CT pancreas protocol may serve as a potential solution to standardise image acquisition for radiomics while still maintaining the CT pancreas protocol as the gold standard for PDAC staging.

CT technique can greatly impact image quality and tumour and vessel conspicuity, and ultimately affect radiomics analysis. Studies such as He et al [34] have demonstrated that factors such as contrast enhancement, slice thickness, and convolution kernel reconstruction affect the performance of radiomics models. To ensure reproducibility in radiomics studies, data harmonisation is a crucial step. This can include image resampling to maintain a consistent slice thickness, voxel size, and pixel grey intensity ranges (grey-level discretisation) [35]. However, data harmonisation is commonly overlooked, as demonstrated in studies such as Healy et al [19], where a large and heterogeneous cohort was used with multiple CT scanner types and scanning protocols over a long period during which CT imaging techniques were rapidly evolving. The authors state that CT imaging data were harmonised to a 1-mm slice thickness without specifying the original slice thickness. Given that patients in this cohort were recruited from as early as 2005, it appears unlikely that the original CT slice thickness was ≤ 1 mm, implying that the data were likely not harmonised as intended (i.e., a 5-mm slice thickness was reformatted into 5 × 1-mm slices). Other studies, such as those by Salinas-Miranda et al [20] and Cai et al [27], used the same patient cohort with similar harmonisation methods and likewise did not report CT slice thickness. Furthermore, the study by Khalvati et al [26] used different CT slice thicknesses in their training and validation cohorts (5 mm and 2 mm, respectively) without acknowledging the impact on radiomic analysis.
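The two harmonisation steps named above, resampling to a common slice spacing and grey-level discretisation, can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions: linear interpolation along the slice axis only, a fixed bin width of 25 intensity units, and hypothetical function names. IBSI-compliant packages offer more complete and validated options.

```python
import numpy as np

def resample_slices(volume, z_spacing_mm, new_z_spacing_mm):
    """Linearly interpolate a (z, y, x) volume with >= 2 slices to a new slice spacing."""
    vol = np.asarray(volume, dtype=float)
    z_old = np.arange(vol.shape[0]) * z_spacing_mm
    z_new = np.arange(0.0, z_old[-1] + 1e-9, new_z_spacing_mm)
    # Index of the original slice at or below each new slice position
    lower = np.searchsorted(z_old, z_new, side="right") - 1
    lower = np.clip(lower, 0, vol.shape[0] - 2)
    weight = ((z_new - z_old[lower]) / z_spacing_mm)[:, None, None]
    return (1.0 - weight) * vol[lower] + weight * vol[lower + 1]

def discretise_fixed_bin_width(values, bin_width=25.0, minimum=None):
    """Map intensities to integer grey levels 1, 2, ... using a fixed bin width."""
    x = np.asarray(values, dtype=float)
    lowest = x.min() if minimum is None else minimum
    return np.floor((x - lowest) / bin_width).astype(int) + 1

# Two 5-mm slices (hypothetical intensities) resampled to 1-mm spacing
thick = np.array([[[0.0]], [[10.0]]])
thin = resample_slices(thick, z_spacing_mm=5.0, new_z_spacing_mm=1.0)
print(thin[:, 0, 0])
print(discretise_fixed_bin_width([0, 24, 25, 49, 50], bin_width=25, minimum=0))
```

The resampled volume has more slices, but every new value is a weighted average of the two original slices around it. Upsampling thick slices therefore equalises the voxel grid without recovering detail that was never acquired, which is why the original slice thickness should still be reported.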

A recent endeavour to streamline radiomics analysis and to enhance RF reproducibility is the IBSI [12]. The 160+ page framework places a significant emphasis on mathematical and technical aspects, while giving less attention to clinical implementation. This is likely because the majority of its authors are from non-clinical backgrounds. As a result, clinicians have to trust and rely on IBSI-compliant radiomics software for quality assurance and reproducibility purposes. This is supported by a study by Fornacon-Wood et al, which found a higher number of RFs exhibiting excellent statistical reliability when extracted using IBSI-compliant software, as opposed to non-compliant software [13]. In our initial eligibility assessment, we identified 9 of 49 studies that used non-IBSI-compliant software, and these studies were therefore excluded from our final analysis.

Statistical extrapolations and radiomic models were noted to be highly variable. For instance, Hang et al [17] sought to correlate RFs of primary PDAC tumours and liver metastases with overall survival by incorporating four texture features into a radiomics score based on a statistical significance threshold of p < 0.1, as opposed to the commonly employed threshold of p < 0.05. Furthermore, the authors did not state whether the selected RFs were considered for, or actually included in, the multivariable model. Similarly, Attiyeh et al [22] developed two models to predict overall survival in resectable PDAC patients, incorporating additional characteristics such as serum CA19-9 and Brennan pathology scoring. However, neither the CT contrast phase chosen for RF extraction nor the RF selection was disclosed for either model or for the univariate analysis. Two studies that aimed to use radiomics to assess tumour grading are limited by their methodologies and reporting [23, 24]. Chang et al [24] did not disclose details regarding the selection of significant RFs, while Tikhonova et al [23] used a p-value of < 0.1 for statistical significance and assessed contrast enhancement changes in a very small volume of interest (< 1 mm³).

The study by Chen et al [16] showed the potential utility of RFs in identifying and correlating suspected superior mesenteric vein and suspected portal vein invasion. Their model showed superior performance in comparison to two experienced radiologists. Despite its limitations, namely a small cohort (n = 54) and moderate inter-reader variability (κ = 0.517), this model exemplifies a robust, IBSI-compliant methodology that warrants future validation.

Modelling techniques, such as the LASSO algorithm (least absolute shrinkage and selection operator), are useful in identifying significant variables and features of data. However, our analysis showed that many studies, if not all, may be underpowered, resulting in an inability to detect statistically significant, clinically prognostic features. Despite acknowledging the limitation of small sample size, many studies suggest that validating the identified final model using an internal or external validation dataset is sufficient, as opposed to reproducing the model with a larger, adequately powered sample to identify statistically significant features. The predictive performance of models is likely to benefit from an assessment of interactions between variables, but only the study by Healy et al [19], with the largest sample size (n = 352), could feasibly allow for this exploration.

The studies reviewed did not provide a link to the data, statistical analysis, or programming code used, which might be due to patient data confidentiality. While the majority of studies were transparent in their methods, several omissions were noted that impede reproducibility.

Recruiting large PDAC cohorts, extracting robust RFs, and developing an imaging biomarker from a potential pool of thousands of RFs with such small sample sizes are challenging. The development of effective methodologies and early engagement of a multidisciplinary team, including more technical, non-clinical craft groups, such as biostatisticians and computer scientists, would greatly benefit research in this field.

Conclusion

There is a limited number of primary research publications on PDAC CT radiomics using IBSI-compliant software. However, as advancements in methodology and standardisation of practice continue to develop, radiomics has the potential to serve as a valuable non-invasive biomarker in the management of pancreatic cancer.