Introduction

Epicardial adipose tissue (EAT) is a metabolically active depot of visceral fat located between the visceral layer of the serous pericardium and the myocardium. It covers more than \(80\%\) of the cardiac surface, lying along the free wall of the right ventricle and the atrioventricular and interventricular grooves, and surrounding the proximal segments of the coronary arteries [1]. Several studies have shown that EAT performs various functions, such as mechanical support of the coronary vessels, energy storage due to its high free fatty acid content, and thermoregulation [2]. Coronary artery disease (CAD) is a leading cause of death and morbidity worldwide [3], and coronary CT angiography (CCTA) has gained clinical acceptance, nowadays playing a pivotal role in CAD evaluation. As a non-invasive and cost-effective imaging tool, CCTA has great potential to reduce the global socioeconomic burden of CAD [4]. The detection of high-risk atherosclerotic plaque markers (i.e., low attenuation, positive remodeling, spotty calcification, and the napkin-ring sign) in CCTA allows highly specific identification of patients at increased risk of major adverse cardiac events. These markers correlate with adverse outcomes and predict ischemia even in non-obstructive lesions [5]. Recent studies have focused on the CT attenuation of epicardial and pericoronary adipose tissue as an indirect marker of coronary atherosclerosis and plaque inflammation [6]. Inflammation is a crucial component of atherosclerosis and a consistent pathologic feature of unstable atherosclerotic plaques. Increased CT attenuation in adipose tissue adjacent to an atherosclerotic plaque is thought to be a marker of inflammation [7].

Several imaging modalities have been developed to measure epicardial and pericoronary adipose tissue, such as echocardiography, CT, and MRI. CT, owing to its higher spatial resolution, provides a more accurate assessment of EAT [8]. Until recent years, quantification of EAT required complex manual measurements performed by highly trained personnel and exploiting only a fraction of the available information. The development of accurate and reliable semi-automated software for the quantification of epicardial adipose tissue (i.e., quartile attenuation analysis) may provide more significant associations between fat characteristics and different clinical scenarios [9].

Quantification methods exploiting only “naked eye” visible characteristics reflect only a fraction of the available information, which ultimately leads to a rather crude variable with significant overlap between sick patients and healthy controls. In this scenario, radiomics greatly increases the quantitative information accessible from CT images. Hundreds of imaging features, which cannot be assessed by the human eye, are being extracted to create big data, from which imaging patterns associated with clinical features or outcomes can be derived [4]. Radiomics data may improve the diagnostic and predictive capabilities of CCTA, leading to better risk stratification for future events [10].

Although recent years have seen a proliferation of radiomic work aimed at supporting and improving diagnostic performance by means of deep learning techniques, limited attention has been paid to the development of reproducible and explainable studies. In fact, deep learning techniques, due to their “black-box” nature, are not suitable for implementing fully explainable predictive models that exploit intrinsically interpretable features.

The need for explainable models and interpretable features brings us toward non-deep machine learning (shallow learning) approaches. Moreover, it must be highlighted that current AI-based solutions have some critical problems concerning their application within clinical scenarios. AI-based solutions mainly require (i) big data with accurate annotations and complete information, (ii) a continuous feed of real-world data, and (iii) close collaboration with clinicians in all stages of development. This leads to the identification of some main concerns for AI: (i) only a small fraction of medical centers (data providers) are willing to share data; (ii) a lack of curated datasets; (iii) inter-/intra-rater variability. When working with datasets collected in clinical environments with limited numbers of enrolled patients, one possible solution to address these issues is to adopt shallow learning approaches, which require much less data than deep learning architectures.

The explainability of the predictive model has become a fundamental requirement in clinical contexts. In fact, some mandatory aspects for clinicians and patients must be considered:

  • Clinician’s needs: once the model is considered valid by the developer, the clinician can confirm important clinical evidence, which leads clinicians to trust these computerized systems and encourages their use in clinical practice;

  • Patient’s needs: a local explanation of the model result for each individual patient, just as a doctor explains the choice of a therapy or a diagnosis for a specific clinical case. In particular, the domain expert establishes whether a local explanation makes sense and can be considered valid.

To this end, after defining and tuning the predictive models, we sought to justify, with the support of the physician team, the presence of each feature in the identified signatures.

The aim of this work is to define predictive models for CAD, based on both clinical and radiomic features. An in-depth analysis involving several machine learning algorithms — which, unlike deep learning, yield patterns that can be explained — and feature selection methods was carried out to select the best predictive signature. The main contributions of this study are:

  • A well-structured processing pipeline, according to the literature indications [11], enabling the definition of robust biomarkers;

  • The implementation of multimodal predictive models, based on both clinical and radiomic features, able to predict CAD;

  • To provide a trusted system supporting cognitive and decision-making processes [12] in the medical domain by means of machine learning algorithms and interpretable clinical and radiomic features.

To evaluate model performance, the following metrics were considered: accuracy, sensitivity, specificity, Positive Predictive Value (PPV), Negative Predictive Value (NPV), and area under the receiver operating characteristic curve (AUROC).

This manuscript is structured as follows: “Materials and Methods’’ describes the dataset and the features used, focusing in detail on each step of the implemented processing pipeline; “Experimental Results’’ reports the feature preprocessing steps, the predictive model build-up, the signatures obtained by each model, and their classification performance; “Discussion’’ presents some remarks concerning this study and its experimental findings; finally, “Conclusions’’ focuses on the performance obtained and on interesting future developments.

Materials and Methods

This section describes (i) the characteristics of the data used (with respect to both CT sequences and clinical data), (ii) the features used, and (iii) the processing pipeline implemented (with emphasis on the feature selection methods and machine learning classifiers employed to build the predictive models). In addition, the semi-automated method implemented for segmentation of pericoronary adipose tissue is described, which, by means of interpolation, reduces the workload and the time required for Region of Interest (ROI) annotation. Fig. 1 depicts the flow diagram of the overall processing pipeline implemented in this study, while Fig. 2 shows, for each of the basic processing steps, the available “alternatives” (e.g., feature selection methods, machine learning classifiers).

Fig. 1

Overall flow diagram depicting the whole processing pipeline implemented in this study

Fig. 2

Processing alternatives of the crucial pipeline steps

Dataset Description

The dataset used in this study consists of 118 CCTA series collected from October 2019 to January 2020 at the Policlinico University Hospital ‘P. Giaccone’ of Palermo. The initial set of 135 cases was preliminarily evaluated by two radiologists with over 10 years of experience. Based on image quality criteria, 17 CCTA series with poor quality (i.e., low opacification of the coronary arteries, motion artifacts) were discarded. The final dataset includes 84 men and 34 women with a mean age of \(60.33 \pm 13.2\) years, labeled as “without CAD” (40) and “with CAD” (78).

Clinical Features

The following clinical features were considered in this study: age, sex, body mass index (BMI), family history, smoking, diabetes, hypertension, cholesterol, obesity, current hypertension, statin treatment, peripheral vasculopathy, prior acute myocardial infarction (AMI).

Pericoronaric Adipose Tissue Segmentation

For the extraction of radiomic features, a Volume of Interest (VOI) containing pericoronary fat around the anterior interventricular artery (IVA) was considered. The choice of a very specific area (i.e., the IVA) is also justified by the need to ensure the reproducibility of the study.

To this aim, a semi-automatic computer-assisted tool was developed in the Matlab environment; it is able to delineate a cylindrical region around the IVA in a few simple steps. First, the VOI containing the IVA must be selected by drawing a rectangle in the slice where the IVA is most visible along its axis. This selects an area of dimensions \((x, y)\), while the greater of x and y is chosen as the dimension along the z-axis. A parallelepiped of dimensions \((x, y, \text {max}(x, y))\) is thus located in space, constituting the VOI around the IVA. After the VOI identification, the operator inserts a circular ROI centered on the IVA every stepROI slices. Once these reference ROIs are manually drawn, the system automatically interpolates the ROIs onto the remaining slices included in the range of interest. In this way, the number of slices manually drawn by the user is reduced by a factor of stepROI. The interpolation approach is inspired by the method proposed in [9].
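Although the original tool was implemented in Matlab, the interpolation step can be sketched in Python as follows. This is a minimal illustration with hypothetical variable names, assuming each circular ROI is parameterized by its center coordinates and radius, and that these parameters are linearly interpolated between the reference slices:

```python
import numpy as np

def interpolate_rois(reference_rois, slice_indices):
    """Linearly interpolate circular ROIs (center x, center y, radius) between
    the reference slices drawn by the operator (one every stepROI slices)."""
    ref_slices = np.array(sorted(reference_rois))
    cx = np.array([reference_rois[s][0] for s in ref_slices])
    cy = np.array([reference_rois[s][1] for s in ref_slices])
    r = np.array([reference_rois[s][2] for s in ref_slices])
    return {s: (np.interp(s, ref_slices, cx),
                np.interp(s, ref_slices, cy),
                np.interp(s, ref_slices, r))
            for s in slice_indices}

# Reference ROIs manually drawn on slices 0, 5 and 10 (stepROI = 5)
refs = {0: (120.0, 140.0, 6.0), 5: (122.0, 138.0, 6.5), 10: (125.0, 137.0, 6.0)}
rois = interpolate_rois(refs, range(0, 11))  # ROIs for every slice in the VOI range
```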

The VOI is the volume containing the segmentation mask (ROI). Provided the VOI is large enough to contain the entire IVA tract of interest (so that no part of interest is left out), its exact placement does not affect the segmentation of the IVA region. The only problem might arise if the operator placed the VOI in a completely different region from the one containing the IVA, which is unlikely for experienced operators. Manual placement of the circular ROIs could be a critical issue if a ROI is not centered on the IVA; however, after a circular ROI is placed, its position can be refined before the operator confirms it. Before adopting this segmentation method, two different approaches were evaluated jointly with the clinicians: a fully manual approach and the one adopted here. This analysis showed that the ROIs obtained by interpolation (taking the manually entered ROIs as references) were essentially comparable, according to the Dice index, to the manual ROIs. This evaluation — conducted as the stepROI parameter varied (\(stepROI \in \{3,4,5,6,7,8\}\)) — showed that \(stepROI=5\) (the value then used in the implementation) achieved the highest Dice index values. On the other hand, as with any semi-automated method, user (i.e., physician) interaction is required to initiate the segmentation process. The advantage is that, in return for minimal input, the interpolation reduces the workload by about 80% compared to a fully manual delineation approach (only one ROI needs to be entered for every 5 CCTA slices).

Algorithm 1 outlines the pseudocode of the semi-automatic segmentation approach. Moreover, Fig. 3 shows the initial step concerning VOI and ROI setting, while Fig. 4 depicts the three views (a, b, c) and the 3D volume-rendering model (d) of the segmented pericoronary adipose tissue.

Algorithm 1 Pseudocode of the semi-automatic pericoronary adipose tissue segmentation
Fig. 3

In a, selection of the VOI in the slice where the IVA is most visible. In b and c, the ROIs inserted around the IVA in the initial and final slices, respectively

Fig. 4

In a, b, and c, the three views of the segmented adipose tissue around the IVA. In d, the corresponding 3D volume-rendering reconstruction of the pericoronary adipose tissue around the IVA

Radiomic Features Extraction

The extraction of the radiomic features was performed by means of PyRadiomics [13]: a total of 93 features were extracted. The extraction was performed without any resampling to avoid interpolation artifacts. Radiomic features were extracted from the 3D ROIs delineated in the previous step. The following six feature categories were extracted and considered (a configuration sketch is provided at the end of this subsection):

  • First Order (FO): describe the distribution of voxel intensities within the image region defined by the mask through commonly used and basic metrics;

  • Gray Level Co-occurrence Matrix (GLCM) [14, 15]: spatial relationship between pixels in a specific direction, highlighting properties of uniformity, homogeneity, randomness, and linear dependencies;

  • Gray Level Run Length Matrix (GLRLM) [16]: texture in a specific direction, where a fine texture has shorter runs while a coarse texture presents more long runs with different intensity values;

  • Gray Level Size Zone Matrix (GLSZM) [17]: regional intensity variations or the distribution of homogeneous regions;

  • Gray Level Dependence Matrix (GLDM) [18]: quantifies gray level dependencies;

  • Neighboring Gray Tone Difference Matrix (NGTDM) [19]: spatial relationship among three or more pixels, closely approaching the human perception of the image.

Considering that the ROI extracted around the IVA has the shape of a cylinder, shape-based features were not considered because they are not representative of the clinical problem we are trying to model.
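As a configuration sketch (not necessarily the exact settings used in the study), the extraction described above could be reproduced with PyRadiomics as follows; the file names are hypothetical:

```python
import SimpleITK as sitk
from radiomics import featureextractor

# Enable only the six intensity/texture classes listed above; shape features and
# resampling are left out, mirroring the extraction described in the text.
extractor = featureextractor.RadiomicsFeatureExtractor()
extractor.disableAllFeatures()
for feature_class in ('firstorder', 'glcm', 'glrlm', 'glszm', 'gldm', 'ngtdm'):
    extractor.enableFeatureClassByName(feature_class)

image = sitk.ReadImage('ccta_series.nii.gz')        # hypothetical file names
mask = sitk.ReadImage('pericoronary_mask.nii.gz')
result = extractor.execute(image, mask)
features = {k: v for k, v in result.items() if not k.startswith('diagnostics_')}
print(len(features), 'radiomic features extracted')
```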

Unbalanced Data Management

Classification of unbalanced data involves developing predictive models on datasets that have a severe class imbalance. In fact, working with imbalanced datasets can lead to poor performance on the minority class. One approach to addressing imbalanced datasets is to oversample the minority class: new examples can be synthesized from the existing ones. This is a type of data augmentation for the minority class and is referred to as the Synthetic Minority Oversampling Technique (SMOTE) [20].
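A minimal sketch of SMOTE with the imbalanced-learn library is shown below, on toy data mimicking the class sizes of our dataset; as a general best practice, oversampling is applied to the training folds only, so that synthetic samples never leak into the test set:

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Toy imbalanced dataset standing in for the 40 "without CAD" vs. 78 "with CAD" cases
X, y = make_classification(n_samples=118, n_features=20, weights=[0.34, 0.66], random_state=0)

smote = SMOTE(random_state=0)
X_res, y_res = smote.fit_resample(X, y)   # minority class oversampled with synthetic examples
print(Counter(y), '->', Counter(y_res))
```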

Radiomic Features Preprocessing and Statistical Analysis

Feature preprocessing is mandatory in order to define robust imaging biomarkers [11]. In particular, to obtain a subset of non-redundant features with relevant information content, calibration and preprocessing were performed through the following steps:

  • Near-zero variance analysis: aimed at removing features that do not convey information content. This operation considered a variance cutoff of 0.01; features with a variance less than or equal to this threshold were discarded;

  • Correlation analysis: aimed at removing highly correlated features to reduce redundancy. We used the Spearman correlation coefficient for pairwise feature comparison. In the case of a correlation value higher than 0.9, the feature with the highest predictive power was selected.

The Mann-Whitney U test and Fisher’s exact test were used to test the difference between the variable distributions. In particular, continuous variables (i.e., radiomic features) were compared by the Mann-Whitney U test, while categorical variables (clinical data) were compared by Fisher’s exact test. A 2-sided p-value lower than 0.05 was considered the threshold for statistical significance.
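The following sketch illustrates these preprocessing and testing steps on toy data using pandas and SciPy. The thresholds follow the values stated above, while the tie-breaking rule for correlated pairs is simplified here to dropping the second feature of each pair rather than keeping the more predictive one:

```python
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu, fisher_exact

def preprocess_features(df, variance_cutoff=0.01, corr_cutoff=0.9):
    """Near-zero variance filter followed by a Spearman correlation filter."""
    kept = df.loc[:, df.var() > variance_cutoff]          # discard variance <= cutoff
    corr = kept.corr(method='spearman').abs()
    cols, to_drop = list(corr.columns), set()
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iloc[i, j] > corr_cutoff:
                to_drop.add(cols[j])                      # simplified tie-breaking rule
    return kept.drop(columns=sorted(to_drop))

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(118, 10)), columns=[f'f{i}' for i in range(10)])
df['f8'] = 0.0                                                   # near-zero variance feature
df['f9'] = df['f0'] * 0.99 + rng.normal(scale=0.01, size=118)    # highly correlated pair
reduced = preprocess_features(df)

# Univariate tests: Mann-Whitney U for continuous features, Fisher's exact for categorical ones
labels = rng.integers(0, 2, size=118)                     # toy "with/without CAD" labels
print(mannwhitneyu(reduced.loc[labels == 1, 'f0'],
                   reduced.loc[labels == 0, 'f0'],
                   alternative='two-sided').pvalue)
print(fisher_exact([[30, 10], [48, 30]]))                 # toy 2x2 contingency table
```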

Features Selection Methods

The inclusion of many features makes the model more complex and increases the chance of overfitting during classification. In fact, some features can be noisy and potentially harm the model. Both clinical and radiomic features were considered in the selection. The Feature Selection (FS) phase enables selecting the most discriminating features. Mainly, FS was used to define multimodal signatures — sets composed of radiomic and clinical features — to be used in the subsequent modeling phase. In addition, evaluations were made considering only clinical and only radiomic features (thus unimodal signatures), in order to quantify the improvement achieved by multimodal signatures. In our study the following three FS methods were considered (a scikit-learn sketch of the three methods is provided after the list):

  • L1-based: a linear model with an L1 penalty eliminates some of the features, acting as a feature selection method before another model is used to fit the data. In particular, a linear model penalized with the L1 norm is assumed to have sparse solutions. For this reason, a linear Support Vector Classifier is trained, and only features with non-zero coefficients are selected;

  • Tree-based: relies on training a decision-tree-based algorithm. In particular, tree-based estimators are used to compute impurity-based feature importances [21], which in turn can be used to discard irrelevant features (an importance threshold of 1e-5 was used);

  • Mutual information: uses an entropy measure, called “mutual information,” to assess which features should be included in the reduced data [22]. In more detail, mutual information evaluates the dependence between two random variables by quantifying the amount of information obtained about one variable by observing the other.
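A sketch of the three selection strategies using scikit-learn is given below; the hyperparameters (e.g., the regularization strength C and the number k of features retained by mutual information) are illustrative assumptions, not the values used in the study:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel, SelectKBest, mutual_info_classif
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=118, n_features=40, n_informative=8, random_state=0)

# L1-based: linear SVC with L1 penalty; features with non-zero coefficients are kept
l1_sel = SelectFromModel(LinearSVC(C=0.1, penalty='l1', dual=False, max_iter=5000)).fit(X, y)

# Tree-based: impurity-based importances from a tree ensemble, thresholded at 1e-5
tree_sel = SelectFromModel(ExtraTreesClassifier(n_estimators=200, random_state=0),
                           threshold=1e-5).fit(X, y)

# Mutual information: rank features by their estimated mutual information with the label
mi_sel = SelectKBest(mutual_info_classif, k=10).fit(X, y)

for name, sel in (('L1', l1_sel), ('Tree', tree_sel), ('MI', mi_sel)):
    print(name, int(sel.get_support().sum()), 'features kept')
```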

Modeling Phase

The predictive modeling was performed by exploiting different machine learning algorithms (namely, SVM [23], Random Forest [24, 25], AdaBoost [25, 26], and XGBoost [27]) trained and tested using a nested 5-fold cross-validation (CV) scheme [28]. The machine learning classifiers listed above use as input the features selected in the previous step to yield binary (i.e., “with CAD” vs. “without CAD”) classification results. The use of nested CV allowed us to train classification models whose hyperparameters also need to be optimized. In fact, nested CV estimates the generalization error of the underlying model together with its hyperparameter search.

Within each fold, the inner loop finds the best setting of hyperparameters to be tested on the held-out test set. The nested CV thus yields the five best models (one per outer fold), and the one with the best performance was chosen as the final predictive model. The evaluation metrics reported in the following “Experimental Results’’ are values averaged over all those obtained by the best model. Figure 5 depicts the nested 5-fold cross-validation approach adopted in our study.
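A minimal sketch of such a nested scheme with scikit-learn is reported below; the hyperparameter grid is illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=118, n_features=15, random_state=0)

param_grid = {'n_estimators': [100, 300], 'max_depth': [None, 3, 5]}   # illustrative grid
inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)   # hyperparameter search
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)   # generalization estimate

search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      cv=inner_cv, scoring='accuracy')
scores = cross_val_score(search, X, y, cv=outer_cv, scoring='accuracy')
print('Nested CV accuracy: %.3f +/- %.3f' % (scores.mean(), scores.std()))
```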

Fig. 5

Diagram depicting the nested 5-fold cross-validation approach used in this study

Experimental Results

The conducted experiments were aimed at quantifying the capabilities of the built predictive models in coronary artery disease characterization. In particular, this section reports details about (i) the feature preprocessing, (ii) the predictive models (i.e., FS method + machine learning classifier) discovered and tested, (iii) the signatures obtained by each model (considering only clinical, only radiomic, and clinical + radiomic features), and finally (iv) the classification performance. To evaluate the models’ performance, accuracy, sensitivity, specificity, PPV, NPV, and AUROC were computed.

Features Preprocessing

To define robust imaging biomarkers [11], once the features were computed, the steps described in “Radiomic Features Preprocessing and Statistical Analysis’’ were carried out. The number of remaining features after each preprocessing step is reported in Table 1.

Table 1 Details concerning steps implemented in this study for calibration and preprocessing of radiomic and clinical features

Features Selection and Modeling

The FS phase allowed us to select unimodal signatures — composed of only one type of feature (i.e., clinical or radiomic) — as well as multimodal signatures — composed of radiomic and clinical features — to be used as inputs to the machine learning algorithms (Table 2). First, a “discovery” phase — aimed at finding the best machine learning algorithm — was performed. In this discovery phase, only 10 repetitions of the nested 5-fold CV training were performed, evaluating only the accuracy of the predictive models (Table 3). Subsequently, 100 repetitions were performed in order to calculate all the other statistically relevant metrics for the best predictive model found in the discovery phase (Table 4). Figure 6 shows the ROC curves obtained by considering unimodal and multimodal signatures.

Table 2 Multimodal signatures obtained by the features selection methods considering clinical and radiomic features. Cells containing clinical and radiomic features are highlighted with orange and blue colors, respectively
Table 3 Accuracy values obtained in the modeling phase considering the machine learning algorithms and the feature selection methods used
Table 4 Performance obtained by the Random Forest model considering the 3 features selection methods
Fig. 6

ROC curves obtained by the best predictive model (Random Forest + Mutual Information) considering unimodal (subfigures a and b) and multimodal signatures (subfigure c). In more detail: a, only clinical features; b, only radiomic features; and c, clinical and radiomic features. The thicker curve in blue represents the ROC curve averaged over the 100 repetitions of the CV. The thinner curves in light blue represent the ROCs of the single CV repetitions. The gray transparent band around the ROC curve represents the standard deviation. Considering that the purpose of these figures is to show the average trend, and that reporting 100 ROC curves would have made the graph difficult to read, we decided to plot only half (50/100) of the ROCs obtained in the repetitions
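For clarity, the averaging of ROC curves across CV repetitions can be sketched as follows (interpolation onto a common false-positive-rate grid; variable names and the toy data are illustrative, not the study’s outputs):

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

def mean_roc(repetitions, grid_points=100):
    """Average ROC curves from repeated CV by interpolating each curve onto a
    common false-positive-rate grid; returns the mean curve and its variability."""
    mean_fpr = np.linspace(0.0, 1.0, grid_points)
    tprs, aucs = [], []
    for y_true, y_score in repetitions:
        fpr, tpr, _ = roc_curve(y_true, y_score)
        interp_tpr = np.interp(mean_fpr, fpr, tpr)
        interp_tpr[0] = 0.0
        tprs.append(interp_tpr)
        aucs.append(auc(fpr, tpr))
    mean_tpr = np.mean(tprs, axis=0)
    mean_tpr[-1] = 1.0
    return mean_fpr, mean_tpr, np.std(tprs, axis=0), np.mean(aucs), np.std(aucs)

# Toy usage: one (y_true, y_score) pair per CV repetition
rng = np.random.default_rng(0)
reps = [(rng.integers(0, 2, 50), rng.random(50)) for _ in range(10)]
fpr, tpr, tpr_std, auc_mean, auc_std = mean_roc(reps)
```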

Model Explainability

The concept of explainability goes beyond finding a signature (i.e., a set of biomarkers) able to predict the clinical outcome (e.g., diagnosis, prognosis, treatment response). In the following, we attempt to explain the link between feature trends (i.e., high/low values) — which, in the case of radiomic features, are directly correlated with image morphology — and the prediction. In particular, we (i) quantified the contribution of the features to the final model decision, and (ii) justified clinically why the identified features are discriminant for the problem under investigation.

This analysis was focused on the best machine learning classifier (i.e., Random Forest). Since a Random Forest classifier is a tree ensemble algorithm, it is possible to calculate feature importance. In particular, the Mean Decrease in Impurity (MDI) — the accumulation of the impurity decrease within each decision tree composing the forest — was computed. The mean and standard deviation of this accumulation were calculated considering the best model (in terms of accuracy) obtained in each of the 100 repetitions of the nested CV.
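A sketch of this MDI computation with scikit-learn is shown below, on toy data; in the study, the importances came from the models selected by the nested CV rather than from a single fit:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=118, n_features=10, random_state=0)
feature_names = [f'feature_{i}' for i in range(X.shape[1])]

forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

# MDI: impurity decrease accumulated by each feature, averaged over the trees of the forest
mdi = forest.feature_importances_
mdi_std = np.std([tree.feature_importances_ for tree in forest.estimators_], axis=0)

for name, w, s in sorted(zip(feature_names, mdi, mdi_std), key=lambda t: -t[1])[:5]:
    print(f'{name}: {w:.3f} +/- {s:.3f}')
```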

According to the MDI analysis, both clinical and radiomic features contribute to the prediction but with different weights. As depicted in Fig. 7, age was found to be — along with Total Energy, Gray Level Variance (GLV), and Gray Level Non-Uniformity Normalized (GLNN) — one of the most discriminative features. It should be noted that age could be considered a confounding factor for classifiers, which is why it is sometimes decoupled from the rest of the features. In our study, we chose to treat age in the same way as the other features, because it is an important factor from a clinical point of view, although it is not sufficient — by itself — to yield accurate diagnoses. In fact, as stated in the literature, epicardial fat characteristics are able to support CAD prediction [29]. In addition, as evidenced by the weights obtained from the MDI analysis, the other clinical features are not highly relevant, and radiomic features are therefore needed. For each of the most discriminative radiomic features, additional details regarding the weights obtained and their mathematical definitions are provided in what follows:

  • Total Energy (weight=0.176) is derived from the Energy (weight=0.096), which is a measure of the magnitude of voxel values in an image. In particular, Total Energy is the value of the Energy feature scaled by the voxel volume. Larger values imply a greater sum of the squares of the voxel values (a small numerical illustration of Eq. 1 is given after this list).

    $$\begin{aligned} \textit{Total Energy} = V_{voxel}\displaystyle \sum ^{N_p}_{i=1}{({\textbf {X}}(i) + c)^2} \end{aligned}$$
    (1)

    where:

    • \(V_{voxel}\) is the volume of the voxel;

    • X is a set of \(N_p\) voxels included in the ROI;

    • c is an optional value, which shifts the intensities to prevent negative values in X. This ensures that voxels with the lowest gray values contribute the least to Energy, instead of voxels with gray level intensity closest to 0.

  • Gray Level Variance (GLV) (weight=0.102) measures the variance in gray level intensities for the zones.

    $$\begin{aligned} \textit{GLV} = \displaystyle \sum ^{N_g}_{i=1}\displaystyle \sum ^{N_s}_{j=1}{p(i,j)(i - \mu )^2} \end{aligned}$$
    (2)
  • Gray Level Non-Uniformity Normalized (GLNN) (weight=0.098) measures the variability of gray-level intensity values in the image, with a lower value indicating a greater similarity in intensity values. This is the normalized version of the Gray Level Non-Uniformity (GLN), which measures the variability of gray-level intensity values in the image, with a lower value indicating more homogeneity in intensity values.

    $$\begin{aligned} \textit{GLNN} = \frac{\sum ^{N_g}_{i=1}\left( \sum ^{N_s}_{j=1}{{\textbf {P}}(i,j)}\right) ^2}{N_z^2} \end{aligned}$$
    (3)

    where:

    • \(p(i,j) = \frac{{\textbf {P}}(i,j)}{N_z}\) is the normalized size zone matrix;

    • P(i,j) is the size zone matrix;

    • \(\mu = \displaystyle \sum ^{N_g}_{i=1}\displaystyle \sum ^{N_s}_{j=1}{p(i,j)i}\)

    • \(N_g\) is the number of discrete intensity values in the image;

    • \(N_s\) is the number of discrete zone sizes in the image;

    • \(N_z\) is the number of zones in the ROI, which is equal to \(\sum ^{N_g}_{i=1}\sum ^{N_s}_{j=1} {{\textbf {P}}(i,j)}\) and \(1 \le N_z \le N_p\);

    • \(N_p\) is the number of voxels in the image;
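As a small numerical illustration of Eq. 1, the following sketch computes Total Energy from a handful of toy pericoronary attenuation values; the voxel volume and the optional shift c are assumptions made purely for the example:

```python
import numpy as np

def total_energy(roi_voxels, voxel_volume, c=0.0):
    """Eq. 1: Total Energy = V_voxel * sum_i (X(i) + c)^2 over the N_p ROI voxels."""
    x = np.asarray(roi_voxels, dtype=float)
    return voxel_volume * np.sum((x + c) ** 2)

# Toy pericoronary fat attenuations (HU) with an assumed voxel volume (mm^3);
# with c = 0, the most negative HU values contribute the most to the sum of squares.
hu_values = np.array([-160.0, -120.0, -80.0, -40.0])
print(total_energy(hu_values, voxel_volume=0.25))
```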

Fig. 7

Feature weights (importance) of the signatures composed of a only clinical features, b only radiomic features, and c clinical and radiomic features

Discussion

This work demonstrated that combinations of radiomic features represent valid biomarkers to support the diagnosis of CAD. We set up a comprehensive study, from the point of view of the feature selection methods (i.e., L1-based, tree-based, mutual information), as well as considering several machine learning classifiers (namely, SVM, Random Forest, AdaBoost, XGBoost) suitable for small-size datasets.

The experimental findings showed a clear performance improvement when multimodal signatures — composed of clinical and radiomic features — are used. In fact, the best predictive model (i.e., mutual information and Random Forest) obtained AUROC=\(0.820 \pm 0.076\), while the worst unimodal model (exploiting only clinical risk factors) achieved only AUROC=\(0.666 \pm 0.081\). As a matter of fact, the multimodal model obtained an evident improvement of about \(23\%\) (\(\Delta _{AUROC}=0.154\)). As can be seen, using radiomic features alone already results in improved performance (AUROC=\(0.803 \pm 0.076\)) compared to clinical features alone.

To clinically justify the results, we used the MDI analysis to evaluate feature importance in the prediction. This allowed us to explain our experimental findings. According to the MDI weights, the most important features are:

  • Age (weight=0.131). High values of age lead toward the “with CAD” class, while low values lead toward the “without CAD” class. Considering that older patients have a higher chance of developing CAD, this result is quite intuitive.

  • Total Energy (weight=0.176) is derived from the Energy (weight=0.096), which is a measure of the magnitude of voxel values in an image. In particular, Total Energy is the value of the Energy feature scaled by the voxel volume. A larger value implies a greater sum of the squares of the voxel values. High values of Total Energy lead toward the “without CAD” class, while low values lead toward the “with CAD” class. Considering that the extracted pericoronary fat has attenuation values in the range [-175, -15] HU, the more negative values contribute more to Total Energy. As a matter of fact, more negative fat attenuation (in terms of HU) is indicative of a more stable clinical condition of the coronary arteries [30].

  • Gray Level Variance (GLV) (weight=0.102) and Gray Level Non-Uniformity Normalized (GLNN) (weight=0.098) are both correlated with the variability of gray-level intensity values in the image, with a lower value indicating more homogeneity in intensity values. High GLV and GLNN values lead toward the “with CAD” class, while low values lead toward the “without CAD” class. This behavior seems to be aligned with the literature [31, 32], as greater inhomogeneity can be associated with greater loco-regional pericoronary inflammation.

In order to provide the reader with a general overview of the results obtained by literature works that tackled a similar problem, a comparison was made and is reported in Table 5.

In all the works considered [33,34,35,36], an extended set of radiomic features was used, considering both basic features (often called “originals”) and those obtained by convolution with “LoG” and “Wavelet” kernels. This easily brings the number of features into the thousands. A well-established, practical rule of thumb states that at least 5–10 samples (i.e., patients) would be needed for each feature in a model based on binary classifiers [37, 38]. Thus, increasing the number of features — especially when an insufficient number of samples is available — might introduce further redundancy among the features and lead to the curse of dimensionality problem.

In machine learning, the curse of dimensionality is often used interchangeably with the peaking phenomenon, also known as the Hughes phenomenon [39]. This phenomenon states that, with a fixed number of training samples, the average (expected) predictive power of a classifier first increases as the number of dimensions or features is increased, but beyond a certain dimensionality it starts to degrade. This is due to the model overfitting the high-dimensional and redundant data, which, on the one hand, leads to improved performance on the training data but, on the other hand, reduces the generalization capability on new data.

Table 5 Literature comparison

Conclusions

This study was aimed at developing multimodal models able to predict CAD. The joint use of clinical and radiomic features allowed us to improve prediction over clinical data alone. A processing pipeline structured according to the literature indications [11] was implemented to extract robust biomarkers and obtain effective predictive models.

The explainability of the model, which is crucial in a clinical scenario, was one of our goals, and the methodological choices made during the development took this aspect into account: the use of machine learning algorithms and intrinsically interpretable clinical and radiomic features enabled the introspection of the models, allowing the physician team to clinically justify the findings. These choices made it possible to implement a trusted system supporting cognitive and decision-making processes in the medical domain.

The performance of the predictive models is promising, especially considering that — unlike traditional approaches that use only clinical features — radiomic features allowed us to achieve a consistent improvement in classification rates. The main limitation of our work concerns the amount of available data. Our dataset of 118 samples is not large (although many papers in the literature present much smaller sample sizes). For this reason, we will collect further data from our hospital partners in order to prospectively validate our findings. The proposed models, properly expanded and applied to different coronary segments, may enable the detection of new radiomic biomarkers reflecting anti-inflammatory or pro-inflammatory processes. With these new tools, it would be even easier to identify, in a timely manner, patients at increased cardiovascular risk and thus enable personalized pharmacological intervention or lifestyle correction.

To conclude, the relevance of this work to the cognitive computation community is twofold: (i) developing a trusted predictive system to support cognitive and decision-making and control processes in the clinical practice [12]; (ii) leveraging explainable machine learning models and interpretable (clinical and radiomic) features [40, 41] for improved decision-process satisfaction and decision-advice transparency [42].