CT Radiomic Features and Clinical Biomarkers for Predicting Coronary Artery Disease

This study was aimed to investigate the predictive value of the radiomics features extracted from pericoronaric adipose tissue — around the anterior interventricular artery (IVA) — to assess the condition of coronary arteries compared with the use of clinical characteristics alone (i.e., risk factors). Clinical and radiomic data of 118 patients were retrospectively analyzed. In total, 93 radiomics features were extracted for each ROI around the IVA, and 13 clinical features were used to build different machine learning models finalized to predict the impairment (or otherwise) of coronary arteries. Pericoronaric radiomic features improved prediction above the use of risk factors alone. In fact, with the best model (Random Forest + Mutual Information) the AUROC reached 0.820±0.076\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$0.820 \pm 0.076$$\end{document}. As a matter of fact, the combined use of both types of features (i.e., radiomic and clinical) allows for improved performance regardless of the feature selection method used. Experimental findings demonstrated that the use of radiomic features alone achieves better performance than the use of clinical features alone, while the combined use of both clinical and radiomic biomarkers further improves the predictive ability of the models. The main contribution of this work concerns: (i) the implementation of multimodal predictive models, based on both clinical and radiomic features, and (ii) a trusted system to support clinical decision-making processes by means of explainable classifiers and interpretable features.


Introduction
Epicardial adipose tissue (EAT) represents a metabolically active reserve of visceral fat located between the cardiac serosa of pericardium and the myocardium. It covers more than 80% of the cardiac surface, along the free wall of the right ventricle, the atrioventricular and interventricular grooves, surrounding the proximal segments of coronary arteries [1]. Several studies have shown that EAT performs various functions, such as mechanical support of coronary vessels, energy reserve due to its high free fatty acid content, and thermoregulatory function [2]. Coronary artery disease (CAD) is a leading cause of death and morbidity worldwide [3], and coronary CT angiography (CCTA) has gained clinical acceptance, playing nowadays a pivotal role in the CAD evaluation. As a non-invasive and cost-effective imaging tool, CCTA has great potential in reducing the global socioeconomic burden of CAD [4]. The detection of high-risk atherosclerotic plaque markers (i.e., low attenuation, positive remodeling, spotty calcification and the napkin-ring sign) in CCTA allows for highly specific labeling of patients at increased risk for major adverse cardiac events. These markers correlate with adverse outcomes predicting ischemia even in non-obstructive lesions [5]. Recent studies have focused on CT attenuation in epicardial and pericoronary adipose tissue as an indirect marker of coronary atherosclerosis and plaque inflammation [6]. Inflammation is a crucial component of atherosclerosis and a consistent pathologic feature of unstable atherosclerotic plaques. Increased CT attenuation in adipose tissue adjacent to an atherosclerotic plaque is thought to be a marker of inflammation [7].
Several imaging modalities have been developed to measure epicardial and pericoronary adipose fat, such as echocardiography, CT, and MRI. The use of CT, due to higher spatial resolution, provides a more accurate assessment of EAT [8]. Quantification of EAT has required, until recent years, complex manual measurements performed by personnel with high professional backgrounds and exploiting only a fraction of the available information. The development of accurate and reliable semi-automated software for the quantification of epicardial adipose tissue (i.e., quartile attenuation analysis) may provide more significant associations between fat characteristics and different clinical scenarios [9].
Quantification methods exploiting only "naked eye" visible characteristics reflect only a fraction of the available information, which ultimately leads to a rather crude variable with significant overlap between sick patients and healthy controls. In this scenario, radiomics greatly increases the quantitative information accessible from CT images. Hundreds of imaging features, which cannot be assessed by the human eye, are being extracted to create big data, from which imaging patterns associated with clinical features or outcomes can be derived [4]. Radiomics data may improve the diagnostic and predictive capabilities of CCTA, leading to better risk stratification for future events [10].
Although recent years have seen a proliferation of radiomic work aimed at supporting and improving diagnostic performance by means of deep learning techniques, limited attention has been paid to the development of reproducible and explainable studies. In fact, deep learning techniques, due to their "black-boxed" nature, are not suitable for implementing fully and explainable predictive models exploiting intrinsically interpretable features.
The need for explainable models and interpretable features brings us toward non-deep machine learning (shallow learning) approaches. Moreover, it must be highlighted that current AI-based solutions have some critical problems concerning their application within clinical scenarios. AI's needs are mainly related to (i) big data with accurate annotations and complete information, (ii) continuous feed of realworld data, and (iii) close collaboration with clinicians in all stages of development. This leads to the identification of some main concerns for AI: (i) only a small fraction of the medical centers (data providers) are willing to share data; (ii) lack of curated dataset; (iii) inter/intra-rater variability. When working with datasets collected by clinical environments with limited amounts of enrolled patients, one possible solution to address these issues is to shallow learning approaches, which require much less data than deep learning architectures.
The explainability of the predictive model has become a fundamental requirement in clinical contexts. In fact, some mandatory aspects for clinicians and patients must be considered: • Clinician's needs: once the model is considered valid by the developer, the clinician is able to confirm some important clinical evidence, leading the clinicians to trust these computerized systems and encourage their use in the clinical practice; • Patient's needs: a local explanation of the model result for the single patient, exactly as a doctor explains the choice of a therapy or a diagnosis for a specific clinical case. In particular, the domain expert establishes whether a local explanation makes sense and can be considered valid.
To this purpose, after defining and setting predictive models, we tried to justify the presence of the features in the found signatures, supported by the physician team. The aim of this work is to define some predictive models, based on both clinical and radiomic features, for CAD prediction. An in-depth analysis by means of several machine learning algorithms -unlike deep learning that can implement patterns that can be explained -and features selection methods were applied to select the best predictive signature. The main contributions of this study are: • A well-structured processing pipeline, according to the literature indications [11], enabling the definition of robust biomarkers; • The implementation of multimodal predictive models, based on both clinical and radiomic features, able to predict CAD; • To provide a trusted system supporting cognitive and decision-making processes [12] in the medical domain by means of machine learning algorithms and interpretable clinical and radiomic features.
To evaluate model performance, the following metrics were considered: accuracy, sensitivity, specificity, Positive Predictive Value (PPV), Negative Predictive Value (NPV), and area under the curve (AUROC). This manuscript is structured as follows: "Materials and Methods'' describes the dataset and the features used, focusing in detail on each step of the implemented processing pipeline; "Experimental Results'' reports the features preprocessing steps, the predictive model build-up, the signatures obtained by each model, and their classification performance; "Discussion'' points out on some remarks concerning this study and its experimental findings; finally, "Conclusions'' focuses on the performance obtained and on interesting future developments.

Materials and Methods
This section describes (i) the characteristics of the data used (with respect to both CT sequences and clinical data), (ii) the features used, and (iii) the processing pipeline implemented (with emphasis on the feature selection methods and machine learning classifiers employed to build the predictive models). In addition, the semi-automated method implemented for segmentation of pericoronaric adipose tissue is described, which, by means of interpolation, allows reducing the workload and the time required for Region of Interest (ROI) annotation. In Fig. 1 the flow diagram of the overall processing pipeline implemented in this study is depicted. Moreover, in Fig. 2 for each of the basic processing steps the "alternatives" (e.g., feature selection methods, machine learning classifiers) are shown.

Dataset Description
The dataset used in this study consists of 118 CCTA series collected from October 2019 to January 2020 at the Policlinico University Hospital 'P. Giaccone' of Palermo. The initial set composed of 135 cases was preliminarily evaluated by two radiologists with over 10 years of experience. Considering as criteria the image quality, 17 CCTA series with poor quality (i.e., low opacification of coronary arteries, motion artifacts) were discarded. The final dataset includes 84 men and 34 women with a mean age of 60.33 ± 13.2 , labeled as "without CAD" (40) and "with CAD" (78).

Clinical Features
The following clinical features were considered in this study: age, sex, body mass index (BMI), family history, smoking, diabetes, hypertension, cholesterol, obesity, current hypertension, statin treatment, peripheral vasculopathy, prior acute myocardial infarction (AMI).

Pericoronaric Adipose Tissue Segmentation
For the extraction of radiomic features, a Volume of Interest (VOI) containing pericoronary fat around the anterior interventricular artery (IVA) was considered. The choice of a very specific area (i.e., the IVA) is also justified by the need to ensure the reproducibility of the study.
To this aim, a semi-automatic computer-assisted tool was developed and implemented using the Matlab environment, which semi-automatically is able to detect a cylindrical region around the IVA in a few simple steps. In particular, first of all, the Volume of Interest (VOI) containing the IVA must be selected by drawing a rectangle in the slice where the IVA is most visible along its axis. This allows us to select an area with dimensions (x, y), while along the z-axis, the greatest between x and y is chosen as dimension. Therefore, a parallelepiped of dimensions (x, y, max(x, y)) is located in the space, which constitutes our VOI around the IVA. After the VOI identification, every stepROI slices, the operator inserts a circular ROI centered on the IVA. Once the reference ROIs are manually drawn, the system automatically interpolates the ROIs onto the remaining slices included in the range of interest. By so doing, the number of slices manually drawn by the user is reduced by a stepROI factor. The interpolation approach is inspired by the method proposed in [9].
The VOI is the volume containing the segmentation mask (ROI). If the VOI is large enough to contain the IVA tract to be attended to (so that it does not leave out any part of interest), the choice of VOI does not affect IVA region segmentation. The only problem might be if the operator places the VOI in a completely different region than the one containing the IVA (unlikely in the case of experienced operators). Manual placement of circular ROIs could be a critical issue if this one is not centered with respect to the IVA. However, there is a need to consider that after the circular ROI is placed, its position can be refined and then the operator makes it definite. Before adopting this segmentation method, two different approaches were evaluated jointly with the clinicians: a fully manual approach, and the used one. From this analysis, it is inferred that the ROIs obtained by interpolation (taking as reference the manually entered ROIs) were essentially comparable (according to Dice index) to the manual ROIs. This evaluation -conducted as the stepROI parameter varied ( stepROI ∈ 3, 4, 5, 6, 7, 8 ) -showed that for stepROI = 5 (a value that was then used in the implementation) allowed us to achieve the highest Dice index values. On the other hand, as with any semi-automated method, the user (i.e., physician) interaction is required to initiate the segmentation process. The advantage is that, in the face of minimal input, by means of interpolation this approach reduces by about 80% (it is required to enter one ROI for every 5 CCTA slices) the workload compared to a fully manual delineation approach.
Algorithm 1 outlines the pseudocode of the semi-automatic segmentation approach. Moreover, in Fig. 3 the initial step concerning VOI and ROI setting is shown, while in Fig. 4 the three views (a, b, c) and the 3D volume-rendering model (d) of the segmented pericoronaric adipose tissue are depicted.

Radiomic Features Extraction
The extraction of the radiomic features was done by means of PyRadiomics [13]: a total of 93 features were extracted. The extraction was performed without any resampling to avoid interpolation artifacts. Radiomic features were extracted from the 3D ROIs delineated in the previous step. The following five feature categories were extracted and considered: • First Order (FO): describe the distribution of voxel intensities within the image region defined by the mask through commonly used and basic metrics; • Gray Level Co-occurrence Matrix (GLCM) [14,15]: spatial relationship between pixels in a specific direction, highlighting property of uniformity, homogeneity, randomness and linear dependencies; • Gray Level Run Length Matrix (GLRLM) [16]: texture in specific direction, where fine texture has shorter runs while coarse texture presents more long runs with different intensity values; • Gray Level Size Zone Matrix (GLSZM) [17]: regional intensity variations or the distribution of homogeneity regions; • Gray Level Dependence Matrix (GLDM) [18]: quantifies gray level dependencies; • Neighboring Gray Tone Difference Matrix (NGTDM) [19]: spatial relationship among three or more pixels, closely approaching the human perception of image.
Considering that the ROI extracted around the IVA has the shape of a cylinder, shape-based features were not considered because they are not representative of the clinical problem we are trying to model.

Unbalanced Data Management
Classification of unbalanced data involves developing predictive models on datasets that have a wide class imbalance. In fact, working with imbalanced datasets can lead to have poor performance on the minority class. One approach to addressing imbalanced datasets is to oversample the minority class: new examples can be synthesized from the existing examples. This is a type of data augmentation for the minority class and is referred to as the Synthetic Minority Oversampling Technique (SMOTE) [20].

Radiomic Features Preprocessing and Statistical Analysis
Feature preprocessing is mandatory in order to define robust imaging biomarkers [11]. In particular, to obtain a subset of features with relevant information content and non-redundant, calibration and preprocessing were performed by the following steps: • Near-zero variance analysis: aimed at removing the features that do not convey information content. This operation considered a variance cutoff of 0.01. Features with a variance less than or equal to this threshold have been discarded; • Correlation analysis: aimed at removing highly correlated features for reducing the redundancy among the features. We used the Spearman correlation coefficient for pairwise feature comparison. In the case of a correlation value higher than 0.9, the feature with the highest predictive power was selected; The

Features Selection Methods
The inclusion of many features makes the model more complex, and an increased chance of overfitting occurs during the classification. In fact, some features can be noisy and potentially damage the model. In the selection, both clinical and radiomic features were considered. The Feature Selection (FS) phase enables selecting the most discriminating features. Mainly, FS was used to define multimodal signatures -sets composed of radiomic and clinical features -to be used in the subsequent modeling phase. In addition, evaluations were made considering only clinical and only radiomic features (thus unimodal signatures), in order to quantify the improvement achieved by multimodal signatures. In our study the following three FS methods were considered: • L1-based: a linear model with L1 penalty eliminates some of the features, acting as a feature selection method before using another model to fit the data. In particular, it is assumed that a linear model penalized with L1 norm has sparse solutions. For this reason, a Linear Support vector classifier algorithm is trained, and only features with non-zero coefficients are selected; • Tree-based: is based on the training of a decision treebased algorithm. In particular, tree-based estimators are used to compute impurity-based feature importances [21], which in turn can be used to discard irrelevant features (an importance threshold of 1e-5 was used to discard features); • Mutual information: use an entropy measure, called "mutual information," to assess which features should be included in the reduced data [22]. With more details, mutual information is a measure that evaluates the dependence between two random variables, by quantifying the amount of information obtained about one variable by means of the other one.

Modeling Phase
The predictive modeling was performed by exploiting different machine learning algorithms (namely, SVM [23], Random Forest [24,25], AdaBoost [25,26] and XGBoost [27]) trained and tested using a nested 5-fold cross-validation (CV) scheme [28]. The above listed machine learning classifiers uses as input the features selected in previous step to yield binary (i.e., "with CAD" vs. "without CAD" cases) classification results. The use of the nested CV allowed to train a classification model where the hyperparameters also need to be optimized. In fact, nested CV estimates the generalization error of the underlying model and its hyperparameter search. Within each fold, the inner loop allows us to find the best setting of hyperparameters to be tested in the retained test set. The nested CV finds the best (5) models, and the one with the best performance was chosen as the final predictive model. The evaluation metrics reported in the following "Experimental Results'' are averaged values among all those obtained by the best model. Figure 5 depicts the nested 5-fold cross-validation approach adopted in our study.

Experimental Results
The conducted experiments were aimed at quantifying the capabilities of the built predictive models in coronary artery disease characterization. In particular, this section reports details about (i) the features preprocessing, (ii) the predictive models (i.e., FS method + machine learning classifier) discovered and tested, (iii) the signatures obtained by each model (considering only clinical, only radiomic, and clinical + radiomic features), and finally (iv) the classification performance. To evaluate the models performance, accuracy, sensitivity, specificity, PPV, NPV, and AUROC were computed.

Features Preprocessing
To define robust imaging biomarkers [11], once features are computed, the steps described in "Radiomic Features Preprocessing and Statistical Analysis'' were realized. The number of remaining features after each preprocessing step is reported in Table 1.

Features Selection and Modeling
The FS phase allowed to select unimodal signatures -composed of only one type of feature (i.e., clinical, radiomic)as well as multimodal signatures -composed of radiomic and clinical features -to be used as inputs to the machine learning algorithms ( Table 2). First of all, a "discovery" phase -aimed at finding the best machine learning algorithmwas performed. In this discovery phase, only 10 repetitions of the nested 5-fold CV training were performed evaluating only the accuracy of the predictive models (Table 3). Successively, similarly to the discovery phase, 100 repetitions were performed in order to calculate all other statistically relevant metrics from the best predictive model founded in the discovery (Table 4). Figure 6 shows the ROC curves obtained by considering unimodal and multimodal signatures.

Model Explainability
The concept of explainability goes beyond finding a signature (i.e., a set of biomarkers) able to predict the clinical outcome (e.g., diagnosis, prognosis, treatment response). In the following, we attempt to explain the link between feature trends (i.e., high/low values) -in the case of radiomic features directly correlated with image morphology -and prediction. In particular, we (i) quantified the contribution of the features to the final model decision, and (ii) justified clinically why the found features are discriminant for the problem under investigation.
This analysis was focused on the best machine learning classifier (i.e., Random Forest). A Random Forest classifier is a Tree Ensemble algorithm, and it was possible to calculate the importance of features. In particular, it was computed the accumulation of the impurity decrease (MDI) within each Decision Tree composing the Forest. The mean and standard deviation of the accumulation were calculated by considering the best model (in terms of accuracy) obtained in each of the 100 repetitions of the nested CV.
According to the MDI analysis, both clinical and radiomic features contribute to the prediction but with different weights. As depicted in Fig. 7, age was found to be -along with Total Energy, Gray Level Variance (GLV), and Gray Level Non-Uniformity Normalized (GLNN) -one of the most discriminative features. It should be noted that age could be considered a confounding factor for classifiers, and that is why it is sometimes decoupled from the rest of the features. In our study, we wanted to consider age in the same way as the other features, because it is an important factor from a clinical point of view, although it is not sufficient -by itself -to have accurate diagnoses. In fact, as stated in the literature, epicardial fat characteristics are able to support CAD prediction [29]. In addition, as evidenced by the weights obtained from the MDI analysis, the other  where: -V voxel is the volume of the voxel; (1) -X is a set of N p voxels included in the ROI; c is an optional value, which shifts the intensities to prevent negative values in X. This ensures that voxels with the lowest gray values contribute the least to Energy, instead of voxels with gray level intensity closest to 0. where: p(i, j) = P(i,j) N z is the normalized size zone matrix; -P(i,j) is the size zone matrix; -N g is the number of discrete intensity values in the image; -N s is the number of discrete zone sizes in the image; -N z is the number of zones in the ROI, which is equal to ∑N g i=1 ∑ N s j=1 P(i, j) and 1 ≤ N z ≤ N p ; -N p is the number of voxels in the image;

Discussion
This work demonstrated that combinations of radiomic features represent valid biomarkers to assess the diagnosis in patients using a CAD system. We set up a comprehensive study, from the point of view of the feature selection methods (i.e., L1-based, tree-based, mutual information), as well as considering several machine learning classifiers (namely, SVM, Random Forest, AdaBoost, XGBoost) suitable for small-size datasets.
The experimental findings showed a clear performance improvement when multimodal signatures -composed of clinical and radiomic features -are used. In fact, the best predictive model (i.e., mutual information and Random Forest) obtained AUROC=0.820 ± 0.076 , while the worst unimodal model (exploiting only clinical risk factors) gets only AUROC=0.666 ± 0.081 . As a matter of fact, the multimodal model obtained an evident improvement of about 23% ( Δ AUROC = 0.154 ). As can be seen, just using   "without CAD" class. This behavior seems to be aligned with literature [31,32], as greater inhomogeneity can be associated with greater loco-regional pericoronary inflammation.
In order to provide the reader with a general overview of the results obtained by literature works that tackled a similar problem, a comparison has been made and the following Table 5 has been added. In all the works considered [33][34][35][36] an extended set of radiomic features has been used considering both basic features (often called "originals") and those obtained by convolution with "LoG" and "Wavelets" kernels). This makes it easy to reach a thousand features. A well-established, practical rule of thumb states that at least 5-10 samples (i.e., patients) would be needed for each feature in a model based on binary classifiers [37,38]. Thus, increasing the number of features -especially, when an insufficient number of samples is available -might introduce further redundancy among the features and lead to the curse of dimensionality problem.
In machine learning, the curse of dimensionality is used interchangeably with the peaking phenomenon, which is also known as Hughes phenomenon [39]. This phenomenon states that with a fixed number of training samples, the average (expected) predictive power of a classifier first increases as the number of dimensions or features used is increased. This is due to the model's overfitting on highdimensionality and redundant data, which, on the one hand, leads to improved performance on test data but, on the other hand, reduces the generalization capabilities on new data.

Conclusions
This study was aimed at developing multimodal models able to predict CAD disease. The joint use of clinical and radiomic features allowed us to improve prediction over clinical data alone. A processing pipeline structured according to the literature indications [11] was implemented to extract robust biomarkers and obtain effective predictive models.
The explainability of the model, which is crucial in a clinical scenario, was one of our goals, and the methodological choices made during the development took into account this aspect: the use of machine learning algorithms and intrinsically interpretable clinical and radiomic features, enabled the introspection of the models, allowing the physician team to clinically justify the findings. These choices made it possible to implement a trusted system supporting cognitive and decision-making processes in the medical domain.
The performance of the predictive models is promising, especially considering that -unlike traditional approaches that use only clinical features -radiomic features allowed us to achieve a consistent improvement in classification rates. The main limitation of our work concerns the amount of data available. Our dataset of 118 samples is not large (although many papers in the literature present a much smaller sample size). For this reason, we will collect further data from our hospital partners in order to prospectively validate our findings. The proposed models, properly expanded and applied to different coronary segments, may enable the detection of new radiomic biomarkers, reflecting anti-inflammatory or pro-inflammatory processes. With these new tools, it would be even easier to identify timely patients at increased cardiovascular risk and thus enable personalized pharmacological intervention or lifestyle correction.
To conclude, the relevance of this work to the cognitive computation community is twofold: (i) developing a trusted predictive system to support cognitive and decision-making and control processes in the clinical practice [12]; (ii) leveraging explainable machine learning models and interpretable (clinical and radiomic) features [40,41] for improved decisionprocess satisfaction and decision-advice transparency [42].
Funding Open access funding provided by Consiglio Nazionale Delle Ricerche (CNR) within the CRUI-CARE Agreement. This study was not funded by any organization or agency.
Data Availability Data will be made available on reasonable request.

Declarations
Ethical Approval Retrospective data collection was approved by the local Ethics Committee. The requirement for evidence of informed consent was waived because of the retrospective nature of our study.

Conflict of Interest
The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.