Explaining decisions of a light-weight deep neural network for real-time coronary artery disease classification in magnetic resonance imaging

In certain healthcare settings, such as emergency or critical care units, where quick and accurate real-time analysis and decision-making are required, the healthcare system can leverage the power of artificial intelligence (AI) models to support decision-making and prevent complications. This paper investigates the optimization of healthcare AI models based on time complexity, hyper-parameter tuning, and eXplainable AI (XAI) for a classification task. The paper highlights the significance of a lightweight convolutional neural network (CNN) for analysing and classifying Magnetic Resonance Imaging (MRI) scans in real time, and compares it with a CNN-RandomForest (CNN-RF) model. The role of hyper-parameters is also examined in finding optimal configurations that enhance the model’s performance while efficiently utilizing the limited computational resources. Finally, the benefits of incorporating XAI techniques (e.g. GradCAM and Layer-wise Relevance Propagation) in providing transparent and interpretable explanations of AI model predictions, fostering trust, and detecting errors and bias are explored. Our inference time on a MacBook laptop for 323 test images of size 100x100 is only 2.6 sec, which is merely 8 milliseconds per image, while providing classification accuracy comparable to that of the ensemble of CNN-RF classifiers. Using the proposed model, clinicians/cardiologists can achieve accurate and reliable results while ensuring patients’ safety and answering questions imposed by the General Data Protection Regulation (GDPR). The proposed investigative study will advance the understanding and acceptance of AI systems in connected healthcare settings.


Introduction
According to the World Health Organization, in 2019, an estimated 17.9 million people died from cardiovascular diseases, representing 32% of all global deaths. Statistics published by the American Heart Association in 2023 state that from 2017 to 2020, an estimated 20.5 million Americans had coronary heart disease (CHD) [1]. Specifically, coronary artery disease (CAD) accounts for approximately 610,000 deaths annually in the United States and is the third leading cause of death worldwide, with 17.8 million deaths annually [2].
The symptoms of CAD are neither sensitive nor specific, making it difficult for clinicians or cardiologists to rely on them alone. The reference standard for CAD detection is coronary angiography, which is an invasive diagnostic imaging procedure performed using cardiac catheterization [3]. This method is expensive and carries potential risks. Other methods include cardiac imaging techniques, which are safe, non-invasive and cheaper, and can help doctors detect CAD early and provide timely interventions to treat CAD patients. These techniques include X-rays, Computed Tomography (CT), Echocardiogram and Magnetic Resonance Imaging (MRI) or Cardiac Magnetic Resonance (CMR) Imaging [4].
X-rays and CT imaging techniques use ionizing radiation, which is considered harmful if a patient is overexposed to it [5]. Echocardiograms are limited by cost, time, and acoustic window access [6]. MRI or CMR imaging uses magnetic fields and radio waves and is considered a viable alternative for non-invasive assessment of CAD [7]. MRI/CMR images provide precise measurements of heart structure and function, as well as myocardial perfusion and parametric quantification. MRI/CMR can be 2D or 3D, but 3D imaging has excessive artifacts and has thus not been clinically used for the diagnosis of CAD [8]. Manual interpretation of 2D scans is also time-consuming and requires experience. Thus, artificial intelligence methods are exploited to automate CAD diagnosis, reducing the analysis time with potentially improved accuracy. This plays a critical role in connected healthcare settings (transitioning healthcare services remotely, from hospitals to patient side or home-based care).
However, there are several challenges in implementing such AI models on computational tools such as Field Programmable Gate Arrays (FPGAs), Raspberry Pi and central processing unit (CPU)/graphics processing unit (GPU) based systems. These challenges arise due to the limited processing power, memory, and energy efficiency of these devices. It is essential to engage in a multidisciplinary approach that involves collaboration between domain experts, data scientists and hardware engineers to overcome these challenges.
Convolutional Neural Network (CNN) models have yielded unprecedented achievements in addressing computer vision challenges, including but not limited to image classification, object detection, and tracking. Nonetheless, their integration into embedded applications has been impeded by their substantial computational and memory requirements, giving rise to a research domain known as model compression, which includes bit reduction, knowledge distillation, tensor decomposition, network pruning, and microarchitecture [9]. Interested readers are referred to [10] for detailed insights into the advantages and limitations of each of these methods. While these strategies have demonstrated notable achievements, they are not without their inherent constraints.
This paper introduces a lightweight Convolutional Neural Network (CNN) model designed specifically for real-time implementation as a classifier. In connected healthcare settings, where low latency and efficient processing are crucial, this lightweight CNN offers a promising solution. By optimizing the model's architecture and parameters, we aim to strike a balance between computational efficiency and classification accuracy, enabling real-time CAD detection. This approach has great potential to improve the deployment of AI systems in resource-constrained environments, ultimately benefiting the overall healthcare system.
The remainder of the paper is organised as follows: Sect. 2 summarises the available literature on real-time CAD classification networks, Sect. 3 highlights the proposed work and dataset description, Sect. 4 provides calculations and the experimental results, and the conclusion and future work are presented in Sect. 6.

Background
Coronary artery disease (CAD) primarily originates from the accumulation of atherosclerotic plaque within the epicardial arteries, leading to an imbalance in the supply and demand of oxygen to the myocardium, often resulting in ischemia [11]. Chest pain is the predominant symptom, typically occurring during physical or emotional stress. Lifestyle modifications, pharmacological therapies, and invasive interventions are available strategies to modify this disease process, with the goal of stabilizing the disease or achieving its regression [12]. Despite the development of innovative imaging methods, such as MRI and/or coronary CT angiography, invasive coronary angiography remains the preferred diagnostic tool for assessing the severity of complex CAD, as endorsed by the 2019 guidelines of the European Society of Cardiology [13]. The process of interpreting complex coronary vascular structures is a time-intensive task and presents challenges to the clinician [14]. The implementation of real-time automatic CAD detection and labelling offers promise in overcoming these challenges by providing valuable support in the decision-making process.
Numerous methodologies for the automatic or semi-automatic assessment of coronary artery diseases have been proposed by various research groups [15]. These methodologies adhere to a common framework comprising three fundamental steps: (1) extraction of the coronary artery tree, (2) computation of geometric parameters, and (3) analysis of stenotic segments. The pivotal phase, which most strongly influences the efficiency and precision of these algorithms, is the extraction of the coronary artery tree. This task is accomplished through diverse techniques, including centerline extraction [16], graph-based methods [17], superpixel mapping [18], and machine/deep learning [19]. Among these, machine and deep learning methods have exhibited substantial potential in CAD detection based on their commendable performance, adaptability to tuning, and optimization capabilities [20]. The overarching objective pursued by developers and users of CNNs is to strike an optimal equilibrium between accuracy and speed, a concept often referred to as the "speed/accuracy trade-off" [21]. This trade-off reflects the endeavour to achieve high levels of CAD detection accuracy while simultaneously ensuring swift processing and analysis, a critical consideration in clinical practice.
Although several CNN-based approaches have reportedly achieved high accuracy in CAD detection, with Dice Similarity Coefficients surpassing 0.75 [22] and Sensitivity metrics exceeding 0.70 [23], their processing speed has been notably neglected. The time required for image processing represents a critical performance indicator for the practical application of these methods. In the literature, studies have reported processing times ranging from 1.1 to 11.87 seconds [17,22], 20 seconds [18] and, in some instances, even exceeding 60 seconds per frame [16]. However, such durations are considered unacceptable for real-time CAD detection, as the required processing time is 0.07 to 0.13 seconds per frame [24]. Thus, our study presents a detailed analysis of lightweight neural network architectures, along with their potential in terms of accuracy and performance, to classify healthy and CAD images.

Proposed work
In the proposed work, we implemented a lightweight neural network, namely an adapted version of the LeNet-5 model [25], on the CAD Cardiac Magnetic Resonance Imaging dataset (proposed by Khozeimeh et al. [26]) for a comprehensive comparison. The results of the CNN-RF model [26] were considered as the reference baseline. The input to the model is 2D CMR images. Figure 1 depicts example images from both categories. Pre-processing steps included resizing the images to 100x100 pixels and normalization between 0 and 1. The main contribution of the proposed work is three-fold and is as follows:

eXplainable Artificial Intelligence (XAI): We integrated the XAI techniques GradCAM [27] and Layer-wise Relevance Propagation (LRP) [28] to provide interpretable insights into the model's decision-making process, generating heatmaps that highlight the regions of MRI/CMR images that govern CAD classification.

Dataset description
The dataset consists of 63,151 multiparametric CMR images, including 37,290 images from healthy subjects and 25,861 images from CAD patients. CAD diagnosis was confirmed by invasive coronary angiography. Four MRI/CMR sequences (that is, Late Gadolinium Enhancement (LGE), Perfusion, T2 weighted, and Steady-State Free Precession (SSFP)) were used, capturing short- and long-axis planes of the heart. A total of 13 slices per patient were collected in four types of sequences.
During the pre-processing stage, a manual inspection was conducted on images from both subsets, and any images with poor MRI/CMR quality were excluded from further analysis. Following the pre-processing stage, the dataset consisted of 34,216 images from healthy patients and 17,438 images from patients with CAD.
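The resizing and normalisation steps mentioned for the input images can be illustrated with a minimal sketch in plain Python. It is not the authors' pipeline: the nearest-neighbour interpolation, the assumed 8-bit intensity range (`max_value=255`) and the helper names `resize_nearest`, `normalize_unit` and `preprocess` are all assumptions made for illustration.

```python
def resize_nearest(image, out_h=100, out_w=100):
    """Nearest-neighbour resize of a 2D list of pixel values (assumed method)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[(r * in_h) // out_h][(c * in_w) // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def normalize_unit(image, max_value=255.0):
    """Scale intensities to [0, 1]; max_value=255 assumes 8-bit input."""
    return [[px / max_value for px in row] for row in image]


def preprocess(image):
    """Resize to 100x100, then normalise between 0 and 1."""
    return normalize_unit(resize_nearest(image))


# Toy 256x256 "scan" with synthetic intensities in [0, 255].
raw = [[(r + c) % 256 for c in range(256)] for r in range(256)]
x = preprocess(raw)
```

In practice an image library would normally handle the resize; the sketch only makes the two steps explicit.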

Performance assessment metrics
The performance of the classifier is assessed using Positive Predictive Value (PPV), Recall (Sensitivity or True Positive Rate), Specificity (True Negative Rate), F1-Score, Area Under the Curve (AUC), Accuracy and Balanced Accuracy. Mathematically, each metric is presented as in Eq. (1).
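As a worked illustration, the threshold-based metrics listed above can be computed from the four confusion-matrix counts using their standard textbook definitions. The sketch below is not taken from the paper's code, the function name `classification_metrics` is hypothetical, and the counts are invented for a 323-image test set; they are not the paper's confusion matrix. AUC is omitted because it requires the ranked prediction scores rather than the counts.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard definitions of the threshold-based metrics."""
    ppv = tp / (tp + fp)                      # positive predictive value
    recall = tp / (tp + fn)                   # sensitivity / true positive rate
    specificity = tn / (tn + fp)              # true negative rate
    f1 = 2 * ppv * recall / (ppv + recall)    # harmonic mean of PPV and recall
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    balanced_accuracy = (recall + specificity) / 2
    return {
        "ppv": ppv,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical counts that sum to 323 (illustrative only).
m = classification_metrics(tp=105, tn=214, fp=2, fn=2)
```

Balanced accuracy averages recall and specificity, which is why it is the more informative figure when the two classes differ in size.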

Results
Figure 2 illustrates the implemented model architecture. All experiments were implemented in Python using the Keras library. The models were trained on an Apple M2 Pro with 16 GB RAM. The following subsections discuss the time complexity calculations, the effect of hyper-parameter tuning, and feature explanations using XAI results.

Time complexity calculation and comparison
The time complexity of a model is determined by the number of layers and the operations performed in each layer. The proposed model architecture comprises seven layers, excluding the input layer, as shown in Fig. 2. These layers consist of C1 (convolutional), S2 (subsampling), C3 (convolutional), S4 (subsampling), FC5 (fully connected), FC6 (fully connected) and the output layer. The time complexity of each layer is as follows:
Adding all the time complexities of each layer, the overall time complexity of the proposed model could be approximated to be:
In our case, the input image shape is (100,100,1), the number of filters (filter size) is varied between (C1 = 6, C3 = 6) and (C1 = 12, C3 = 6), kernel size = (5,5), pooling size = (2,2), strides = (2,2), and the units in fully connected layers 1 and 2 are 128 and 84 respectively, while the output layer has only 1 unit, as the model performs binary classification.
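To make the layer-by-layer cost concrete, the following sketch tallies trainable parameters and multiply-accumulate operations (MACs) for the stated settings. It is a rough illustration under assumptions not given in the text: convolutions without padding, pooling with stride 2, and a PReLU activation with one learnable slope per activation element (the Keras default). The helper names are hypothetical. Under these assumptions the C1 = 12, C3 = 6 configuration gives 507,299 parameters, the figure reported for the proposed model, and about 1.94 MB at 4 bytes per parameter.

```python
def conv_out(size, kernel, stride=1):
    """Output side length of an unpadded convolution or pooling window."""
    return (size - kernel) // stride + 1


def lenet_like_summary(input_hw=100, in_ch=1, c1=12, c3=6, kernel=5,
                       pool=2, fc=(128, 84), n_out=1, prelu=True):
    """Return (layer name, trainable parameters, MACs) per weighted layer."""
    layers = []
    side, ch = input_hw, in_ch
    for name, filters in (("C1", c1), ("C3", c3)):
        side = conv_out(side, kernel)
        weights = kernel * kernel * ch * filters + filters   # kernels + biases
        slopes = side * side * filters if prelu else 0       # per-element PReLU
        macs = side * side * filters * kernel * kernel * ch
        layers.append((name, weights + slopes, macs))
        ch = filters
        side = conv_out(side, pool, pool)                    # S2 / S4: no weights
    units_in = side * side * ch                              # flattened features
    for name, units in (("FC5", fc[0]), ("FC6", fc[1])):
        weights = units_in * units + units
        slopes = units if prelu else 0
        layers.append((name, weights + slopes, units_in * units))
        units_in = units
    layers.append(("Output", units_in * n_out + n_out, units_in * n_out))
    return layers


layers = lenet_like_summary()
total_params = sum(p for _, p, _ in layers)
total_macs = sum(m for _, _, m in layers)
size_mib = total_params * 4 / 2**20   # float32 weights, assumed
```

The two convolutional layers dominate the MAC count, while the first fully connected layer holds most of the weights.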
As the results are to be compared with the CNN-RF models proposed by [26], the time complexity of their model is calculated to be:
In both time complexity equations, e denotes estimators, f features, s samples, ep epochs, ts train samples, tf train features, tc train channels, fs filter size, and vs validation samples.
A comparison of the time complexities of the proposed model and the CNN-RF model reveals that our model entails significantly lower computational overhead. Our inference time on a MacBook laptop for 323 test images of size 100x100 is only 2.6 sec, which is only 8 milliseconds per image. Additionally, it provides better or equal classification accuracy. Our model's lower computational complexity enables faster image analysis and diagnosis, improving efficiency and facilitating deployment on resource-constrained systems such as Raspberry Pi, FPGA or any other edge device for real-time classification and diagnosis in connected healthcare settings.
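The per-image latency follows directly from the reported totals, and it can be checked against the real-time requirement of 0.07 to 0.13 seconds per frame cited in the Background. The short calculation below uses only figures stated in the paper; the variable names are illustrative.

```python
total_seconds = 2.6          # reported inference time for the test set
n_images = 323               # reported number of test images

per_image_ms = 1000 * total_seconds / n_images
frames_per_second = n_images / total_seconds

# Real-time budget per frame quoted from [24], in seconds.
realtime_budget_s = (0.07, 0.13)
meets_budget = per_image_ms / 1000 <= min(realtime_budget_s)
```

The measured latency sits well below even the stricter end of the budget, leaving headroom for slower edge hardware.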

Hyper-parameter tuning and classification results
Various hyperparameter configurations were evaluated to attain optimal model performance for CAD image classification. Tables 1, 2, 3 and 4 present the performance of the models obtained with different settings. The Parametric Rectified Linear Unit (PReLU) activation function combined with the Root Mean Squared propagation (RMSprop) optimizer resulted in the highest classification accuracy, achieving an overall accuracy of 99.35% and a balanced accuracy of 99.13%. This surpasses the previously achieved highest accuracy of 99.18% obtained by the reference CNN-RF model. To test the generalizability of our model, a stratified cross-validation (CV) analysis was performed using 10 folds. The model showed similar performance to that without CV, achieving a classification accuracy of 99.22% (with a balanced accuracy of 99.10%), as depicted in Table 5.
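Stratified cross-validation keeps the class ratio roughly constant in every fold, which matters here because healthy images outnumber CAD images. The sketch below shows one simple way to build such folds in plain Python. It is an assumption about the procedure, not the authors' code, and the function name `stratified_folds` and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with near-equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Toy labels with a 2:1 imbalance (0 = healthy, 1 = CAD); not the real dataset.
labels = [0] * 200 + [1] * 100
folds = stratified_folds(labels)
```

Each fold then serves once as the held-out set while the other nine are used for training, and the reported score is the average over the ten runs.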
The sub-optimal performance of the proposed classifiers can be attributed to their reliance on frame-based analysis. MRI sequences often produce a multitude of frames, some of which lack noticeable regions of interest (ROIs), as depicted in Figure 3 (all three view angles of an MRI scan). The figure illustrates frames with no ROIs (no visible coronary artery in the frame). The proposed model considers all the frames uniformly, irrespective of their diagnostic value. Thus, frames without ROIs introduce noise into the analysis, impairing the classifier's ability to differentiate between images of patients with CAD (illness) and those of healthy individuals. This limitation underscores the need for more sophisticated methodologies that account for the inherent variability in MRI frames, enabling classifiers to weigh frames based on the presence or absence of ROIs.
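One way to reduce the influence of uninformative frames is to aggregate frame-level predictions into a single patient-level decision by majority voting, a direction the paper itself names as future work. The sketch below is a hypothetical illustration: the function name, the `(patient_id, label)` input format and the tie-breaking rule (ties resolved toward the CAD class) are assumptions, not part of the proposed model.

```python
from collections import Counter, defaultdict


def patient_level_vote(frame_predictions):
    """Majority vote over frame labels; input is (patient_id, label) pairs."""
    votes = defaultdict(list)
    for patient_id, label in frame_predictions:
        votes[patient_id].append(label)
    decisions = {}
    for patient_id, frame_labels in votes.items():
        counts = Counter(frame_labels)
        top = max(counts.values())
        winners = [label for label, c in counts.items() if c == top]
        # Assumed tie-break: prefer the CAD class (1) as the cautious choice.
        decisions[patient_id] = max(winners)
    return decisions


# Toy frame-level outputs (0 = healthy, 1 = CAD).
frame_preds = [("p1", 1), ("p1", 1), ("p1", 0),
               ("p2", 0), ("p2", 0), ("p2", 1),
               ("p3", 1), ("p3", 0)]
decisions = patient_level_vote(frame_preds)
```

A weighted vote that discounts frames without a visible ROI would be a natural refinement, but it needs an ROI detector that the paper does not describe.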

eXplainable AI
Explainable Artificial Intelligence (XAI) is a field in machine learning and artificial intelligence that focuses on developing models that can provide transparent and interpretable explanations for their decisions or predictions. In the context of connected healthcare settings, XAI not only helps ensure the quality and safety of care but also fosters trust among patients and healthcare providers. Several notable XAI techniques include:

SHAP (SHapley Additive exPlanations) values provide a unified framework for explaining the output of any machine learning model by attributing contributions of each input feature to the model's prediction [29,30].
LIME (Local Interpretable Model-Agnostic Explanations) generates local explanations by approximating complex model behaviour with simpler, interpretable models on a subset of data points [31].
Saliency Maps highlight regions in input data (e.g., medical images) that are most influential in a model's prediction, aiding clinicians in understanding what the model is focusing on [32].
Accumulated Local Effects (ALE) helps visualize how the relationship between a single feature and the model's prediction changes across different feature values [33].
Contrastive Explanation Method (CEM) generates contrastive explanations, highlighting the minimal changes needed in input features to alter a model's prediction, which can be invaluable in understanding model behaviour [34].
Global Interpretation via Recursive Partitioning (GIRP) uses recursive partitioning techniques to create a global interpretable model that approximates the original complex model [35].
CAM (Class Activation Maps) highlights important regions in images that contribute to a specific class prediction, making it useful for image classification tasks [36].
GradCAM (Gradient-weighted Class Activation Mapping) combines gradient information with CAM to provide more precise visualizations of feature importance in convolutional neural networks [37].
LRP (Layer-wise Relevance Propagation) is a method that assigns relevance scores to each input feature, explaining how each feature contributes to the model's output [38].

In this paper, we choose GradCAM and LRP due to their ability to provide precise, visual, and deep-level explanations, their compatibility with CNN-based models, and their established utility in the medical imaging domain. These methods collectively offer a comprehensive solution for improving the interpretability of AI models in a clinical context, ultimately leading to more informed and confident clinical decision-making. The results of each technique are explained as follows:

GradCAM heatmaps
Gradient-weighted Class Activation Mapping (GradCAM) is a computer vision technique used to generate a heatmap of the important regions in an image that contribute significantly to the prediction of the deep learning model [39]. Figure 4 illustrates some examples of generated GradCAM heatmaps that highlight the focused regions (regions of interest) for the prediction of CAD in the test images. In the GradCAM visualization, the intensity of the heatmap represents the importance of each pixel in the input image. Higher intensity (e.g. brighter colours) and high colour contrast with the background indicate a more significant region that contributed to the model's prediction.
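The core GradCAM computation can be sketched in a few lines once the feature maps of a convolutional layer and the gradients of the class score with respect to them are available. The version below follows the commonly described formulation (channel weights from spatially averaged gradients, then a ReLU over the weighted sum) and uses plain nested lists with toy values. It is an illustrative sketch, not the implementation used in the paper, and it omits the upsampling of the map to the input resolution.

```python
def grad_cam(feature_maps, gradients):
    """Coarse GradCAM map from K feature maps and their gradients (each HxW)."""
    k = len(feature_maps)
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    # Channel weight: global average of that channel's gradient.
    alphas = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    # Weighted sum over channels, followed by ReLU.
    cam = [
        [max(0.0, sum(alphas[c] * feature_maps[c][i][j] for c in range(k)))
         for j in range(w)]
        for i in range(h)
    ]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]   # scale to [0, 1]
    return cam


# Toy example: two 2x2 feature maps with constant gradients (+1 and -1).
A = [[[1, 0], [0, 2]],
     [[0, 3], [1, 0]]]
G = [[[1, 1], [1, 1]],
     [[-1, -1], [-1, -1]]]
cam = grad_cam(A, G)
```

Positions that are active only in the negatively weighted channel are suppressed by the ReLU, so the map keeps only evidence in favour of the class.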

Layer-wise relevance propagation (LRP)
Layer-wise Relevance Propagation (LRP) is an XAI technique used to understand the predictions made by deep learning models. The primary objective of LRP is to ascribe the model's predictions to specific regions or features within the input image [40]. This helps explain why a particular classification decision was made, which is crucial in medical applications for trust and accountability. The core principle shared among various versions of the LRP algorithm is the conservation of the activation strength of an output node for a specific class, as it is propagated back through each layer of the neural network. This ensures that the total relevance associated with a particular class remains constant as it traverses the network layers during the explanation process [41]. This study investigated two versions of the LRP algorithm, i.e., LRP0 and LRP_epsilon. LRP0 is a straightforward version that conserves relevance strictly but can lead to issues with non-differentiable activation functions. LRP_epsilon addresses these issues by introducing a small smoothing factor (epsilon) to improve the stability and interpretability of the relevance heatmaps. Figure 5 displays the heatmaps produced by both algorithms along with the original images. The significance of features is visually represented using colours, with red indicating more critical features contributing to the classification of an image into a specific category.
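The difference between the two rules can be seen on a single dense layer. In the commonly used formulation, relevance is redistributed in proportion to each input's contribution to the pre-activation, and the epsilon rule adds a small term to the denominator so that near-zero pre-activations do not produce very large relevance values. The sketch below uses that formulation with biases omitted; the function name and toy numbers are illustrative assumptions, not the paper's implementation.

```python
def lrp_dense(activations, weights, relevance_out, epsilon=0.0):
    """Back-propagate relevance through one dense layer (biases omitted).

    weights[j][k] connects input j to output k. epsilon=0 gives the LRP0
    rule; epsilon>0 gives the epsilon-stabilised rule.
    """
    n_in, n_out = len(activations), len(relevance_out)
    relevance_in = [0.0] * n_in
    for k in range(n_out):
        z = sum(activations[j] * weights[j][k] for j in range(n_in))
        denom = z + (epsilon if z >= 0 else -epsilon)
        if denom == 0:
            continue  # no contribution can be attributed for this output
        for j in range(n_in):
            share = activations[j] * weights[j][k] / denom
            relevance_in[j] += share * relevance_out[k]
    return relevance_in


a = [1.0, 2.0]
W = [[0.5, -1.0],
     [0.25, 1.0]]
R_out = [1.0, 2.0]

r0 = lrp_dense(a, W, R_out)                  # LRP0
r_eps = lrp_dense(a, W, R_out, epsilon=0.1)  # LRP_epsilon
```

With epsilon set to zero the input relevances sum to the output relevance, which is the conservation property; the epsilon term absorbs a small part of the relevance in exchange for stability.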

Failure cases
A lack of contrast in the region of interest (ROI), or overly bright regions that contain no relevant information, presents a significant challenge. While the model appropriately emphasizes brighter regions, it struggles when the input image does not have enough contrast. Thus, the performance of the proposed model depends significantly on the quality of the input image. To address this issue, a potential solution is to implement a preprocessing step focused on enhancing image contrast. Furthermore, an iterative refinement process and parameter tuning may be employed to optimize the preprocessing step, ensuring adaptability to varying degrees of contrast in input images. However, it is crucial to acknowledge that these approaches incur computational expense due to the additional processing requirements. Therefore, a trade-off between computational resources and enhanced model performance needs to be struck.
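The text does not name a specific contrast-enhancement method. As one hypothetical option, a percentile-based contrast stretch maps a narrow intensity range onto the full [0, 1] interval at low computational cost. The function name, the 2nd and 98th percentile cut-offs and the toy image below are assumptions for illustration only.

```python
def contrast_stretch(image, low_pct=2, high_pct=98):
    """Linearly stretch intensities between two percentiles to [0, 1]."""
    pixels = sorted(px for row in image for px in row)
    n = len(pixels)
    lo = pixels[int(n * low_pct / 100)]
    hi = pixels[min(n - 1, int(n * high_pct / 100))]
    if hi == lo:
        return [row[:] for row in image]   # flat image: nothing to stretch
    return [
        [min(1.0, max(0.0, (px - lo) / (hi - lo))) for px in row]
        for row in image
    ]


# Toy low-contrast 10x10 image with intensities confined to [0.4, 0.6].
dull = [[0.4 + 0.2 * (r * 10 + c) / 99 for c in range(10)] for r in range(10)]
stretched = contrast_stretch(dull)
```

The percentile cut-offs are the parameters that the iterative tuning mentioned above would adjust, and the sort adds a cost per image that has to be weighed against the latency budget.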

Conclusion
In conclusion, this research study aimed to propose a lightweight Convolutional Neural Network (CNN) model tailored for real-time CAD image classification tasks in connected healthcare environments. The study placed a strong emphasis on optimizing hyperparameter configurations to enhance the efficiency and accuracy of AI models in healthcare-related classifications. Moreover, to provide interpretability of the model's predictions, we incorporated the GradCAM and LRP algorithms, which highlight the significant features within input images that influence classification decisions. The achieved results are compared with the state-of-the-art algorithm present in the literature (an ensemble of 10 CNN-RF networks). The CNN-RF model is more computationally expensive, as it extracts classification features using a CNN and then feeds them to a Random Forest (RF) classifier for classification. In addition, majority voting is performed to predict the final class (normal or sick patient image). On the other hand, the proposed model is a single seven-layered CNN model which outperforms the CNN-RF in terms of classification as well as time complexity, making it more suitable to be implemented on edge devices and in connected healthcare settings.
The classification performance of both models (i.e., baseline and proposed) was measured using the PPV, recall, specificity, F1-Score, AUC, and accuracy metrics. As the dataset has a class imbalance, an additional performance metric, i.e., Balanced Accuracy, was also calculated during the analysis. The combination of different hyperparameters revealed different classification accuracies, as tabulated in Tables 1 to 5. Among all the settings, the proposed model achieved the highest test accuracy of 99.35% (with balanced accuracy = 99.13%) with PReLU as the interlayer activation function, the RMSprop optimizer, a batch size of 32 and the binary cross-entropy loss function.
The proposed model is also compared with the relatively more complex AlexNet in terms of classification accuracy, model complexity, and run-time complexity. With AlexNet achieving an accuracy of 98.89%, the proposed model demonstrates superior performance, as shown in Table 6. In addition to accuracy, the proposed model exhibits substantially reduced training and inference times (556.4 seconds and 2.6 seconds, respectively) compared to AlexNet. Moreover, the architecture of the proposed model has significantly fewer trainable parameters (507,299) as well as a smaller model size (1.94 MB), demonstrating its enhanced practicality and resource efficiency.
The achievement of such a high classification accuracy on the CAD test dataset with downsized images (100x100 pixels) using the proposed lightweight model can be attributed to two main factors. Firstly, the representational efficiency of the model architecture is a key contributor. The proposed model demonstrates the capacity to learn the crucial features even in low-resolution images, enabling accurate predictions. Secondly, the downsampling of images does not severely compromise the model's proficiency in recognizing spatial hierarchies and patterns.
This research highlights the critical role of optimizing time complexity and hyperparameters in the development of sustainable healthcare AI models. By doing so, we can ensure the resource efficiency and real-time applicability of these models, while concurrently upholding their reliability. Furthermore, the incorporation of eXplainable AI (XAI) techniques provides essential interpretability, aligning AI-generated recommendations with the interpretations of clinical experts and safeguarding patient safety.
Future directions: The proposed investigative work aimed to provide insight into the optimization of healthcare AI models, ensuring accurate and reliable results while prioritizing patient safety, resource efficiency, and advancing the acceptance and understanding of AI in connected healthcare settings. While the results on the 2D CMR images are promising, in future, 3D-CNN based models will be explored on other healthcare images such as Computed Tomography (CT), X-rays and/or Echocardiogram (Echo) images to determine the model's comprehensive diagnostic capabilities, cross-domain scalability, and performance on multi-modal data. Moreover, we propose the integration of two techniques to further improve the performance of the designed classification models: majority voting for frame-based analysis and the implementation of a video-based classifier. Combining these techniques offers a promising path towards a more accurate and reliable classification model to distinguish between patient and healthy images in MRI scans.

Fig. 1 Example of 2D MRI/CMR images from CAD patients (a-c) and healthy subjects (d-f). The yellow circle highlights the region indicative of CAD in sub-images (a-c). Figure reproduced with permission from [26]

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP: True Positives, TN: True Negatives, FP: False Positives, FN: False Negatives, ROC: Receiver Operating Characteristic and FPR: False Positive Rate.

$$O\big(n_e \log(n_e)\, n_f\, n_s \log(n_s)\big) + 2\,O(n_s) + O(n_{cnn}\, n_{ep}\, n_{ts}\, n_{tf}\, n_{tc}\, fs \cdot fs) + O(n_{ts}\, n_{cnn}) + O\big(n_{ts}\, n_{cnn} \log(n_{cnn})\big) + O(n_{vs}\, n_{cnn})$$

Fig. 3 Original images from the sick dataset: (a) axial view, (b) sagittal view and (c) coronal view of a chest MRI scan (one frame)

Fig. 4 Heatmaps generated by GradCAM on test images. The most important features of the images that contribute to the classification of the image into certain classes are shown in darker colours. The three images are original, heatmap, and superimposed image

Fig. 5 Heatmaps generated by LRP on test images. The left column has original images, the middle column shows the output heatmaps of LRP0, while the right column shows the output heatmaps of the LRP_epsilon technique

Table 1 Model parameters settings: model with filter size C1 and C3 = 6,6;

Table 3 Model parameters settings: model with filter size C1 and C3 = 6,6;

Table 5 Model's best performances achieved with different settings: comparison. Note: * are model's results with 10-fold Stratified Cross-Validation

Table 6 Comparison of the proposed model with AlexNet in terms of classification accuracy, model complexity and run-time complexity on 323 test images