Deep learning assistance increases the detection sensitivity of radiologists for secondary intracranial aneurysms in subarachnoid hemorrhage

Purpose To evaluate whether a deep learning model (DLM) could increase the detection sensitivity of radiologists for intracranial aneurysms on CT angiography (CTA) in aneurysmal subarachnoid hemorrhage (aSAH). Methods Three different DLMs were trained on CTA datasets of 68 aSAH patients with 79 aneurysms with their outputs being combined applying ensemble learning (DLM-Ens). The DLM-Ens was evaluated on an independent test set of 104 aSAH patients with 126 aneuryms (mean volume 129.2 ± 185.4 mm3, 13.0% at the posterior circulation), which were determined by two radiologists and one neurosurgeon in consensus using CTA and digital subtraction angiography scans. CTA scans of the test set were then presented to three blinded radiologists (reader 1: 13, reader 2: 4, and reader 3: 3 years of experience in diagnostic neuroradiology), who assessed them individually for aneurysms. Detection sensitivities for aneurysms of the readers with and without the assistance of the DLM were compared. Results In the test set, the detection sensitivity of the DLM-Ens (85.7%) was comparable to the radiologists (reader 1: 91.2%, reader 2: 86.5%, and reader 3: 86.5%; Fleiss κ of 0.502). DLM-assistance significantly increased the detection sensitivity (reader 1: 97.6%, reader 2: 97.6%,and reader 3: 96.0%; overall P=.024; Fleiss κ of 0.878), especially for secondary aneurysms (88.2% of the additional aneurysms provided by the DLM). Conclusion Deep learning significantly improved the detection sensitivity of radiologists for aneurysms in aSAH, especially for secondary aneurysms. It therefore represents a valuable adjunct for physicians to establish an accurate diagnosis in order to optimize patient treatment.


Introduction
Aneurysmal subarachnoid hemorrhage (aSAH) is caused by spontaneous rupture of an intracranial aneurysm and accounts for approximately 85% of non-traumatic SAH [1,2]. With mortality ranging between 23% and 67%, aSAH poses a potentially life-threatening condition and a worldwide health burden leaving approximately 20% of long-term survivors permanently disabled [1,3].
Accurate and reliable detection of a potentially ruptured intracranial aneurysm (RIA) is essential for diagnosis and the subsequent treatment concept in patients with non-traumatic SAH [4,5]. In particular, timely aneurysm embolization by surgical or endovascular means is mandatory in order to prevent rebleeding, which may decisively affect patients' prognosis. Generally, CT angiography (CTA) represents the imaging modality of choice to screen for aneurysms and is performed immediately upon radiological proof of SAH. The detection sensitivity of CTA for intracranial aneurysms is reported to range between 85% and 98% when compared to digital subtraction angiography (DSA) [6][7][8].
Due to an increasing workload of radiology departments, physician fatigue and the "satisfaction of search" phenomenon represent a real concern, aggravating the risk of missing relevant findings [9,10]. Given that aneurysm detection on CTA proves to be challenging with misdiagnosis of aSAH eventually resulting in a poor clinical outcome, automated detection of intracranial aneurysms may be of valuable assistance to physicians [11][12][13][14]. Over the last decade, deep learning models (DLMs), in particular convolutional neural networks (CNNs), have shown great potential in performing diagnostic and analyzing tasks on medical imaging for different subspecialties [15][16][17][18]19].
Previous studies have introduced several approaches for deep learning-based detection of aneurysms on CTA [18,[20][21][22] or magnetic resonance angiography (MRA) [23][24][25], compared the accuracy of the DLM to human readers, and investigated the deep learning-augmented diagnostic performance of physicians [20,25]. However, these studies did not include patients with aSAH and focused on unruptured intracranial aneurysms (UIAs) [20,25]. Therefore, if deep learning would enhance the diagnostic sensitivity of human readers of different experience levels for aneurysms in aSAH remains a relevant question. In a recent study, a DLM was introduced, which provided a high detection sensitivity of aneurysms in aSAH independent of cerebral circulation and bleeding severity [26].
The objective of the study was to investigate whether this DLM could increase the detection sensitivity of radiologists for intracranial aneurysms on CTA in patients with aSAH.

Materials and methods
The institutional review board approved this retrospective, single-center study (reference number: 19-1329) and waived the necessity for written informed patient consent.

Patient population and data collection
Two hundred ten consecutive patients with aSAH treated between January 2013 and December 2017 at our tertiary care university hospital were reviewed by the authors. Patients were excluded given the following criteria: (I) unavailable CTA scans (n = 16), (II) no aneurysm finding on CTA (n = 6), (III) motion artifacts on CTA (n = 2), (IV) insufficient contrast of CTA (n = 7), (V) preprocessing failure (n = 5), and (VI) previously treated aneurysms (n = 2).
Included CTA scans of patients treated between 2016 and 2017 served as the training set for the DLM (68 patients/79 aneurysms), whereas patients treated between 2013 and 2015 formed the test set (104 patients/126 aneurysms). These datasets were previously used for the training of the DLM and formed a part of the test set in the aforementioned study [26]. In this study [26], patients between 2010 and 2015 were used as a test set (in total: 185 patients with 215 aneurysms).
The included 170 scans of the present study were acquired at a single center on different multidetector CTs (iCT (n = 161), brilliance 64 (n = 3), and brilliance 16 (n = 6); Philips Healthcare, Best, the Netherlands) using a standardized clinical protocol for head & neck or head imaging with slice thickness ranging between 0.62 to 1.25 mm. CTA scans of two patients were acquired at referring institutions. After anonymization, CTA source images were exported to IntelliSpace Discovery for segmentation (ISD, v3.0.6, Philips Healthcare).

Reference standard
In order to establish the aneurysm count and reference standard, a radiologist with 3 (L.P.), a neurosurgeon with 4 (L.G.), and a board-certified neuroradiologist with 12 years of experience (C.K.) in neurovascular imaging performed a double review of CTA and DSA (available in 153 patients, 89.0%) scans as well as their reports. Discrepancies were resolved in consensus.
The aneurysm findings were manually segmented by the aforementioned neurosurgeon and radiologist on ISD. Segmentation was performed interactively using a two-step, semiautomatic approach. First, a rough segmentation of the aneurysm was obtained by 3D voxel-wise regional thresholding. Afterwards, the segmentation was further edited manually using 2D editing tools. Additionally, the abovementioned readers collectively reviewed non-enhanced CT scans to determine respective Fisher grade of aSAH in consensus.

Image preprocessing
In order to enable the DLM-based automatic aneurysm segmentation workflow, a preprocessing pipeline was employed [26]: First, a brain extraction algorithm was developed using Statistical Parametric Mapping software package version 8 (SPM8; Wellcome Trust Centre for Neuroimaging) to compute the brain mask [27]. Second, image standardization was performed by resampling to an isotropic resolution of 0.5 mm 3 and intensity normalized. Third, a multi-scale vessel enhancement filter (consisting of two vessel-enhanced images, one with scale 0.5-5 voxels and the other with scale 5-15 voxels) was applied to the brain masked images enabling enhancement of the arteries from the background of the scans to distinguish between blood vessels and aneurysms [28]. Fourth, the original CTA image was normalized between 5 and 95% percentiles, while the vessel enhanced images were Z-score normalized.

Deep learning model
The specifics of the used DLM and its training are described elsewhere [26]. In brief, 3D CNNs based on DeepMedic (Biomedical Image Analysis Group, Department of Computing, Imperial College London) [29] were used in this study.
Three separate DLMs (DLM-Orig, DLM-Vess, and DLM-LDim), which receive different inputs of the CTA datasets, were trained on the training set applying fivefold cross validation using an 80-20% training-validation split. By combining the outputs of the three DLMs, an ensemble model similar to the work of Kamnitsas et al. was created [30]. We refer to this combination strategy as DLM-Ens. To this end, the three trained DLMs were applied to the test set with every trained DLM consisting of five individual submodels that resulted from the fivefold cross-validation training approach. Using simultaneous truth and performance level estimation (STAPLE), outputs from these five submodels were fused together. Subsequently, the STAPLE outputs from the three DLMs were passed to the DLM-Ens to produce the final, fully automated three-dimensional aneurysm segmentations [31].
The fully automated workflow of image preprocessing and aneurysm detection is depicted in Fig. 1. The time needed for the DLM to fully automatically segment the aneurysms is about 3 min (including image preprocessing and model ensembling). Figure 2 displays the fully automated aneurysm segmentation on the IntelliSpace Discovery user interface.

Detection of aneurysms by the radiologists
Anonymized and unlabeled CTA scans of the test set were presented to three radiologists with different levels of expertise (reader 1: 13 years (J.B.), reader 2: 4 years (U.H.), and reader 3: 3 years (A.K.) of experience in diagnostic neuroradiology) in random order for individual assessment of intracranial aneurysms. These readers were not involved in the definition of the reference standard. In order to simulate an emergency setting, which is the case during admission of a patient with aSAH, readers were instructed to perform their readings in a thorough but timely fashion. All readings were performed on the same IMPAX EE (Agfa HealthCare N.V., Mortsel, Belgium) workstation as used in clinical routine, applying the Multiplanar-Reconstruction-(MPR) tool if needed.
Readers were instructed that every dataset comprised at least one aneurysm and advised to report their findings in a Microsoft Excel (2016, Microsoft Corp., Albuquerque, New Mexico, USA) datasheet using the following labels: ICA (internal carotid artery), ACA (anterior cerebral artery including anterior communicating artery), MCA (middle cerebral artery), and POSTERIOR (posterior cerebral artery, posterior communicating artery, and vertebrobasilar territory) with respective vessel side attached. Beyond that, readers were blinded to patient and clinical data. Additionally, readers were instructed to record the reading time for each scan in seconds. After the initial reading, the readers were provided with the results from the DLM and could amend their readings accordingly in a separate column of the datasheet. The detection performance of the DLM-Ens and the radiologists were compared to the reference standard.
In order to investigate if the DLM may enhance the detection rate of the radiologists, results of readers 1, 2, and 3 were combined with the detections provided by the DLM.

Statistical analysis
Statistical analysis was performed using Microsoft Excel and SPSS (V22.0, IBM Corp., Armonk, NY, USA), with P < 0.05 considered statistically significant. Categorical variables are presented as numbers and percentages. Continuous variables are reported as mean ± standard deviation (SD) and range. The sensitivity of the radiologists and of the DLM-Ens was calculated by comparison to the reference standard as provided by segmentations of aneurysms on ISD. Detection rates were compared using the Fishers' exact test or the paired Student's t-test, when appropriate. Interrater agreement was assessed using Fleiss κ with 0 indicating no, 0.01-0.2 slight, 0.21-0.40 fair, 0.41-0.6 moderate, 0.61-0.80 substantial, and 0.81-1.00 almost perfect agreement [32].

Patient and aneurysm characteristics
In the test set, 126 aneurysms of 104 patients (mean age: 55.4 ± 14.3 years, 66 females) were segmented on ISD and comprised the reference standard. Baseline patient and aneurysm characteristics are outlined in Table 1 Interrater agreement among the radiologists was moderate (Fleiss κ of 0.502; 95% confidence interval (CI): 0.416-0.588). Table 2 provides detailed results regarding the missed aneurysms by the readers and the DLM. In particular, there were three aneurysms that were missed both by the DLM and the radiologists (Fig. 5): A mycotic aneurysm of the left anterior cerebral artery was probably missed due to its atypical, peripheral location. Two further aneurysms at the anterior communicating artery and the right posterior communicating artery were missed, probably due to their small size and because they were overlooked or misinterpreted as infundibula by the readers. For all readers combined, there were a total of 10 primary aneurysms, which were missed bythe readers (average volume: 82.0 ± 60.8 mm 3 , Table 2). When excluding the above-mentioned relatively large mycotic aneurysm of the left ACA (n=3), these findings showed a volume of 40.9 ± 21.7 mm 3 and were located at the anterior (n=3) and posterior (n=4) communicating arteries.

DLM-augmented detection sensitivity of the radiologists
After disclosure of the DLM results, reader 1 found 8 additional aneurysms (thereof 8 secondary aneurysms), reader 2 found 14 (13 secondary aneurysms), and reader 3 found 12 (9 secondary aneurysms) that they missed before. For all readers combined, 88.2% of additionalaneurysms were secondary aneurysms. Consequently, detection sensitivity improved to 97.6%, 97.6%, and 96.0% for readers 1, 2, and 3 with a mean overall sensitivity of 97.1% (P = 0.024). Furthermore, interrater agreement increased to almost perfect with a Fleiss κ of 0.878 (95% CI: 0.788-0.969). Table 3 provides detailed results regarding aneurysm detection by radiologists alone and in combination with the DLM as well as their reading time.
The DLM significantly enhanced the detection of the radiologists for aneurysms of both small (< 100 mm 3 ) and large (> 100 mm 3 ) volume as outlined in Table 4.

Discussion
In the present study, we aimed to investigate whether a DLM could increase the detection sensitivity of radiologists for intracranial aneurysms on CTA in patients with aSAH. Deep learning assistance significantly improved the detection sensitivity of the readers to more than 95% independent of the experience level, predominantly by increasing the detection rate for secondary aneurysms. The DLM could assist readers in the detection of both small and large aneurysms while leading to an increase of interrater agreement from moderate to almost perfect.
Previous studies have evaluated DLMs for the detection of intracranial aneurysms on CTA [18,[20][21][22] and time-of-flight (TOF)-MRA [23][24][25] and investigated whether deep learning enhancement could increase the diagnostic performance of human readers [20,25]. In the study by Park et al., artificial intelligence assistance increased the detection sensitivity of radiologists and of a neurosurgeon (2-12 years of experience) for UIAs on CTA significantly from 83% to 89% [20]. In a TOF-MRA study by Faron et al., deep learning assistance boosted the detection sensitivity of radiologists (2 and 12 years of experience) for UIAs, albeit without reaching statistical significance [25].
The present study is the first that investigated the performance of deep learning-assisted detection of aneurysms in patients with SAH by radiologists. The detection sensitivity of the DLM was 86%, which was comparable to that of readers in the present study (87%-92%) and to that of physicians for UIAs on CTA (83% for aneurysms > 3 mm [20]) and slightly lower to that of DLMs for UIAs on TOF-MRA (90% [24]). Given the accumulation of blood in the subarachnoid space in patients with aSAH, one might expect that the performance of the DLM may be impaired by the presence of hyperdense material adjacent to the hyperdense arteries. However, the DLM of the present study finds a considerably low number of false positives per scan (less than 1), which is unaffected by the degree of the hemorrhage [26]. This number of false-positive findings per scan is in fact lower than in other studies investigating deep learning-based detection of predominantly unruptured aneurysms on TOF-MRA (e.g., Sichtermann et al.  Interestingly, the DLM and the readers missed a comparably large mycotic aneurysm at a peripheral branch of the anterior cerebral artery. Regarding the DLM, it might be speculated that a more elaborate training including atypical aneurysm locations would further increase the detection sensitivity. Based on these considerations, a large-volume, multi-center registry training study should be performed prior to implementation of the DLM into clinical practice. Alongside the aforementioned studies, the DLM boosted the detection sensitivity of all three radiologists with different experience levels of neurovascular imaging for small and large aneurysms, even for reader 1 with 13 years of experience [20,25]. In the study by Park et al. (which also included CTA scans without aneurysms), artificial intelligence significantly increased the sensitivity of human readers from 83% to 89% [20], whereas, in the present study, the DLM allowed to increase the sensitivity to a larger extend (from 88% to 97%). In the TOF-MRA study by Faron et al., deep learning assistance resulted in a similar detection sensitivity (98% and 97%, respectively). However, the detection sensitivity of human readers alone was already 94% and 95% [25]; hence, the DLM did not increase human sensitivity to the same extent as in the present study.
Given the complexity of intracranial vessels, CTA-based detection of aneurysms proves to be time-consuming and challenging, therefore showing a large variability among physicians, even for experienced readers and especially for small aneurysms [11,12,14]. Consequently, only a moderate interrater agreement for the three readers was noted with the highly experienced reader achieving a higher detection rate than the less experienced ones, as could be expected. After combining their detections with deep learning-generated findings, an almost perfect interrater agreement was noted with the DLM providing additional findings, including aneurysms < 100 mm 3 (which translates to a maximum diameter of 6 mm). Therefore, augmenting physicians' performance could potentially lead to a more precise and consistent interpretation of imaging data. Of note, the DLM applied in this study had a higher impact regarding the increase of interrater agreement compared to the study by Park et al. (0.376 vs. 0.060) [20].
Given the increasing workload of radiology departments, DLMs and computer-aided detection algorithms represent a valuable assistance to clinicians to deal with the growing amount of data that need to be analyzed [16,33,34]. With an overall detection rate of 86%, the DLM itself was inferior to a trained and experienced neuroradiologist, who had a detection rate of 91%. However, the neuroradiologist missed 11 additional aneurysms, which can be most likely attributed to the "satisfaction of search" phenomenon. The DLM detected  8 of these 11 aneurysms. Hence, deep learning-generated detections provide an important adjunct in an emergency setting, especially for unexperienced and potentially overstrained readers, for which the DLM detected 14 and 12 additional aneurysms, respectively. Of note, a total of 10 primary aneurysms were missed by all readers combined, the majority (n=9) by the unexperienced readers. Despite the knowledge that every dataset harbored at least one aneurysm, these findings were most likely missed due to their atypical, peripheral localization and their small size at the anterior and posterior communicating artery, consequently leading to an oversight because they were misinterpreted as infundibula.
The "satisfaction of search" phenomenon poses a relevant concern in radiology and medicine in general [9,10]. In the current study, the radiologists were instructed that every dataset contained at least one aneurysm; consequently, readers were able to detect at least one aneurysm in most cases. It is likely that these prerequisites have artificially increased the overall detection sensitivity of the radiologists. However, in clinical practice, the radiologist would also expect to detect an aneurysm after the diagnosis of basal SAH, even in the comparatively rare case of non-aneurysmal SAH. Despite this advantage for the radiologists, the DLM achieved comparable results to the readers and significantly increased their detection sensitivity. The effects might therefore be even stronger in clinical routine. Furthermore, deep learning assistance especially helped the radiologists to detect secondary aneurysms, which are present in about 20% of patients in aSAH (17% in the present study) [35,36]. Eighty-eight percent of the additional aneurysms detected by the DLM were in fact secondary aneurysms. These results demonstrate that the DLM provides relevant support to human readers, in particular to physicians that are impaired by the satisfaction of search and lack of training or lack concentration due to fatigue.

Limitations
The following limitations need to be considered: First, the physiological distress of an emergency setting like an aSAH situation cannot be fully reproduced in a retrospective study.   Second, the retrospective collection of the included data will have led to a relative spectrum bias. Third, the readers were aware that the patients in the test set harbored at least one aneurysm. Therefore, the only unknown variable was the number of patients who had secondary aneurysms. This prerequisite of the study design most likely led to an artificially increased sensitivity of the radiologists. Fourth, given that CTA scans of patients with non-aneurysmal SAH were not included, there is no control group of patients without aneurysms, which does not allow for the evaluation of a specificity. Fifth, the included CTA datasets in this study were almost exclusively acquired at a single institution with a fixed scanning protocol and of sufficient image quality, which does not reflect daily clinical practice and limits the generalizability of the algorithm. Hence, the findings and the true clinical impact of this study need to be confirmed in a prospective, multi-center setting, which allows for external validation and in which deep learning-generated detections are implemented directly in the clinical workflow.

Conclusions
In patients with aSAH, the DLM significantly increased the detection rate of radiologists for intracranial aneurysms to more than 95%. Additional findings were predominantly secondary aneurysms. These results indicate that the integration of deep learning assistance could provide a valuable adjunct to radiologists for accurate aneurysm detection among aSAH patients.
Code availability n/a.
Funding Open Access funding enabled and organized by Projekt DEAL.
No funding was received for this study.
Data availability The datasets generated/analyzed during this study are not publicly available due to data protection but are available from the corresponding author upon reasonable request.

Declarations
Conflict of interest The following authors of this manuscript declare relationships with the following company: Philips Healthcare: Lenhard Pennig: research support; Rahil Shahzad, Frank Thiele, and Michael Perkuhn: employees; Jan Borggrefe: speaker's bureau.
Ethics approval The institutional review board approved this retrospective, single-center study (reference number: 19-1329) and waived the necessity for written informed patient consent.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.