Introduction

Brain metastases are the most common central nervous system malignancy and affect up to 30–40% of cancer patients [1]. Stereotactic radiosurgery (SRS) is an accepted standard of care for the treatment of limited brain metastases (Brown et al. 2016). Two critical steps in planning for SRS are the identification and localization of individual brain metastases on the patient scans and the delineation of the tumor boundaries by the radiation oncologist and/or neurosurgeon. The latter process can be time-consuming and subject to a high degree of inter-observer variability, especially for small brain metastases [2,3,4].

Artificial intelligence (AI) has shown promise in addressing these issues. With the goal of improving efficiency and standardization, machine learning models have recently been developed for automated detection and segmentation of metastatic brain tumors [2, 5, 6, 7, 8, 9, 10, 11, 12]. However, the published literature thus far consists of technical proof-of-concept studies in which models are tested on small samples and/or are not readily deployable in the clinic.

VBrain is a deep learning (DL) algorithm patented by Vysioneer Inc. that received medical device clearance from the Food and Drug Administration (FDA) in 2021 and has been shown to significantly improve inter-reader agreement, contouring accuracy, and efficiency [13, 14]. Here, we aim to validate this tool in a heterogeneous cohort of patients treated with SRS for brain metastases at a single institution and to provide guidance on the scope of its use.

Methods

Retrospective patient cohort

We obtained approval from the Stanford University institutional research ethics board to conduct this study. Our institution has extensive experience with SRS of brain metastases, as previously described [15]. We included 100 randomly selected patients with unresected brain metastases treated with SRS at our institution from 2017 to 2020. Patients who had undergone prior intracranial resection or intracranial radiation were excluded.

Deep learning-based algorithm

VBrain is a commercial, FDA-cleared DL-based algorithm that uses magnetic resonance imaging (MRI) and computed tomography (CT) scans to segment brain metastases. VBrain adopts an ensemble strategy to optimize segmentation results: a 3D U-Net handles overall tumor segmentation with high specificity, while a DeepMedic model focuses on smaller lesions with high sensitivity [14,15,16]. The network was trained with a novel volume-aware Dice loss function, which uses information about lesion size to increase sensitivity to small lesions [17].
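The exact form of the volume-aware Dice loss is given in [17] and is not reproduced here. As a rough illustration of the general idea only, the Python sketch below (NumPy and SciPy) weights each foreground voxel by the mean lesion volume divided by the volume of its own lesion, so that a missed small lesion is penalized about as much as a missed large one. The function name `soft_dice_loss`, this particular weighting scheme, and the `eps` smoothing term are assumptions for illustration, not VBrain's implementation; a training pipeline would compute the equivalent quantity inside an autodiff framework.

```python
import numpy as np
from scipy import ndimage


def soft_dice_loss(prob, target, volume_aware=True, eps=1e-6):
    """Return 1 - weighted soft Dice between a probability map and a binary mask.

    Hypothetical illustration, NOT the loss defined in [17]. With
    volume_aware=True, each foreground voxel is weighted by
    mean_lesion_volume / own_lesion_volume, so small lesions count roughly
    as much as large ones. Background voxels keep weight 1.
    """
    target = np.asarray(target, dtype=bool)
    prob = np.asarray(prob, dtype=float)
    weights = np.ones(target.shape)

    if volume_aware:
        # Each connected component of the target mask is treated as one lesion.
        labels, n_lesions = ndimage.label(target)
        if n_lesions > 0:
            volumes = np.bincount(labels.ravel())[1:].astype(float)  # voxels per lesion
            fg = labels > 0
            weights[fg] = volumes.mean() / volumes[labels[fg] - 1]

    t = target.astype(float)
    intersection = np.sum(weights * prob * t)
    denominator = np.sum(weights * prob) + np.sum(weights * t)
    return 1.0 - 2.0 * intersection / (denominator + eps)
```

For example, for a 27-voxel lesion plus a 1-voxel lesion, a prediction that misses only the small lesion gives a plain soft Dice loss of 1/55 (about 0.02) but a volume-aware loss of 1/3.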

Workflow for automatic detection and segmentation

For each patient, three sets of Digital Imaging and Communications in Medicine (DICOM) files used during SRS planning were exported from our institutional CyberKnife and/or Picture Archiving and Communication System: (1) the CT scan, (2) the axial T1-weighted post-contrast fast spoiled gradient echo MR scan, and (3) the Radiotherapy Structure Set (RTSS). The files were stripped of the protected health information contained in the DICOM headers using a custom script and relabeled using a unique study ID. The anonymized CT and MR scans for each patient were processed by the VBrain software to generate an RTSS with automatically identified and contoured brain metastases.

Evaluation

Subsequent analyses compared the two RTSSs: the output contours from VBrain and the physician-defined contours used for SRS. A brain metastasis was considered “detected” when the contours predicted by VBrain (“predicted” contours) overlapped with the corresponding physician contours (“ground-truth” contours). We evaluated the predicted contours against the ground-truth contours using the following metrics: lesion-wise Dice similarity coefficient (DSC), lesion-wise average Hausdorff distance (AVD), false positive (FP) count, and lesion-wise sensitivity (%).
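For reference, a common formulation of the two per-lesion agreement metrics is given below, with P and G the voxel (or surface-point) sets of a predicted and a ground-truth lesion. The text does not state which AVD variant was used; some implementations average the two directed distances instead of taking their maximum, so the last line is an assumption.

```latex
\mathrm{DSC}(P, G) = \frac{2\,\lvert P \cap G \rvert}{\lvert P \rvert + \lvert G \rvert}

d(A, B) = \frac{1}{\lvert A \rvert} \sum_{a \in A} \min_{b \in B} \lVert a - b \rVert

\mathrm{AVD}(P, G) = \max\{\, d(P, G),\ d(G, P) \,\}
```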

The lesion-wise DSC was evaluated only for detected lesions, defined as ground-truth lesions that contained the centroid of a predicted lesion. FPs were defined as predicted regions that did not overlap with any ground-truth lesion. Lesion-wise sensitivity was defined as the ratio of the number of ground-truth lesions detected by VBrain to the total number of ground-truth lesions. Because the tumors in this cohort were small, we also reported lesion-wise sensitivities for lesions with effective diameters equal to or greater than 10 mm, 7.5 mm, and 5 mm, where the effective diameter was defined as the diameter of a sphere of equal volume.
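The lesion-matching rules above can be sketched in Python with `scipy.ndimage` connected-component labeling. This is a minimal sketch under stated assumptions: masks are binary 3D NumPy arrays on the same voxel grid, lesions are 6-connected components (the `ndimage.label` default), a predicted centroid is rounded to the nearest voxel, and the DSC of a detected lesion is computed against the union of predicted components that overlap it. The function names and that last DSC convention are hypothetical, since the text does not specify them; AVD is omitted for brevity.

```python
import numpy as np
from scipy import ndimage


def effective_diameter_mm(n_voxels, voxel_volume_mm3):
    """Diameter of a sphere with the same volume: (6V / pi) ** (1/3)."""
    volume = n_voxels * voxel_volume_mm3
    return (6.0 * volume / np.pi) ** (1.0 / 3.0)


def lesion_wise_metrics(gt_mask, pred_mask, voxel_volume_mm3=1.0):
    """Lesion-wise detection metrics for one patient (hypothetical helper)."""
    gt_labels, n_gt = ndimage.label(gt_mask)
    pred_labels, n_pred = ndimage.label(pred_mask)

    # A ground-truth lesion is "detected" if it contains the centroid
    # of at least one predicted lesion.
    detected = np.zeros(n_gt, dtype=bool)
    if n_pred > 0:
        centroids = ndimage.center_of_mass(
            pred_mask.astype(float), pred_labels, range(1, n_pred + 1))
        for c in centroids:
            gt_label = gt_labels[tuple(int(round(x)) for x in c)]
            if gt_label > 0:
                detected[gt_label - 1] = True

    # Lesion-wise DSC for detected lesions only, against the union of the
    # predicted components overlapping that lesion (an assumption).
    dsc = []
    for gt_label in np.flatnonzero(detected) + 1:
        gt = gt_labels == gt_label
        hits = np.unique(pred_labels[gt])
        pred = np.isin(pred_labels, hits[hits > 0])
        dsc.append(2.0 * np.sum(gt & pred) / (gt.sum() + pred.sum()))

    # False positives: predicted components with no ground-truth overlap.
    fp = sum(1 for k in range(1, n_pred + 1)
             if not np.any(gt_mask[pred_labels == k]))

    volumes = np.bincount(gt_labels.ravel(), minlength=n_gt + 1)[1:]
    return {
        "detected": detected,
        "dsc": dsc,
        "fp": fp,
        "sensitivity": detected.mean() if n_gt else float("nan"),
        "diameters_mm": effective_diameter_mm(volumes, voxel_volume_mm3),
    }
```

Centroid containment is stricter than simple overlap: a prediction that only grazes the edge of a lesion overlaps it (so it is not counted as an FP) but need not place its centroid inside it (so the lesion may still count as missed).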

The patient cohort was stratified by demographics (age, sex, race) and clinical characteristics (histology type, lesion count, and lesion size) to determine whether performance differed significantly across groups. Kruskal–Wallis tests were performed to assess the relationships between patient characteristics (sex, race, histology type, age, and size and number of brain metastases) and performance metrics (mean lesion-wise DSC, mean lesion-wise AVD, mean FP count, and lesion-wise sensitivity). A significance threshold of p < 0.05 was used for all tests unless stated otherwise. All statistical analyses were conducted using the SciPy v1.5.2 package in Python 3.8.7.
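A Kruskal–Wallis comparison of a per-patient metric across categorical groups can be run with `scipy.stats.kruskal`. The helper below is a hypothetical sketch: it groups per-patient values (for example, mean lesion-wise DSC) by a category label (for example, histology) and applies the 0.05 threshold. Continuous characteristics such as age or lesion size would first need to be binned into categories; the binning used in this study is not described here. SciPy's documentation also cautions that the chi-squared approximation behind the p-value assumes each group is not too small, with at least five observations as a typical rule.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

from scipy.stats import kruskal


def kruskal_by_group(
    values: Sequence[float], groups: Sequence[str], alpha: float = 0.05
) -> Tuple[float, float, bool]:
    """Kruskal-Wallis H test of one metric across categorical groups.

    Returns (H statistic, p-value, whether p < alpha).
    """
    if len(values) != len(groups):
        raise ValueError("values and groups must have the same length")
    # Collect the metric values belonging to each category label.
    samples: Dict[str, List[float]] = defaultdict(list)
    for value, group in zip(values, groups):
        samples[group].append(value)
    if len(samples) < 2:
        raise ValueError("at least two groups are required")
    result = kruskal(*samples.values())
    return float(result.statistic), float(result.pvalue), bool(result.pvalue < alpha)
```

Calling it once per characteristic and metric pair yields one p-value of the kind summarized in Table 4.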

Results

Patient demographics

We analyzed 100 patients with 435 intact brain metastases treated with SRS at our institution. Demographic characteristics for our patient cohort are summarized in Table 1. The median number of brain metastases per patient was 2 (range: 1 to 52), and the median tumor size was 0.112 c.c. The most common primary histologies were lung (56%), melanoma (10%), and breast (9%).

Table 1 Demographic and clinical cohort characteristics consisting of 435 brain metastases distributed across 100 patients

Overall performance and stratified assessment

Comparison metrics evaluating the performance of VBrain against clinical ground-truth contours for all brain metastases are presented in Table 2. The mean lesion-wise DSC was 0.723, the mean lesion-wise AVD was 7.34% of lesion size (0.704 mm), the mean FP count was 0.72 tumors per case, and lesion-wise sensitivity was 89.30% for all lesions and 96.23% for lesions with an effective diameter of 5 mm or greater. Sensitivity was 85.37% for patient cases with one or two metastases and 90.23% for those with three or more metastases.

Table 2 Performance metrics

As shown in Table 3, sensitivity was 99.07% for lesions with an effective diameter of 10 mm or greater, 94.83% for lesions between 7.5 and 10 mm, and 93.94% for lesions between 5 and 7.5 mm. The size of the brain metastases was significantly associated with lesion-wise DSC (p < 0.001) and sensitivity (p < 0.001), and the number of brain metastases per patient was significantly associated with sensitivity (p < 0.05; Table 4).
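Table 3 reports sensitivity within diameter bins, whereas elsewhere the text quotes cumulative thresholds (for example, 96.23% for lesions of 5 mm or greater). Both views can be computed from the same per-lesion data, as in this hypothetical sketch, which assumes arrays of effective diameters and detection flags such as those produced by a lesion-matching step. The bin edges mirror those in the text; the function name is illustrative.

```python
import numpy as np


def sensitivity_by_diameter(diameters_mm, detected, thresholds=(5.0, 7.5, 10.0)):
    """Return cumulative (>= threshold) and binned lesion-wise sensitivity."""
    d = np.asarray(diameters_mm, dtype=float)
    hit = np.asarray(detected, dtype=bool)

    def rate(selection):
        # Fraction of selected ground-truth lesions that were detected.
        return float(hit[selection].mean()) if selection.any() else float("nan")

    cumulative = {t: rate(d >= t) for t in thresholds}
    edges = [0.0, *thresholds, np.inf]
    binned = {(lo, hi): rate((d >= lo) & (d < hi))
              for lo, hi in zip(edges[:-1], edges[1:])}
    return cumulative, binned
```

Because each cumulative subset pools several bins, a cumulative sensitivity can exceed some of the binned values it contains; for instance, a ≥ 7.5 mm figure pools the 7.5 to 10 mm bin with the ≥ 10 mm bin.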

Table 3 Lesion-wise sensitivity by effective diameter of brain metastases
Table 4 Kruskal–Wallis tests of the relationships between patient and lesion characteristics and performance metrics

Figures 1 and 2 illustrate cases in which VBrain effectively predicted brain metastases in a patient with numerous lesions (52) and in a patient with small lesions (2.5 and 4.2 mm in diameter). Figure 3 shows challenging cases with tiny lesions, poor image quality, or insufficient contrast on the MR scan, for which diagnostic reports and/or longitudinal images might be required for additional reference. No other statistically significant differences in performance metrics were observed across demographic and clinical characteristic groups.

Fig. 1

Case with 52 Lesions. a Axial view. b 3D view. VBrain successfully predicted multiple brain metastases in this case with over 50 lesions, achieving a Dice similarity coefficient (DSC) of 0.813, an average Hausdorff distance (AVD) of 3.81% (0.511 mm), a false positive (FP) count of 0, and a sensitivity of 90% overall and 100% for tumors ≥ 5 mm

Fig. 2

Case with Tiny Lesions. As highlighted by the bounding box, VBrain successfully contoured brain metastases with diameters of 2.5 mm (a) and 4.2 mm (b). This case had a Dice similarity coefficient (DSC) of 0.944, an average Hausdorff distance (AVD) of 1.78% (0.828 mm), a false positive (FP) count of 1, and a sensitivity of 100% both overall and for tumors ≥ 5 mm

Fig. 3

Challenging Cases: a Tiny Lesion (0.02 c.c.). b Image Artifacts. c Insufficient Contrast. In these cases, the diagnostic report and longitudinal images may be required for additional reference

Discussion

Our analysis included 435 brain metastases in 100 randomly selected patients treated with SRS at our institution. This analysis contains substantially more, and smaller, brain metastases than other published series evaluating brain metastasis segmentation algorithms [6]. The median tumor size in our study was 0.112 c.c., roughly one-fifth to one-tenth the size reported in other cohorts [9, 18]. Smaller lesions are more challenging both to detect and to segment [19]. However, with improvements in imaging and treatment capabilities, increasingly small lesions are now being treated with radiation, so it is critical to evaluate the performance of available auto-segmentation software on these lesions. Furthermore, many previous studies used the same cohort for both training and validation, whereas our study used the entire cohort for external validation of VBrain. The primary cancer site distribution of our cohort (lung 56%, melanoma 10%, breast 9%) is broadly similar to that of the general population with brain metastases, which consists mostly of lung (40–50%), breast (15–25%), and skin (5–20%) primaries [20], although lung primaries were somewhat over-represented and breast primaries under-represented in our cohort.

Both DSC and sensitivity were significantly associated with the size of brain metastases. Lesion-wise sensitivity was 99.07% for tumors with an effective diameter of 10 mm or greater but decreased to 97.59% and 96.23% for lesions of 7.5 mm or greater and 5 mm or greater, respectively. Sensitivity was also significantly associated with the number of brain metastases per patient. There were no other significant associations between patient characteristics and VBrain performance metrics.

This study has some limitations. First, these patients were treated at a single academic institution that has extensive radiosurgical experience and treats, on average, more numerous and smaller brain metastases, which may limit generalizability. Small intracranial lesions are difficult to identify and contour, a common challenge for any segmentation method, manual or automated [19]. Thus, VBrain's performance in this study may underestimate its performance in a general patient population. Second, we excluded patients with prior intracranial radiation or surgical resection. Although these patients represent a minority of radiosurgical cases, further work is needed to evaluate VBrain's ability to differentiate between resection cavities, pre-treated lesions, and untreated lesions. Finally, thin-slice contrast-enhanced 3 T brain MRI scans should generally be used for SRS contouring [21], but such scans were available for only 98% of the patient cases in this study.

VBrain is a clinic-ready, FDA-cleared AI software intended to assist trained medical professionals by providing initial brain metastasis contours. In a prior reader study of five brain metastasis cases, VBrain assistance significantly improved inter-reader agreement, contouring accuracy, and efficiency, and clinicians detected 12% more lesions than without the software [14]. Although VBrain has been shown to identify brain metastases missed by physicians and to reduce contouring time, under its FDA-cleared intended use the tool cannot replace the expertise of the treating physician, who must review and modify the final treatment contours.

A promising future avenue for VBrain and other tumor auto-segmentation tools is research application. For example, these tools could enable rapid, automated tracking of brain metastases across serial MRIs to evaluate response to novel treatments and to inform real-time clinical decision making. As advances in imaging and treatment delivery enable the detection and treatment of increasingly complex cases of brain metastases, work is ongoing to develop and improve AI tools that assist in SRS treatment planning.