Investigation of distributed learning for automated lesion detection in head MR images

In this study, we investigated the application of distributed learning, including federated learning and cyclical weight transfer, to the development of computer-aided detection (CADe) software for (1) cerebral aneurysm detection in magnetic resonance (MR) angiography images and (2) brain metastasis detection in contrast-enhanced brain MR images. For each target CADe software, we used datasets collected from multiple institutions, scanner vendors, and magnetic field strengths. We compared the performance of several strategies, including a centralized strategy, in which software development is conducted at a single development institution after de-identified data are collected from multiple institutions. Our results showed that the performance of CADe software trained through distributed learning was equal to or better than that trained through the centralized strategy. However, the distributed learning strategy that achieved the highest performance depended on the target CADe software. Hence, distributed learning can become one of the strategies for developing CADe software using data collected from multiple institutions.


Introduction
Numerous research groups have developed computer-aided detection (CADe) software, and recently there has been an increase in CADe software using machine learning, including deep learning [1][2][3][4][5][6]. The performance of CADe software based on machine learning depends on the quality and quantity of the data used to train the models. If the characteristics of the data differ between development and clinical use, the expected performance of the CADe software may not be achieved. Factors contributing to data heterogeneity include differences in the scanners or scan parameters used and differences in subject populations [6]. Recently, several guides have recommended using data gathered from multiple institutions to develop medical image processing based on artificial intelligence [7,8]. Several research groups, including ours, have shown that training with data collected from multiple institutions can improve the generalization of CADe software to external datasets [9][10][11].
Current CADe software development typically employs a centralized strategy (hereafter, Centralized), in which development is conducted at a single institution after de-identified data are collected from multiple institutions [12]. In recent years, distributed learning, in which a machine learning model is shared among institutions while the data remain local, has attracted attention. This includes federated learning (FL) [13] (Fig. 1) and cyclical weight transfer (CWT) [14] (Fig. 2). In FL, a single machine learning model is trained simultaneously at all institutions, and a server combines the model updates to generate a global model that reflects the data of each institution. FL was designed to address situations such as (1) uneven numbers of cases among facilities and (2) a large number of institutions with no guarantee that all will participate. However, FL requires a trustworthy server. FL has been used in medical image analysis, for instance, in the classification of skin diseases in color images [15], the classification of breast cancer in mammography images [16], and the segmentation of head and neck cancer in 18F-fluorodeoxyglucose positron emission tomography/computed tomography images [17]. In contrast, in CWT, after a model is trained for a predetermined number of epochs at one institution, it is sent to the next institution for further training. This process is cyclically repeated among all institutions. CWT does not require a server and is suitable when the participating institutions are fixed. However, when image quality varies between institutions, training tends to be unstable, which can degrade model performance. Chang et al. evaluated CWT for image classification on three independent image collections (retinal fundus photos, mammography, and ImageNet) [14]. Although several research groups have reported applying distributed learning to medical image processing tasks, to the best of our knowledge, there has been no report on applying distributed learning to CADe software for head magnetic resonance (MR) images.
In this study, we investigated the feasibility of distributed learning for the development of lesion detection software for head MR images using multiple datasets acquired with different scanner vendors and static magnetic field strengths. We targeted two types of CADe software: cerebral aneurysm detection in MRA images and brain metastasis detection in contrast-enhanced T1-weighted MR images.

Materials and methods
We targeted two types of CADe software: cerebral aneurysm detection in MRA images and brain metastasis detection in contrast-enhanced T1-weighted MR images. Based on previous studies, we used a different model architecture for each CADe software to validate the feasibility of distributed learning across multiple model architectures. Two distributed learning strategies (FL and CWT) were evaluated by comparison with the current CADe software development strategy (Centralized).

Cerebral aneurysm dataset
This study was approved by the ethical review boards of our institutions. We collected a total of 315 MRA images from three in-hospital datasets. The details of the datasets are shown in Table 1. Figure 3 shows examples from the cerebral aneurysm datasets. Each case included at least one aneurysm of 2 mm or more in diameter, as determined by at least two experienced radiologists. For each aneurysm in Datasets A and C, one of two board-certified radiologists (N.H. and T.Y., with 32 and 26 years of experience in MRA interpretation, respectively) defined its area by pixel-wise painting. For each aneurysm in Dataset B, two board-certified radiologists (N.H., and S.M. with 14 years of experience in MRA interpretation) defined its area via spherical regions of interest (ROIs) encompassing the entire aneurysm in 3D. Discrepancies between the two radiologists were resolved by a third board-certified radiologist (T.Y.). These annotations were performed using the web-based image database system CIRCUS DB [18]. Among the 105 cases in each dataset, 75 were used for training, and the remaining cases were equally assigned to the validation and test sets. The details of the populations of the cerebral aneurysm dataset are shown in Table 2.

Brain metastasis dataset
We selected 347 brain contrast-enhanced T1-weighted MR images from 75 patients in the open database of brain tumors for studies in artificial intelligence (OpenBTAI) brain metastasis dataset [19] that satisfied two conditions: (1) manually defined voxel-wise labels of metastases, stored in the Neuroimaging Informatics Technology Initiative (NIfTI) format, are available at the Figshare Repository [20]; and (2) for patients whose images were acquired using scanners from multiple vendors, only the images from the vendor with the largest number of series were used (Fig. 4). The details of the datasets are presented in Tables 3 and 4. Figure 5 shows an example from the brain metastasis dataset.

Cerebral aneurysm detection in MRA images
The input MRA images were first resampled to a 0.3 mm isotropic voxel size using tricubic interpolation, and the signal intensity distributions of the images were standardized by global piecewise linear mapping [21]. After that, we extracted arterial voxels from the MRA images using a region-growing-based method. We trained a 3D U-Net [22] model (Fig. 6) using each pair of the original MRA image and the area of the aneurysms. We used stochastic gradient descent (SGD) to optimize the network weights, and group normalization [23] with eight groups in the convolution blocks. For each aneurysm, a 48 × 48 × 48 cubic volume of interest (VOI) around the center of gravity of the aneurysm region was extracted. In addition, four augmented VOIs were generated by random shifts within ± 24 voxels along the x-, y-, and z-axes and random rotations (0°/90°/180°/270°) in each of the axial, coronal, and sagittal planes. The augmented VOIs were regenerated for each epoch. In the detection phase, 48 × 48 × 48 cubic VOIs were extracted at lattice points (at intervals of 24 voxels in each direction) within the extracted arterial region. After that, lesion candidates were extracted by binarizing the combined output, followed by connected component analysis.
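As a concrete illustration of the final detection step, the sketch below binarizes a combined output volume and extracts lesion candidates with connected component analysis. This is a minimal sketch assuming NumPy arrays and SciPy's `ndimage.label`; the threshold value and function names are our own, not taken from the paper's implementation.

```python
import numpy as np
from scipy import ndimage

def extract_candidates(prob_map, threshold=0.5):
    """Binarize the combined network output and extract lesion
    candidates via connected component analysis."""
    binary = prob_map >= threshold
    labeled, n_components = ndimage.label(binary)
    candidates = []
    for i in range(1, n_components + 1):
        voxels = np.argwhere(labeled == i)
        candidates.append({
            "centroid": voxels.mean(axis=0),  # center of gravity (voxel units)
            "n_voxels": len(voxels),          # candidate volume (voxels)
        })
    return candidates
```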

Brain metastasis in contrast-enhanced MR images
Brain contrast-enhanced T1-weighted MR images were preprocessed using the following pipeline: (a) skull stripping with the FSL Brain Extraction Tool (BET) ver. 6.0.6.4 (FMRIB Centre, Oxford University, Oxford, UK) [24]; (b) Gaussian intensity normalization [25,26], wherein the voxel values of each image were divided by their standard deviation; (c) resampling to an isotropic voxel size of 1.0 mm using tricubic interpolation; and (d) padding or cropping the axial section (x- and y-axes) to 256 × 256. We used the single-shot detector (SSD) [27]-based model proposed by Zhou et al. (Fig. 7) [28], which was trained to map axial MR slices to a set of bounding box predictions encompassing brain metastases along with associated detection confidences. We implemented the model using pytorch-ssd [29]. The ground-truth bounding boxes for each slice of the training data were defined from the voxel-wise label data of the brain metastasis dataset, and bounding boxes with a height or width of two voxels or less were excluded (Fig. 5c). Table 5 shows the number of axial slices used for training from each dataset. Each slice included at least one bounding box. In the training phase, we assigned a binary class label to each anchor (default box). Positive (brain metastasis) and negative (background) anchors were selected on the basis of the intersection over union (IoU) with the ground-truth bounding boxes: anchors with an IoU of 0.2 or greater were labeled positive, whereas anchors with an IoU of less than 0.2 were labeled negative. We used SGD to optimize the network weights. In addition, two augmented slices were generated by random expansion, random horizontal flips with a 50% probability, and random rotations (0°/90°/180°/270°). Other parameters for training the model were set in accordance with the study by Zhou et al. [28]. In the detection phase, the trained model was applied to each axial slice, and the output bounding boxes from adjacent slices were stacked to construct lesion candidates.
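Preprocessing steps (b) and (d) above can be sketched as follows. This is a minimal illustration assuming NumPy arrays; the function names are our own, and centered padding/cropping is an assumption not stated in the paper.

```python
import numpy as np

def gaussian_intensity_normalize(volume):
    """Step (b): divide voxel values by the per-image standard deviation."""
    return volume / volume.std()

def pad_or_crop_axial(axial_slice, size=256):
    """Step (d): zero-pad or center-crop an axial slice to size x size."""
    out = np.zeros((size, size), dtype=axial_slice.dtype)
    h, w = axial_slice.shape
    ch, cw = min(h, size), min(w, size)          # extent to copy
    sy, sx = (h - ch) // 2, (w - cw) // 2        # source offsets (crop)
    dy, dx = (size - ch) // 2, (size - cw) // 2  # destination offsets (pad)
    out[dy:dy + ch, dx:dx + cw] = axial_slice[sy:sy + ch, sx:sx + cw]
    return out
```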

Cerebral aneurysm detection
We employed 50 trials of hyperparameter tuning with random search [31] and used the area under the curve (AUC) of the free-response receiver operating characteristic (FROC) curve, with an upper limit of two false positives (FPs) per case, as the evaluation criterion. The tuned hyperparameters included the depth of the 3D U-Net (3/4/5) and the number of filters in the first layer. The maximum number of epochs was set to 100. In the performance evaluation, a lesion candidate was judged as a true positive (TP) if it met at least one of the following criteria, which are the same criteria used in a study by Nomura et al. [11]. The judgments were processed automatically.
1. The distance between the center of gravity of the lesion candidate and the center of gravity (or location of the lesion center) of any aneurysm was less than 3.0 mm.
2. The distance between the center of gravity of the lesion candidate and the center of gravity (or location of the lesion center) of any aneurysm was less than the radius of that aneurysm.
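Expressed in code, the two criteria reduce to a single distance test. This sketch assumes coordinates in millimeters; the function name is illustrative, not from the paper.

```python
import numpy as np

def is_true_positive(candidate_cog, aneurysm_center, aneurysm_radius_mm):
    """TP if the candidate's center of gravity lies within 3.0 mm of the
    aneurysm center (criterion 1) or within its radius (criterion 2)."""
    distance = np.linalg.norm(np.asarray(candidate_cog) - np.asarray(aneurysm_center))
    return bool(distance < 3.0 or distance < aneurysm_radius_mm)
```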

Brain metastasis detection
We employed 30 trials of hyperparameter tuning with random search and used the AUC of the FROC curve, with an upper limit of 10 FPs per case, as the evaluation criterion. The tuned hyperparameters were the batch size (4-64, step: 2) and the learning rate of the SGD (10⁻⁴ to 10⁻²). The maximum number of epochs was set to 100. In the performance evaluation, a lesion candidate was judged as a TP if it contained at least one voxel of the brain metastasis label, which is the same criterion used in the study by Zhou et al. [28].
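The TP criterion above amounts to a voxel-overlap check; a minimal sketch with boolean NumPy masks (function name is our own):

```python
import numpy as np

def is_true_positive_metastasis(candidate_mask, label_mask):
    """TP if the candidate region shares at least one voxel with the
    ground-truth brain metastasis label."""
    return bool(np.logical_and(candidate_mask, label_mask).any())
```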

Training of CADe software using distributed learning
We compared the following three training strategies for each CADe software: Centralized (baseline), FL [13], and CWT [14].
In FL and CWT, the parameters obtained through hyperparameter tuning in Centralized were used, except for the parameters described below. To summarize the FROC curves from different strategies in a single number, the competition performance metric (CPM) [32], defined as the average sensitivity at predefined numbers of FPs/case (1/8, 1/4, 1/2, 1, 2, 4, and 8 FPs/case) along the FROC curve, was employed as the evaluation criterion. The bootstrap method [33] was applied to test the statistical significance of differences in CPM between distributed learning and Centralized. The images were sampled 10,000 times, with each bootstrap sample containing the same number of images as the original set. At each bootstrap iteration, the FROC curves were recalculated for each training strategy, and the ΔCPM between each distributed learning strategy and Centralized was evaluated. P values were calculated as the fraction of the evaluated metric population that was negative or zero, corresponding to cases where distributed learning did not outperform Centralized (null hypothesis). Differences in performance were considered statistically significant at p < 0.05. Statistical analyses were conducted using Python 3.8.10, along with the NumPy (version 1.25.2) and SciPy (version 1.11.1) libraries.
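To make the evaluation concrete, the sketch below computes the CPM from an FROC curve by linear interpolation at the seven operating points, and the one-sided bootstrap p value as the fraction of ΔCPM samples that are negative or zero. This is a minimal illustration under our own assumptions (linear interpolation along the curve), not the authors' implementation.

```python
import numpy as np

OPERATING_POINTS = [1/8, 1/4, 1/2, 1, 2, 4, 8]  # FPs/case

def cpm(fps_per_case, sensitivities):
    """Average sensitivity at the predefined FPs/case values,
    linearly interpolated along the FROC curve."""
    return float(np.mean(np.interp(OPERATING_POINTS, fps_per_case, sensitivities)))

def bootstrap_p_value(delta_cpm_samples):
    """One-sided p value: fraction of bootstrap ΔCPM samples that are
    negative or zero (distributed learning did not outperform Centralized)."""
    return float(np.mean(np.asarray(delta_cpm_samples) <= 0))
```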

Federated learning (FL)
The FL framework is a server-institution architecture consisting of one server and k institutions, as shown in Fig. 1. In FL, a global model is sent from the server to each institution, which trains it using its local data. The server then combines the model updates to generate a new global model, which is sent back to the institutions for further training. This flow is defined as one round. We used federated averaging (FedAvg) [13] to update the global model. The number of epochs per round for FL was set to E_FL = 1, 2, or 4, and the maximum number of rounds was set to ⌈100/E_FL⌉. Here, ⌈x⌉ denotes the smallest integer greater than or equal to x.
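One FedAvg aggregation step can be sketched as a parameter-wise average of the institutions' model states, weighted by local dataset size. For simplicity, the states here are dicts of NumPy arrays; in practice they would be PyTorch state_dicts.

```python
import numpy as np

def fedavg(local_states, n_samples):
    """Combine institution model states into a new global state:
    each parameter is averaged with weights proportional to the
    local dataset sizes."""
    total = sum(n_samples)
    return {
        key: sum(state[key] * (n / total)
                 for state, n in zip(local_states, n_samples))
        for key in local_states[0]
    }
```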

Cyclical weight transfer (CWT)
In the CWT framework, as shown in Fig. 2, there is no server, and each institution conducts training using its own data. First, the order of institutions for training is determined, and an initial model is provided to the first institution. At each institution, the model received from the previous institution is trained using local data, and the trained model is then sent to the next institution. This process is cyclically repeated among all institutions until the termination criteria are met. The number of epochs per institution for CWT was set to E_CWT = 1, 2, or 4, and the maximum total number of epochs was set to 100.
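The CWT procedure is essentially a round-robin training loop. The sketch below assumes a user-supplied `train_fn` (hypothetical) that trains a model for a given number of epochs on one institution's data and returns the updated model.

```python
def cyclical_weight_transfer(model, institution_data, train_fn,
                             epochs_per_institution, max_epochs):
    """Train the model at each institution in a fixed order, passing the
    weights along, until the total epoch budget is exhausted."""
    total_epochs = 0
    while total_epochs < max_epochs:
        for data in institution_data:
            if total_epochs >= max_epochs:
                break
            model = train_fn(model, data, epochs_per_institution)
            total_epochs += epochs_per_institution
    return model
```

For example, with E_CWT = 2 and a 6-epoch budget, each of three institutions is visited exactly once before the loop terminates.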

Results
Figure 8 shows the FROC curves for each training strategy in cerebral aneurysm detection, and Table 6 shows the corresponding CPM scores. From Table 6, the CPM scores were 0.462 ± 0.058 for Centralized, 0.434 ± 0.059 (p = 0.677) for FL with 1 epoch per round, and 0.506 ± 0.061 (p = 0.200) for CWT with 2 epochs per institution. The differences were not statistically significant according to the bootstrap method.
Figure 9 shows the FROC curves for each training strategy in brain metastasis detection, and Table 7 shows the corresponding CPM scores. From Table 7, the CPM scores were 0.602 ± 0.064 for Centralized, 0.654 ± 0.063 (p = 0.098) for FL with 1 epoch per round, and 0.603 ± 0.069 (p = 0.479) for CWT with 1 epoch per institution. The differences were not statistically significant according to the bootstrap method.

Discussion
We experimentally showed that the development of CADe software for head MR images utilizing data collected from multiple institutions, scanner vendors, and magnetic field strengths can be achieved through distributed learning.
Unlike the centralized strategy, distributed learning does not require data sharing among institutions. In addition, the performance of CADe software trained through distributed learning was found to be equivalent to or better than that trained through the centralized strategy. If computational environments capable of training machine learning models, including deep learning, can be provided at each institution, distributed learning can become one of the strategies for CADe software development.
The performance of distributed learning depends on the number of epochs per training session at each institution, that is, the epochs per institution in CWT or the epochs per round in FL. In CWT, the highest performance for cerebral aneurysm detection was achieved with two epochs per institution, and that for brain metastasis detection with one epoch per institution. These results are consistent with those of Chang et al. [14], who observed a tendency toward better performance with fewer epochs per institution. In FL, increasing the number of epochs per round leads to an increase in loss in the latter stages of training [34]. In contrast, reducing the number of epochs per round increases the number of rounds required for loss convergence. Therefore, it is effective to decrease the number of epochs per round in the latter stages of training. Consequently, the number of epochs per training session at each institution is one of the hyperparameters that should be adjusted in distributed learning. Recently, FL with hyperparameter optimization, which adjusts the learning rate and the number of epochs per round during training, has been reported [34]. Applying this method is expected to improve the performance of CADe software.
The distributed learning method that achieved the highest performance depended on the target CADe software. In CWT, the model parameters are updated using only the data from the institution currently performing training, which tends to make the model dependent on the last institution where training was performed. In contrast, FL is more stable than CWT because FL updates the global model using updates from all training institutions in each round [35]. Factors that may explain the different trends observed for each CADe software in this study include differences in the machine learning tasks, the models, and the patterns of data variation among institutions.
A common source of variability in distributed learning is the difference in the amount of training data among institutions. For example, a setting in which both small (e.g., clinics) and large (e.g., university hospitals) institutions participate can be considered. In FL, FedAvg, a common approach for updating the global model, tends to reflect the model weight updates from institutions with larger amounts of data. To address this issue, improved methods for updating the global model, such as FedProx [36] and Scaffold [37], have been proposed. However, these methods require appropriate settings of the optimization parameters and additional exchanges between the server and each institution. In CWT, the model is trained for a fixed number of epochs at each institution, which can lead to overfitting at institutions with less training data and underfitting at institutions with more training data. Balachandar et al. reported that performance decreased owing to variations in the amount of training data across institutions in two tasks: diabetic retinopathy detection in retinal fundus photos and thoracic disease classification in chest radiographs [38]. They also demonstrated that adjusting the number of epochs at each institution according to the amount of training data effectively mitigated the performance degradation. Consequently, it is necessary to investigate the variation in the amount of training data across institutions when training CADe software.
To maximize the performance of CADe software at each institution, it is valuable not only to have models that maximize generalization through distributed learning, but also to have models dedicated to each institution. Training strategies for institution-specific models include training using only the data from that institution and the application of personalized FL [39]. Personalized FL collaboratively trains models specific to individual institutions, and several studies have shown its applicability to medical image analysis [40][41][42][43]. We plan to apply personalized FL to the training of CADe software.
In distributed learning, a machine learning model is shared between institutions instead of the training data, thereby protecting the privacy of the training data. However, the model parameters exchanged among institutions may contain sensitive information that can be exploited in privacy attacks [44]. For example, a model inversion attack [45][46][47][48][49] infers personal information, including image data, from the model parameters. Therefore, to comply with strict data protection regulations such as the General Data Protection Regulation (GDPR) in the EU/UK [50], it is also necessary to protect the models themselves [44]. As strategies to ensure the security of models in FL, the application of privacy-preserving techniques such as differential privacy [51,52] and homomorphic encryption [53] has been proposed [54][55][56]. Differential privacy involves adding random noise to the model parameters at each institution to obscure sensitive attributes of the model before sending it to the server. Homomorphic encryption allows computations to be performed on encrypted data without requiring a secret key to decrypt the ciphertext. In FL, each institution sends a model encrypted with homomorphic encryption to the server, and the server updates the global model using the encrypted models. We plan to investigate the application of these privacy-preserving techniques in the training of CADe software.
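The noise-addition step of differential privacy described above can be sketched as follows. The noise scale `sigma` is hypothetical and would be chosen according to the desired privacy budget; real differentially private training also clips per-update norms, which is omitted here for brevity.

```python
import numpy as np

def perturb_model_state(state, sigma, seed=None):
    """Add Gaussian noise to each parameter array of a model state
    (a dict of NumPy arrays) before it is sent to the server."""
    rng = np.random.default_rng(seed)
    return {name: param + rng.normal(0.0, sigma, size=param.shape)
            for name, param in state.items()}
```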
Our study has several limitations. First, we evaluated only two types of CADe software; the results may not be applicable to other lesions. Second, many of the hyperparameters used in distributed learning were obtained through hyperparameter tuning in Centralized. In particular, for cerebral aneurysm detection, the model structure is also included in the hyperparameters, and the optimal structure may differ under distributed learning. Although FL combined with network structure search has been proposed [57], further investigation is necessary to apply it to the model used in this study. Third, the amount of image data used with each CADe software was small. In machine learning, the impact of additional training data generally decreases as the amount of training data increases [58]. Pati et al. reported the training of a deep learning model for detecting glioblastoma sub-compartment boundaries using federated learning with 6314 multi-parametric MR images collected from 71 institutions across six continents [59]. In the federated setting, their results indicated that the performance improvement was not directly or linearly related to the amount of training data. Therefore, further investigation of the relationship between the amount of training data and performance in distributed learning is required.

Conclusion
We investigated the feasibility of distributed learning for the development of CADe software for head MR images. The results showed that distributed learning is feasible for training models using data collected from multiple sites, scanner vendors, and magnetic field strengths. Our future work includes validation using other types of CADe software, the application of personalized FL to maximize performance at each institution, and the introduction of privacy-preserving techniques such as differential privacy to enhance data security.

Fig. 3
Fig. 3 Examples from the cerebral aneurysm dataset. The original MRA images are shown on the left, and the original MRA images with the gold standards are shown on the right. a 6 mm aneurysm in the anterior communicating artery (Dataset A, pixel-wise painting); b 5 mm aneurysm in the left internal carotid artery (Dataset B, spherical region)

Fig. 4
Fig. 4 Flowchart of image selection for the brain metastases dataset

Hyperparameter tuning of CADe software
We used a conventional strategy that trains the model using data collected from each institution (Centralized) as the baseline. The hyperparameters used for training each CADe software were determined by hyperparameter tuning in Centralized. The model of each lesion detection software was implemented using Python 3.8.10 and PyTorch 1.13.0 [30]. These processes were performed on a workstation consisting of an Intel Xeon Gold 6230 2.1 GHz 20-core processor with 384 GB RAM, an NVIDIA GeForce RTX8000 GPU, Ubuntu 20.04.5, NVIDIA Driver 530.41.03, CUDA 11.8.0, and cuDNN 8.7.0. A single GPU was used for training the models.

Fig. 5
Fig. 5 Example from the brain metastasis dataset. a Original contrast-enhanced MR image (metastasis of 20.5 mm in diameter); b voxel-wise labeled image (red region) overlaid on the original MR image; c bounding box (red rectangle) defined from the voxel-wise labeled image

Fig. 6 Fig. 7
Fig. 6 3D U-Net model used in cerebral aneurysm detection. a Overall structure of the 3D U-Net; b convolutional block: f_1st, number of filters in the first layer; GroupNorm, group normalization with eight groups; ReLU, rectified linear unit

Fig. 8
Fig. 8 Average FROC curves obtained from 1000 bootstrap iterations for each training strategy in cerebral aneurysm detection. Confidence bands (semi-transparent) indicate 95% confidence intervals along the sensitivity axis

Fig. 9
Fig. 9 Average FROC curves obtained from 1000 bootstrap iterations for each training strategy in brain metastasis detection. Confidence bands (semi-transparent) indicate 95% confidence intervals along the sensitivity axis

Table 1
Specifications of the cerebral aneurysm dataset

Table 3
Specifications of the brain metastasis dataset

Table 4
Populations of the brain metastasis dataset. Continuous variables are shown as mean ± standard deviation. Categorical variables are shown as number and percentage (in parentheses). Metastasis size is defined as the maximum diameter of the largest region in the axial section

Table 5
Number of axial slices used for training of brain metastasis detection

Table 6
CPM score for each training strategy in cerebral aneurysm detection

Table 7
CPM score for each training strategy in brain metastasis detection