Abstract
Purpose
This paper considers a new problem setting for multi-organ segmentation based on the following observations. In reality, (1) collecting a large-scale dataset from various institutes is usually impeded due to privacy issues; (2) many images are not labeled since the slice-by-slice annotation is costly; and (3) datasets may exhibit inconsistent, partial annotations across different institutes. Learning a federated model from these distributed, partially labeled, and unlabeled samples is an unexplored problem.
Methods
To simulate this multi-organ segmentation problem, several distributed clients and a central server are maintained. The central server coordinates with the clients to learn a global model from distributed private datasets, each comprising a small number of partially labeled images and a large number of unlabeled images. To address this problem, a practical framework is proposed that unifies the partially supervised learning (PSL), semi-supervised learning (SSL), and federated learning (FL) paradigms through corresponding PSL, SSL, and FL modules. The PSL module learns from partially labeled samples, the SSL module extracts useful information from unlabeled data, and the FL module aggregates local information from the distributed clients into a global statistical model. Through the collaboration of the three modules, the presented scheme can exploit these distributed, imperfect datasets to train a generalizable model.
Results
The proposed method was extensively evaluated with multiple abdominal CT datasets, achieving an average result of 84.83% in Dice and 41.62 mm in 95HD for multi-organ (liver, spleen, and stomach) segmentation. Moreover, its efficacy in transfer learning further demonstrated its good generalization ability for downstream segmentation tasks.
Conclusion
This study considers a novel problem of multi-organ segmentation, which aims to develop a generalizable model using distributed, partially labeled, and unlabeled CT images. A practical framework is presented, which, through extensive validation, has proved to be an effective solution, demonstrating strong potential in addressing this challenging problem.
Introduction
Accurate and robust multi-organ segmentation is in high demand for computer-aided diagnosis, and successive breakthroughs have been achieved through the application of deep learning [1, 2]. To apply deep learning to multi-organ segmentation, one can collect a large-scale, densely annotated dataset from multiple institutes to train a generalizable model [3]. In clinical practice, however, this is usually not feasible. First, medical datasets cannot easily be shared among institutes (clients) because of privacy regulations. Second, multi-organ annotations can be incomplete and inconsistent across institutes; for instance, owing to different research interests, institutes may annotate only a single organ or a subset of organs that does not overlap with those of other institutes. Third, institutes may leave many images unlabeled because dense annotation is costly. Datasets for multi-organ segmentation are therefore distributed, because they cannot be shared and centralized, and imperfect, because they lack the full multi-organ annotations required for fully supervised training.
Driven by these observations, this work considers the problem of training a federated multi-organ segmentation model from distributed, partially labeled, and unlabeled samples. To the best of our knowledge, this problem remains unexplored. Three subproblems must be addressed: (1) learning from partially labeled samples, (2) learning from unlabeled samples, and (3) learning from samples distributed across multiple institutes. To this end, this paper proposes FPS-Seg, a practical framework that incorporates Federated learning (FL), Partially supervised learning (PSL), and Semi-supervised learning (SSL) modules for multi-organ Segmentation. Briefly, FPS-Seg maintains one central server and several clients. Each client locally trains in-house models on its partially labeled and unlabeled samples using the PSL and SSL modules. The FL module handles communication between the clients and the central server to build a global statistical model. Through the collaboration of the three modules, valuable information can be mined from imperfect local datasets and aggregated into a generalizable model. The contributions of this work are summarized as follows.
- A new problem setting, i.e., learning a model from decentralized, partially labeled, and unlabeled samples, is introduced for multi-organ segmentation; it is more challenging and closer to clinical practice.
- A practical framework is designed to address this problem by unifying federated, partially supervised, and semi-supervised learning.
- The proposed method is extensively validated on several CT datasets, showing that it is a promising solution to this challenging problem and that it generalizes well to downstream segmentation tasks.
Related works
Multi-organ segmentation remains a challenging task whose objective is to concurrently delineate multiple organs or anatomical structures from medical images, e.g., abdominal CT scans. Comprehensive insights into the domain can be gathered from dedicated reviews [1, 2].
The essence of semi-supervised learning (SSL) [4,5,6] is leveraging a small amount of labeled data alongside a much larger set of unlabeled data to train a model. Consistency learning [4, 5], expecting prediction invariance under perturbations, and pseudo-labeling [6], utilizing pseudo-labels for self-training, are two main strategies in SSL. Given the labor-intensive and costly nature of manual annotations in medical image analysis, SSL offers a viable alternative by tapping into the more accessible pool of unlabeled data. Several SSL methods [7, 8] have already been proposed for multi-organ segmentation.
Another key observation in practice is that many abdominal CT datasets have only one or a few organs labeled. To use such datasets with inconsistent, partial annotations, a practical paradigm called partially supervised learning (PSL) has been introduced [9, 10]. Although PSL is used synonymously with SSL in some machine learning contexts [11], this paper distinguishes the two paradigms following prior works [9, 10], as they cater to different scenarios: SSL uses a mix of labeled and unlabeled data, whereas PSL handles datasets in which each sample has some, but not all, labels.
Federated learning (FL) enables model training on decentralized data, which is especially beneficial in sensitive fields such as medical imaging, where datasets cannot easily be shared because of data privacy concerns and regulations [12,13,14]. Several FL-based methods have been proposed for multi-organ segmentation. Notably, studies such as [15, 16] train models on decentralized datasets with only partial annotations, combining PSL and FL for multi-organ segmentation.
Although existing methods have achieved great progress in multi-organ segmentation and other medical image analysis tasks, they primarily address either a single paradigm, i.e., SSL [7, 8], PSL [9, 10], or FL [13, 14], or a combination of two, e.g., federated semi-supervised learning [17, 18] and federated partial-label learning [15, 16]. Unlike previous works, this work introduces a more challenging and practical setting for multi-organ segmentation, which aims to learn a federated model from distributed, partially labeled, and unlabeled datasets by unifying SSL, PSL, and FL.
Method
Problem definition
Ideally, training a generalizable model for segmenting m organs requires numerous images \(\textbf{X}\) and the corresponding full annotations \(\textbf{Y}\) spanning \(\left( m+1\right) \) classes, where \(\mathcal {M} = \left\{ 0,1,\ldots ,m\right\} \) denotes the class set with \(\left\{ 0\right\} \) for background and \(\left\{ 1\right\} \) to \(\left\{ m\right\} \) for organs.
However, in clinical settings, datasets are often decentralized, with partial or no annotations. Given K medical institutes \(\left\{ Z_{i}\right\} _{i=1}^{K}\), each holds a dataset \(\mathcal {D}_{i} = \left\{ \mathcal {D}_{i}^{u}, \mathcal {D}_{i}^{l}\right\} \), where \(\mathcal {D}_{i}^{u} = \left\{ \textbf{X}_{i}^{u}\right\} \) contains images \(\textbf{X}_{i}^{u}\) devoid of annotations and \(\mathcal {D}_{i}^{l} = \left\{ \textbf{X}_{i}^{l}, \textbf{Y}_{i}^{l}\right\} \) consists of images \(\textbf{X}_{i}^{l}\) and partial annotations \(\textbf{Y}_{i}^{l}\). This study considers an extreme case where each client only owns annotations for a single organ. Suppose that the label sets of \(\left\{ \textbf{Y}_{i}^{l}\right\} _{i=1}^{K}\) are defined as \(\left\{ \mathcal {E}_{i}\right\} _{i=1}^{K}\), then \(\mathcal {E}_{1} \cap \mathcal {E}_{2} \cap \mathcal {E}_{3} \cap \cdots \cap \mathcal {E}_{K} = \left\{ 0\right\} \), and \(\mathcal {E}_{1} \cup \mathcal {E}_{2} \cup \mathcal {E}_{3} \cup \cdots \cup \mathcal {E}_{K} = \mathcal {M}\). These institutes are expected to utilize distributed, partially annotated, and unlabeled data to train a global model for multi-organ segmentation collaboratively.
Overview
The proposed framework FPS-Seg is shown in Fig. 1. FPS-Seg simulates a scenario in which a central server coordinates three medical institutes (\(K=3\)) to collaboratively train a global model for multi-organ (liver, spleen, and stomach) segmentation. The institutes maintain teacher models \(\left\{ T_{i}\right\} _{i=1}^{K}\) and student models \(\left\{ S_{i}\right\} _{i=1}^{K}\), where each teacher model uses the exponential moving average (EMA) of its student model's weights. During the local training phase, the student models learn from partially labeled samples, and consistency is enforced between the outputs of the teacher and student models to exploit unlabeled samples. The central server aggregates the local student model weights to update the global model G. The FL, PSL, and SSL modules are introduced below.
Federated learning module
The FL module bridges the local clients and the central server: it aggregates local model weights into the global model and updates the local models with the global weights. Its role is to train a global model \(G(\cdot ; \mathbf {\Theta }^{g})\) to convergence over a total of R rounds without sharing data, so that data privacy regulations are not violated. At the r-th federated round, each institute of \(\left\{ Z_{i}\right\} _{i=1}^{K}\) downloads the current global weights \(\mathbf {\Theta }^{g}_{(r)}\) from the server and assigns them to its local model \(S_{i} \left( \cdot ; \mathbf {\Theta }_{i}^{s} \right) \), which shares the same architecture as the global model. The clients then fine-tune their local models for e epochs on their private datasets. Afterward, the central server collects the local weights \(\left\{ \mathbf {\Theta }_{i(r)}^{s}\right\} _{i=1}^{K}\) and aggregates them into updated global model weights \(\mathbf {\Theta }^{g}_{(r+1)}\). This study adopts the federated averaging algorithm [19] to update the global model:
\[ \mathbf {\Theta }^{g}_{(r+1)} = \sum _{i=1}^{K} \frac{N_{i}}{\sum _{k=1}^{K} N_{k}} \, \mathbf {\Theta }_{i(r)}^{s}, \tag{1} \]
where \(N_{i}\) denotes the number of images for each dataset \(\mathcal {D}_{i}\) of client \(Z_{i}\).
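The weighted average in Eq. (1) can be illustrated with a minimal NumPy sketch. It assumes each model's weights are stored as a dict mapping parameter names to arrays (a stand-in for a framework state dict); the function name `fedavg` is hypothetical and not taken from the paper's code.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of client parameters, as in Eq. (1).

    client_weights: list of dicts mapping parameter name -> np.ndarray
    client_sizes:   list of N_i, the number of images held by each client
    """
    total = float(sum(client_sizes))
    global_weights = {}
    for name in client_weights[0]:
        # Each client contributes in proportion to its share of the images.
        global_weights[name] = sum(
            (n / total) * w[name] for w, n in zip(client_weights, client_sizes)
        )
    return global_weights
```

With the equal-sized clients used in the experiments (50 volumes each), the weights \(N_{i}/\sum_k N_{k}\) are all 1/3 and the aggregation reduces to a plain mean.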
Partially supervised learning module
Assuming that the background, liver, spleen, and stomach class indexes are 0, 1, 2, and 3, the class set \(\mathcal {M}\) is \(\left\{ 0,1,2,3\right\} \), and institutes \(Z_{1}\), \(Z_{2}\), and \(Z_{3}\), respectively, hold private liver, spleen, and stomach datasets that comprise a large part of unlabeled samples and a small part of labeled samples. The label sets \(\mathcal {E}_{1}\), \(\mathcal {E}_{2}\), and \(\mathcal {E}_{3}\) are, respectively, \(\left\{ 0,1 \right\} \), \(\left\{ 0,2 \right\} \), and \(\left\{ 0,3 \right\} \). The PSL module enables each client of \(\left\{ Z_{i}\right\} _{i=1}^{K}\) to train its local model \(S_{i} \left( \cdot ; \mathbf {\Theta }_{i}^{s} \right) \) with partially labeled samples \(\mathcal {D}_{i}^{l} = \left\{ \textbf{X}_{i}^{l}, \textbf{Y}_{i}^{l}\right\} \).
Consider a mini-batch of samples \(\left\{ \textbf{x}_{i}^{l}, \textbf{y}_{i}^{l}\right\} \) fetched from \(\mathcal {D}_{i}^{l}\), in which \(\textbf{x}_{i}^{l} \in \mathbb {R}^{B \times C \times H \times W \times D}\) denotes 3D CT volumes, where B, C, H, W, and D denote the batch size, number of channels, height, width, and depth, respectively (C is 1 for CT volumes), and \(\textbf{y}_{i}^{l} \in \mathbb {R}^{B \times 2 \times H \times W \times D}\) denotes the corresponding partial annotations of a specific organ in one-hot format. Given \(\textbf{x}_{i}^{l}\), \(S_{i} \left( \cdot ; \mathbf {\Theta }_{i}^{s} \right) \) outputs probability maps \(\textbf{p}_{i}^{l} \in \mathbb {R}^{B \times 4 \times H \times W \times D}\). The optimization objective of this module employs the marginal and exclusion losses described in [10]; please refer to [10] for technical details. First, all unlabeled organs are treated as background and merged into the original background, and a marginal loss \(\mathcal {L}_\text {marg}\) is calculated. Second, the natural mutual exclusiveness of organs is added as prior knowledge, introducing a penalty in the form of an exclusion loss \(\mathcal {L}_\text {excl}\).
The training procedure of client \(Z_{1}\), depicted in Fig. 2, is taken as an example; the other clients train their models following the same principle. \(Z_{1}\) holds labeled samples \(\mathcal {D}_{1}^{l}\) with liver annotations. Denote the output probability maps of \(S_{1} \left( \cdot ; \mathbf {\Theta }_{1}^{s} \right) \) as \(\textbf{p}_{1}^{l}\). Since the spleen and stomach are not labeled, their channels in \(\textbf{p}_{1}^{l}\) are merged into the background channel (channel 0), yielding new probability maps \(\hat{\textbf{p}}_{1}^{l} \in \mathbb {R}^{B \times 2 \times H \times W \times D}\). Because \(\hat{\textbf{p}}_{1}^{l}\) and \(\textbf{y}_{1}^{l}\) have the same channels, the marginal loss \(\mathcal {L}_\text {marg}\) can be calculated between them. In addition, exclusive labels \(\hat{\textbf{y}}_{1}^{l} \in \mathbb {R}^{B \times 4 \times H \times W \times D}\) are created for \(\textbf{p}_{1}^{l}\) based on \(\textbf{y}_{1}^{l}\). Specifically, for voxels belonging to the liver region in \(\textbf{x}_{1}^{l}\), the label values in \(\hat{\textbf{y}}_{1}^{l}\) are set to \(\left[ 1,0,1,1 \right] \) (the voxel cannot be background, spleen, or stomach), while for all remaining voxels they are set to \(\left[ 0,1,0,0 \right] \) (the voxel cannot be liver). An exclusion loss \(\mathcal {L}_\text {excl}\) is enforced between \(\textbf{p}_{1}^{l}\) and \(\hat{\textbf{y}}_{1}^{l}\) to reduce their intersection.
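The channel merging and exclusive-label construction described above can be sketched in NumPy as follows. This is an illustrative reading of the text, assuming the class order background/liver/spleen/stomach = 0/1/2/3 and 5D arrays of shape (B, channels, H, W, D); the function names are hypothetical.

```python
import numpy as np

def merge_unlabeled_channels(probs, labeled_class):
    """Marginal merge: fold every channel except `labeled_class` into background.

    probs: (B, M, H, W, D) softmax output over all M classes (here M = 4).
    Returns (B, 2, H, W, D): [merged background, labeled organ].
    """
    organ = probs[:, labeled_class]
    background = probs.sum(axis=1) - organ  # background + all unlabeled organs
    return np.stack([background, organ], axis=1)

def exclusive_labels(onehot_partial, num_classes, labeled_class):
    """Exclusion targets: 1 marks a class the voxel cannot belong to.

    onehot_partial: (B, 2, H, W, D) one-hot [background, labeled organ].
    Returns (B, num_classes, H, W, D).
    """
    organ = onehot_partial[:, 1:2].astype(np.float32)       # (B, 1, H, W, D)
    excl = np.repeat(organ, num_classes, axis=1)            # organ voxels exclude every class...
    excl[:, labeled_class] = 1.0 - organ[:, 0]              # ...except the labeled one; others exclude it
    return excl
```

For client \(Z_{1}\) (`labeled_class=1`), a liver voxel receives the exclusive label [1, 0, 1, 1] and any other voxel receives [0, 1, 0, 0], matching the description above.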
Generally, the training objective \(\mathcal {L}_\text {psl}\) for each client of \(\left\{ Z_{i}\right\} _{i=1}^{K}\) is:
where
and
where j is the channel index, V is the number of voxels in an image, and v is the voxel index. \(\alpha \) and \(\beta \) are hyperparameters. The combination of cross-entropy (CE) loss \(\mathcal {L}_\text {ce}\) and Dice loss \(\mathcal {L}_\text {dice}\) is adopted as the marginal loss \(\mathcal {L}_\text {marg}\), and the combination of exclusion CE loss \(\mathcal {L}_\text {ece}\) and exclusion Dice loss \(\mathcal {L}_\text {edice}\) is employed as the exclusion loss \(\mathcal {L}_\text {excl}\).
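A minimal NumPy sketch of the two loss terms, following the verbal description above, is given below. The exact formulations are those of [10], and how \(\alpha \) and \(\beta \) weight the terms is defined by Eq. (2); the specific forms here, in particular the exclusion CE written as \(-e \log (1-p)\) and the +1 Dice smoothing, are our assumptions for illustration.

```python
import numpy as np

EPS = 1e-7

def _overlap(pred, target):
    """Per-channel soft intersection and size sum over batch and spatial axes."""
    axes = (0, 2, 3, 4)  # reduce B, H, W, D; keep the channel axis j
    inter = (pred * target).sum(axis=axes)
    total = pred.sum(axis=axes) + target.sum(axis=axes)
    return inter, total

def marginal_loss(q, y):
    """CE + soft Dice loss between merged probabilities q and one-hot labels y.

    q, y: (B, 2, H, W, D); q already has unlabeled organs merged into background.
    """
    ce = -np.mean(np.sum(y * np.log(q + EPS), axis=1))
    inter, total = _overlap(q, y)
    dice = np.mean((2.0 * inter + 1.0) / (total + 1.0))  # +1 smoothing (assumed)
    return float(ce + (1.0 - dice))

def exclusion_loss(p, e):
    """Penalize probability mass on classes a voxel is known NOT to belong to.

    p: (B, M, H, W, D) full softmax output; e: exclusive labels, same shape.
    ASSUMPTION: exclusion CE written as -e * log(1 - p); [10] may differ.
    """
    ece = -np.mean(np.sum(e * np.log(1.0 - p + EPS), axis=1))
    inter, total = _overlap(p, e)
    edice = np.mean(2.0 * inter / (total + EPS))  # overlap to be driven toward 0
    return float(ece + edice)
```

Unlike an ordinary Dice loss, the exclusion Dice term is minimized: it is large when the prediction places probability on excluded classes and near zero when the two maps are disjoint.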
Semi-supervised learning module
The SSL module enables each client of \(\left\{ Z_{i}\right\} _{i=1}^{K}\) to further leverage its unlabeled samples \(\mathcal {D}_{i}^{u} = \left\{ \textbf{X}_{i}^{u}\right\} \). Following the mean teacher approach [4], each client additionally maintains a model \(T_{i}\left( \cdot ; \mathbf {\Theta }_{i}^{t} \right) \). \(T_{i}\left( \cdot ; \mathbf {\Theta }_{i}^{t} \right) \) and \(S_{i}\left( \cdot ; \mathbf {\Theta }_{i}^{s} \right) \) serve as the teacher and student models, respectively. The teacher model shares the student model's architecture and uses the EMA of the student model's weights. Consistency is imposed on their predictions for unlabeled data. In addition, input perturbations similar to those in [5] are introduced, since consistency regularization under stronger perturbations empirically benefits model generalization.
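The teacher's EMA weight update can be sketched as below, again storing weights as dicts of NumPy arrays. The decay value 0.99 is an illustrative assumption, since the paper does not state it here; in a deep learning framework this update would be performed outside the gradient computation.

```python
import numpy as np

def ema_update(teacher, student, decay=0.99):
    """Mean-teacher update: teacher <- decay * teacher + (1 - decay) * student.

    teacher, student: dicts of parameter name -> float np.ndarray.
    The teacher arrays are modified in place and the dict is also returned.
    decay=0.99 is an assumed default, not a value reported in the paper.
    """
    for name, w_s in student.items():
        teacher[name] *= decay
        teacher[name] += (1.0 - decay) * w_s
    return teacher
```

Because the teacher averages the student over many iterations, its predictions are smoother, which is what makes them usable as consistency targets.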
An illustration of the training procedure of the SSL module is shown in Fig. 3. Assume that a mini-batch of unlabeled images \(\textbf{x}_{i}^{u}\) is fetched at each training iteration. These images are first fed into \(T_{i}\left( \cdot ; \mathbf {\Theta }_{i}^{t} \right) \) and \(S_{i}\left( \cdot ; \mathbf {\Theta }_{i}^{s} \right) \) to obtain probability maps \(\tilde{\textbf{p}}_{i}^{u} \in \mathbb {R}^{B \times 4 \times H \times W \times D}\) and \(\textbf{p}_{i}^{u} \in \mathbb {R}^{B \times 4 \times H \times W \times D}\), respectively. As in Section “Partially supervised learning module,” the channels of unlabeled organs in \(\tilde{\textbf{p}}_{i}^{u}\) and \(\textbf{p}_{i}^{u}\) are then merged into the background, yielding merged probability maps \(\tilde{\textbf{q}}_{i}^{u} \in \mathbb {R}^{B \times 2 \times H \times W \times D}\) and \(\textbf{q}_{i}^{u} \in \mathbb {R}^{B \times 2 \times H \times W \times D}\). Consistency learning regards \(\tilde{\textbf{q}}_{i}^{u}\) as pseudo-targets and calculates an unsupervised loss \(\mathcal {L}_\text {unsup}\) between \(\tilde{\textbf{q}}_{i}^{u}\) and \(\textbf{q}_{i}^{u}\).
However, \(\tilde{\textbf{q}}_{i}^{u}\) may contain erroneous and noisy predictions, and consistency regularization based on them may accumulate training errors and degrade model performance. Confidence thresholding [5, 20], which sets a threshold \(\tau \), offers a practical way to stabilize training and improve performance: it extracts the confident predictions from \(\tilde{\textbf{q}}_{i}^{u}\) so that consistency regularization relies solely on them. With confidence thresholding, the training objective on unlabeled samples \(\mathcal {D}_{i}^{u}\) for each client of \(\left\{ Z_{i}\right\} _{i=1}^{K}\) is:
where
in which \(\left\| \cdot \right\| ^{2}\) is the mean squared error (MSE) function and \(\mathbf {\Gamma }_{i} \in \mathbb {R}^{B \times H \times W \times D}\) denotes the binary masks that restrict consistency regularization to confident predictions. j is the channel index, V is the number of voxels in an image, and v is the voxel index. The threshold \(\tau \) determines the extent of filtering: \(\tau =0\) means that all pseudo-target regions are included in the loss calculation, and \(\tau =1\) means that all pseudo-target regions are excluded.
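A minimal NumPy sketch of a confidence-masked consistency loss consistent with the description above follows. It assumes the mask keeps a voxel when the teacher's maximum merged probability strictly exceeds \(\tau \) (so that \(\tau =0\) keeps every voxel and \(\tau =1\) keeps none) and that the masked squared error is averaged over all V voxels and the batch; both are our reading, not the paper's published code.

```python
import numpy as np

def masked_consistency_loss(q_teacher, q_student, tau):
    """Confidence-thresholded MSE consistency between teacher and student.

    q_teacher, q_student: (B, 2, H, W, D) merged probability maps.
    ASSUMPTION: a voxel is kept when its max teacher probability exceeds tau
    (strict '>'), so tau = 0 keeps every voxel and tau = 1 keeps none.
    """
    gamma = (q_teacher.max(axis=1) > tau).astype(q_student.dtype)  # mask, (B, H, W, D)
    sq_err = ((q_teacher - q_student) ** 2).sum(axis=1)            # sum over channels j
    return float((gamma * sq_err).mean())                          # average over B and V voxels
```

In training, the teacher output acts as a fixed target, so no gradient would flow through `q_teacher`.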
Full training procedure of FPS-Seg
This part summarizes the full training procedure of FPS-Seg. At each federated round, each client first downloads the global model weights \(\mathbf {\Theta }^{g}\) and assigns them to its student model, giving \(\left\{ S_{i} \left( \cdot ; \mathbf {\Theta }_{i}^{s} \right) \right\} _{i=1}^{K}\). During the local training phase, the student models \(\left\{ S_{i} \left( \cdot ; \mathbf {\Theta }_{i}^{s} \right) \right\} _{i=1}^{K}\) learn from labeled samples \(\left\{ \mathcal {D}_{i}^{l} \right\} _{i=1}^{K}\) with \(\mathcal {L}_\text {psl}\) and extract information from unlabeled samples \(\left\{ \mathcal {D}_{i}^{u} \right\} _{i=1}^{K}\) with the help of the teacher models \(\left\{ T_{i} \left( \cdot ; \mathbf {\Theta }_{i}^{t} \right) \right\} _{i=1}^{K}\) using \(\mathcal {L}_\text {unsup}\). Thus, the total local training objective \(\mathcal {L}_\text {total}\) for each client of \(\left\{ Z_{i}\right\} _{i=1}^{K}\) is:
\[ \mathcal {L}_\text {total} = \mathcal {L}_\text {psl} + \gamma \, \mathcal {L}_\text {unsup}, \tag{7} \]
in which \(\gamma \) is a trade-off hyperparameter. When the local training finishes, the central server will aggregate local student weights \(\left\{ \mathbf {\Theta }_{i}^{s}\right\} _{i=1}^{K}\) to update the global model \(G(\cdot ; \mathbf {\Theta }^{g})\) with Eq. (1). A global model can finally be obtained by repeating the above procedures.
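The outer loop above can be sketched as a small skeleton. Everything model-specific is hidden behind `local_train`, a hypothetical callable that would minimize \(\mathcal {L}_\text {total}\) for e epochs and update the teacher by EMA; the dict layout of clients and weights is likewise an assumption for illustration.

```python
import numpy as np

def run_federated_training(global_w, clients, rounds, local_train):
    """Skeleton of the FPS-Seg outer loop (illustrative only).

    global_w:    dict of parameter name -> np.ndarray (global model G).
    clients:     list of dicts with 'teacher' weights and 'num_images' (N_i).
    local_train: HYPOTHETICAL callable(student_w, teacher_w, client, r) -> student_w
                 standing in for local optimization of L_psl + gamma * L_unsup.
    """
    for r in range(rounds):
        local_ws, sizes = [], []
        for client in clients:
            student_w = {k: v.copy() for k, v in global_w.items()}  # download global weights
            student_w = local_train(student_w, client['teacher'], client, r)
            local_ws.append(student_w)       # upload the student, not the teacher
            sizes.append(client['num_images'])
        total = float(sum(sizes))
        # Eq. (1): average of student weights, weighted by each client's image count.
        global_w = {k: sum(n / total * w[k] for w, n in zip(local_ws, sizes))
                    for k in global_w}
    return global_w
```

Aggregating the student rather than the teacher weights reflects the choice evaluated in the ablation on student versus teacher aggregation.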
Experiments and results
Experimental settings
Datasets and evaluation metrics
Datasets Three in-house contrast-enhanced abdominal CT datasets, #Set-A, #Set-B, and #Set-C, were used. FPS-Seg was first evaluated on #Set-A, and its generalization ability was then validated by transferring it to downstream tasks on #Set-B and #Set-C. Details of the three datasets are shown in Table 1. For preprocessing, all volumes were resampled to an isotropic spatial resolution of 1.0 mm along each axis. Intensities were truncated to the range \(\left[ -1000, 1000 \right] \) Hounsfield units (HU) and then normalized to zero mean and unit variance.
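The intensity part of this preprocessing can be sketched in NumPy as follows. The sketch assumes the volume has already been resampled to 1.0 mm isotropic spacing (e.g., with `scipy.ndimage.zoom`, omitted here) and that normalization statistics are computed per volume, which the text does not specify.

```python
import numpy as np

def normalize_ct(volume_hu, hu_min=-1000.0, hu_max=1000.0):
    """Clip CT intensities to [hu_min, hu_max] HU, then z-score normalize.

    ASSUMPTION: mean and standard deviation are computed per volume; the
    text does not state whether per-volume or dataset statistics were used.
    """
    v = np.clip(volume_hu.astype(np.float32), hu_min, hu_max)
    return (v - v.mean()) / (v.std() + 1e-8)
```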
Evaluation metrics The Dice score [%] and the 95% Hausdorff distance (95HD) [mm] were used as evaluation metrics. The 95HD is a specific instance of the partial HD [21]. Given a surface point set \(\mathcal {A}\) of the prediction and a surface point set \(\mathcal {B}\) of the ground truth, the sets of directed distances from \(\mathcal {A}\) to \(\mathcal {B}\) and from \(\mathcal {B}\) to \(\mathcal {A}\) are defined as
\[ \omega (\mathcal {A}, \mathcal {B}) = \left\{ \min _{b \in \mathcal {B}} \left\| a - b \right\| : a \in \mathcal {A} \right\} \]
and
\[ \omega (\mathcal {B}, \mathcal {A}) = \left\{ \min _{a \in \mathcal {A}} \left\| b - a \right\| : b \in \mathcal {B} \right\} , \]
respectively, where \(\left\| \cdot \right\| \) denotes the Euclidean norm. The values \(\omega _{\kappa } (\mathcal {A}, \mathcal {B})\) and \(\omega _{\kappa } (\mathcal {B}, \mathcal {A})\) at the \(\kappa \)-th percentile of \(\omega (\mathcal {A}, \mathcal {B})\) and \(\omega (\mathcal {B}, \mathcal {A})\) are then used to calculate the partial HD
\[ \Omega _{\kappa }(\mathcal {A}, \mathcal {B}) = \max \left\{ \omega _{\kappa } (\mathcal {A}, \mathcal {B}), \, \omega _{\kappa } (\mathcal {B}, \mathcal {A}) \right\} . \]
\(\kappa \) is set to 95 to compute the 95HD.
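Both metrics can be sketched in NumPy as below. The sketch takes binary masks for Dice and already-extracted surface point coordinates (in mm) for the partial HD; surface extraction is not shown. It uses brute-force pairwise distances, which is O(nm) in memory, so a KD-tree would be preferable for full-size volumes. `np.percentile` interpolates linearly by default, which may differ slightly from other percentile conventions.

```python
import numpy as np

def dice_score(pred, gt):
    """Dice (%) between two binary masks; 100 if both masks are empty."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 100.0 * 2.0 * np.logical_and(pred, gt).sum() / denom if denom else 100.0

def partial_hd(A, B, kappa=95):
    """Partial Hausdorff distance Omega_kappa between surface point sets.

    A, B: (n, 3) and (m, 3) arrays of surface-point coordinates in mm.
    """
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)  # (n, m) pairwise distances
    omega_ab = d.min(axis=1)  # for each a in A, distance to its nearest b in B
    omega_ba = d.min(axis=0)  # for each b in B, distance to its nearest a in A
    return max(np.percentile(omega_ab, kappa), np.percentile(omega_ba, kappa))
```

Taking the 95th percentile instead of the maximum makes the metric robust to a few stray surface points, although isolated outlier regions can still dominate it, as the qualitative results discuss.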
Implementation details
Problem simulation One central server and three clients were maintained to simulate the problem. All experiments were conducted with fourfold cross-validation. #Set-A was split into 150/50 for training/validation at each fold. These 150 volumes were split into three sub-datasets (50/50/50) for three clients, and every sub-dataset was divided into 20 labeled samples and 30 unlabeled samples. Each sub-dataset only used annotations of a single organ.
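The data layout for one fold can be sketched as follows. The sketch assumes #Set-A contains 200 volumes (implied by 150/50 per fold with fourfold cross-validation) and uses a random permutation with a fixed seed; the function name and seed are hypothetical.

```python
import numpy as np

def simulate_split(num_volumes=200, num_folds=4, fold=0, num_clients=3,
                   num_labeled=20, seed=0):
    """Build the federated data layout for one cross-validation fold.

    Returns the validation indices and, per client, labeled/unlabeled indices.
    The random permutation and seed are assumptions for illustration.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(num_volumes)
    folds = np.array_split(idx, num_folds)  # 4 folds of 50 volumes
    val = folds[fold]
    train = np.concatenate([f for k, f in enumerate(folds) if k != fold])  # 150 volumes
    clients = []
    for part in np.array_split(train, num_clients):  # 50 volumes per client
        clients.append({'labeled': part[:num_labeled],      # 20 labeled (single organ)
                        'unlabeled': part[num_labeled:]})   # 30 unlabeled
    return val, clients
```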
Experimental setup All experiments were implemented in PyTorch. 3D U-Net [22] was chosen as the backbone. An SGD optimizer with a momentum of 0.9 and a weight decay of \(10^{-4}\) was used to train the global model for 600 federated rounds, with the local training epoch e set to 1. A two-stage warm-up training strategy was adopted. Specifically, for the first 300 rounds, clients trained their models on labeled samples only, using a poly learning-rate schedule with an initial learning rate of \(10^{-2}\); for the second 300 rounds, they trained on both labeled and unlabeled samples using a poly schedule with an initial learning rate of \(10^{-3}\). Sub-volumes of size \(256 \times 256 \times 112\) were randomly cropped for training, and random flipping and random rotation were applied for augmentation. For hyperparameter settings, please refer to Section “Ablation studies.” During testing, a sliding-window strategy was applied.
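The two-stage schedule can be sketched as a function of the federated round. Three details are assumptions not stated in the text: the poly exponent (0.9, a common default), that the decay is counted per round rather than per iteration, and that the poly decay restarts at the beginning of the second stage. PyTorch's `torch.optim.lr_scheduler.PolynomialLR` offers a comparable built-in scheduler.

```python
def poly_lr(base_lr, step, total_steps, power=0.9):
    """Poly learning-rate decay; power=0.9 is an assumed common default."""
    return base_lr * (1.0 - step / total_steps) ** power

def lr_for_round(r, stage_rounds=300):
    """Two-stage schedule: labeled-only stage (1e-2), then labeled + unlabeled (1e-3).

    ASSUMPTION: the poly decay restarts at the start of the second stage.
    """
    if r < stage_rounds:
        return poly_lr(1e-2, r, stage_rounds)
    return poly_lr(1e-3, r - stage_rounds, stage_rounds)
```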
Experiment results
Quantitative results
Table 2 shows the quantitative results of FPS-Seg with fourfold cross-validation. Table 3 provides a detailed quantitative validation of different methods in localized, centralized, and federated learning scenarios. In the localized learning scenario, each client trained its local model on its private data with single organ annotations. Centralized learning involved training FPS-Seg using centralized datasets, employing PSL and SSL modules while excluding the FL module. The multitask federated learning (MTFL) approach [16] was implemented for comparison in the FL mode. These evaluations were conducted under three data scenarios: 50 L (50 labeled samples), 20 L (20 labeled samples), and 20 L + 30 U (20 labeled and 30 unlabeled samples).
Each method reached its upper-bound accuracy with 50 L. For every method, the results with 20 L + 30 U consistently surpassed those with 20 L, validating the efficacy of SSL in exploiting unlabeled data. FPS-Seg consistently outperformed localized learning, indicating that FL enables it to exploit the datasets held by all clients. In addition, FPS-Seg outperformed MTFL [16] in the FL mode and achieved results comparable to those it obtained in the centralized learning mode.
Ablation studies
Effects of FPS-Seg’s components and their hyperparameters \(\alpha \) and \(\beta \) in Eq. (2) are associated with the PSL module, \(\gamma \) in Eq. (7) with the SSL module, and \(\tau \) in Eq. (6) with confidence thresholding. This ablation study was divided into three steps. First, the roles of \(\alpha \) and \(\beta \) were examined with the PSL module using only labeled data. Once optimal values for \(\alpha \) and \(\beta \) were established, unlabeled data were incorporated by enabling the SSL module with varying \(\gamma \). After suitable values for \(\alpha \), \(\beta \), and \(\gamma \) were determined, confidence thresholding was applied with varying \(\tau \). This sequential search made it possible to assess the individual contribution of each component as well as the corresponding hyperparameters. Results are shown in Table 4. This paper sets \(\alpha =4\), \(\beta =1\), \(\gamma =1\), and \(\tau =0\) (i.e., no pseudo-target regions are filtered out), under which FPS-Seg achieved superior performance.
Evaluating aggregation of student versus teacher models As depicted in Fig. 4, aggregating the student models outperformed aggregating the teacher models. Although the teacher models hold the EMA weights of the student models, this finding suggests that the student models, which are updated directly by gradient descent, are more effective for updating the global model in our study.
Qualitative results
Three distinct instances with outliers are visualized in Fig. 5. These instances are notable for achieving satisfactory Dice scores yet exhibiting large 95HD values for the liver, spleen, and stomach, respectively, as indicated by red arrows. This visualization underscores the challenge FPS-Seg faces in certain instances where segmentation results contain outliers.
The qualitative results of various methods are displayed in Fig. 6. With 20 L, FPS-Seg outperformed localized training, surpassed the FL method MTFL [16], and showed results on par with centralized learning. Moreover, incorporating 30 U into the training further enhanced FPS-Seg’s performance.
Transfer to downstream tasks
After pretraining on #Set-A, FPS-Seg was transferred to pancreas and artery segmentation on #Set-B and #Set-C, respectively. Both #Set-B and #Set-C were divided into 60/20 volumes for training/validation. Three initializations of 3D U-Net were compared: training from scratch, initialization with weights pretrained on a single organ (liver, spleen, or stomach), and initialization with pretrained FPS-Seg. Each model was trained for 300 epochs, by which point training had converged. As depicted in Fig. 7, models initialized with pretrained FPS-Seg exhibited faster convergence and superior validation performance compared with those trained from scratch and those pretrained on single organs in both downstream tasks.
Discussion and conclusion
This paper introduced a challenging multi-organ segmentation problem motivated by the following real-world observations: (1) datasets cannot easily be shared, so a large-scale dataset cannot be collected to train a generalizable model; (2) a large proportion of images across institutes is unlabeled because annotation is costly; and (3) only a small number of images may be partially labeled, with annotations inconsistent across institutes due to different research targets. Training a generalizable model from these distributed, partially labeled, and unlabeled samples is in high demand in clinical practice and remains unexplored.
A practical approach, FPS-Seg, was introduced to tackle this problem. FPS-Seg comprises three key modules, the partially supervised learning, semi-supervised learning, and federated learning modules, which learn from partially labeled, unlabeled, and distributed samples, respectively. The method offers a straightforward way to address partially supervised, semi-supervised, and federated learning in a unified manner. Extensive experiments showed that FPS-Seg is an effective solution to this challenging problem and generalizes well to downstream segmentation tasks.
The proposed method was evaluated with liver, spleen, and stomach segmentation in CT images. Extending this method to segment additional organs using various modalities is considered an avenue for future work.
References
Cerrolaza JJ, Picazo ML, Humbert L, Sato Y, Rueckert D, Ángel González Ballester M, Linguraru MG (2019) Computational anatomy for multi-organ analysis in medical imaging: a review. Med Image Anal 56:44–67
Fu Y, Lei Y, Wang T, Curran WJ, Liu T, Yang X (2021) A review of deep learning based methods for medical image multi-organ segmentation. Phys Med 85:107–122
Ji Y, Bai H, Ge C, Yang J, Zhu Y, Zhang R, Li Z, Zhang L, Ma W, Wan X, Luo P (2022) AMOS: a large-scale abdominal multi-organ benchmark for versatile medical image segmentation. In: Advances in neural information processing systems
Tarvainen A, Valpola H (2017) Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results. In: Advances in neural information processing systems, vol 30
French G, Laine S, Aila T, Mackiewicz M, Finlayson G (2020) Semi-supervised semantic segmentation needs strong, varied perturbations. In: British machine vision conference
Zou Y, Zhang Z, Zhang H, Li C-L, Bian X, Huang J-B, Pfister T (2021) Pseudoseg: designing pseudo labels for semantic segmentation. In: International conference on learning representations
Zhou Y, Wang Y, Tang P, Bai S, Shen W, Fishman E, Yuille A (2019) Semi-supervised 3d abdominal multi-organ segmentation via deep multi-planar co-training. In: 2019 IEEE winter conference on applications of computer vision. IEEE, pp 121–140
Xia Y, Yang D, Yu Z, Liu F, Cai J, Yu L, Zhu Z, Xu D, Yuille A, Roth H (2020) Uncertainty-aware multi-view co-training for semi-supervised medical image segmentation and domain adaptation. Med Image Anal 65:101766
Zhou Y, Li Z, Bai S, Wang C, Chen X, Han M, Fishman E, Yuille AL (2019) Prior-aware neural network for partially-supervised multi-organ segmentation. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 10672–10681
Shi G, Xiao L, Chen Y, Zhou SK (2021) Marginal loss and exclusion loss for partially supervised multi-organ segmentation. Med Image Anal 70:101979
Liu B (2007) Partially supervised learning. In: Web data mining: exploring hyperlinks, contents, and usage data. Springer, Berlin, pp 151–182
Li T, Sahu AK, Talwalkar A, Smith V (2020) Federated learning: challenges, methods, and future directions. IEEE Signal Process Mag 37(3):50–60
Rieke N, Hancox J, Li W, Milletari F, Roth HR, Albarqouni S, Bakas S, Galtier MN, Landman BA, Maier-Hein K, Ourselin S, Sheller M, Summers RM, Trask A, Xu D, Baust M, Cardoso MJ (2020) The future of digital health with federated learning. NPJ Digit Med 3(1):1–7
Kaissis GA, Makowski MR, Rückert D, Braren RF (2020) Secure, privacy-preserving and federated machine learning in medical imaging. Nat Mach Intell 2(6):305–311
Xu X, Deng HH, Gateno J, Yan P (2023) Federated multi-organ segmentation with inconsistent labels. IEEE Trans Med Imaging 42(10):2948–2960
Shen C, Wang P, Yang D, Xu D, Oda M, Chen P-T, Liu K-L, Liao W-C, Fuh C-S, Mori K, Wang W, Roth HR (2022) Joint multi organ and tumor segmentation from partial labels using federated learning. In: Distributed, collaborative, and federated learning, and affordable AI and healthcare for resource diverse global health, pp 58–67
Yang D, Xu Z, Li W, Myronenko A, Roth HR, Harmon S, Xu S, Turkbey B, Turkbey E, Wang X, Zhu W, Carrafiello G, Patella F, Cariati M, Obinata H, Mori H, Tamura K, An P, Wood BJ, Xu D (2021) Federated semi-supervised learning for COVID region segmentation in chest CT using multi-national data from China, Italy, Japan. Med Image Anal 70:101992
Kassem H, Alapatt D, Mascagni P, Karargyris A, Padoy N (2022) Federated cycling (FedCy): semi-supervised federated learning of surgical phases. IEEE Trans Med Imaging
McMahan B, Moore E, Ramage D, Hampson S, y Arcas BA (2017) Communication-efficient learning of deep networks from decentralized data. In: Artificial intelligence and statistics, pp 1273–1282
French G, Mackiewicz M, Fisher M (2018) Self-ensembling for visual domain adaptation. In: International conference on learning representations
Huttenlocher DP, Klanderman GA, Rucklidge WJ (1993) Comparing images using the Hausdorff distance. IEEE Trans Pattern Anal Mach Intell 15(9):850–863
Çiçek Ö, Abdulkadir A, Lienkamp SS, Brox T, Ronneberger O (2016) 3D U-Net: learning dense volumetric segmentation from sparse annotation. In: Medical image computing and computer-assisted intervention, LNCS, vol 9901, pp 424–432
Acknowledgements
This work was supported by the JSPS KAKENHI Grant Nos. 21K19898 and 17H00867, the JST CREST Grant No. JPMJCR20D5, and the JSPS Bilateral International Collaboration Grants, Japan.
Funding
Open Access funding provided by Nagoya University.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This study was approved by the institutional review boards of the Nagoya University and the Aichi Cancer Center Hospital.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zheng, Z., Hayashi, Y., Oda, M. et al. Federated 3D multi-organ segmentation with partially labeled and unlabeled data. Int J CARS (2024). https://doi.org/10.1007/s11548-024-03139-6
DOI: https://doi.org/10.1007/s11548-024-03139-6