Tackling the class imbalance problem of deep learning-based head and neck organ segmentation

Tappeiner, Elias; Welk, Martin; Schubert, Rainer

doi:10.1007/s11548-022-02649-5

Original Article
Open access
Published: 16 May 2022

Volume 17, pages 2103–2111, (2022)

International Journal of Computer Assisted Radiology and Surgery

This article has been updated

Abstract

Purpose

The segmentation of organs at risk (OAR) is a required precondition for cancer treatment with image-guided radiation therapy. The automation of the segmentation task is therefore of high clinical relevance. Deep learning (DL)-based medical image segmentation is currently the most successful approach, but suffers from the over-presence of the background class and the anatomically given organ size difference, which is most severe in the head and neck (HAN) area.

Methods

To tackle the HAN area-specific class imbalance problem, we first optimize the patch size of the currently best performing general-purpose segmentation framework, the nnU-Net, based on the introduced class imbalance measurement, and second introduce the class adaptive Dice loss to further compensate for the highly imbalanced setting.

Results

Both the patch size and the loss function are parameters with direct influence on the class imbalance, and their optimization leads to a 3% increase in the Dice score and 22% reduction in the 95% Hausdorff distance compared to the baseline, finally reaching $0.8\pm 0.15$ and $3.17\pm 1.7$ mm for the segmentation of seven HAN organs using a single and simple neural network.

Conclusion

The patch size optimization and the class adaptive Dice loss are both simple to integrate into current DL-based segmentation approaches and increase the performance for class-imbalanced segmentation tasks.

Introduction

Cancer is, after cardiovascular diseases, the second most common cause of death. Among newly diagnosed cancer cases, statistically 3% are tumors of the head and neck (HAN) region [1]. Due to the complex anatomy of the area, characterized by a large number of small soft tissue organs, image-guided radiotherapy is the primary choice of treatment for HAN cancer. The segmentation of the organs at risk (OAR) on the planning CT scans is necessary for the radiotherapy and is the main cause of treatment delivery delays throughout the clinical pathway of the therapy. The segmentation is time-consuming, requires several highly educated medical experts and is still mainly performed manually; furthermore, observer variations are well documented [2]. Due to the time-consuming and subjective manual process, a field of research has developed around the automated segmentation of the HAN organs on medical images, with deep learning (DL) being the dominant and most successful learning-based approach [3]. Segmentation with DL can be interpreted as a voxel-wise classification problem using fully convolutional neural networks. The large difference in size between the classes to be segmented is referred to as the class imbalance problem. Since the first introduction of a DL-based multi-organ HAN segmentation approach [4], it is known that the HAN area is especially affected by the class imbalance problem. In addition to the large difference in the ratio of background and foreground voxels, the HAN area is characterized by large size differences between the foreground classes themselves, which is anatomically given through the differently sized organs to be segmented. As a result, the class imbalance causes a large performance difference in the segmentation of large and small organs [3].

In this work, we focus on the training window or patch size as the hyper-parameter with a direct influence on the class imbalance, as most segmentation networks are, due to GPU memory constraints, trained with randomly sampled patches of the original 3D image. Hence, we introduce a measurement for the class imbalance of differently sized training patches and optimize the patch size accordingly. Additionally, we adapt the classical multi-class Dice loss formulation which does not account for missing classes within patches. Our class adaptive Dice loss formulation is robust against missing classes, which is relevant for sparse class distributions within the image dataset and for the training with smaller patch sizes. We incorporate both the class imbalance optimized patches and the class adaptive Dice loss into the currently best performing general-purpose segmentation approach, the nnU-Net framework [5], and are able to increase the performance of its baseline version. The introduced multi-class confidence analysis following the work of Li et al. [6] also reveals an increased segmentation confidence for mid-sized organs due to the class label imbalance optimized patch size.

Related work

Guo et al. [7] and Gao et al. [8] were the first to specifically address and solve the class imbalance problem of the HAN area by using several different cascaded networks. The approaches are inspired by the work of clinical experts, first segmenting large and easy anchor organs and then zooming in to segment the harder small soft tissue organs. Similarly, the authors combined a strong large organ segmentation network, a small organ localization network and specific small organ segmentation networks effectively reducing the class imbalance of each network. In their follow-up work, the FocusNetv2, Gao et al. [9] further incorporated autoencoder-based shape priors [10] and adversarial training [11] into the small organ networks, achieving a Dice score (DSC) of 0.84 and a 95% Hausdorff distance (95HD) of 2.17 mm which are the currently best reported results on the MICCAI 2015 HAN segmentation challenge reference dataset [12]. An implicit reduction in the class imbalance, especially in favor of the small organs that are often visible in just a few CT slices, is recently achieved by hybrid networks using 2D convolutions and 3D convolutions in their architecture. Chen et al. [13] used 2D convolutions for the extraction of fine edges and 3D convolutions for coarse and fine semantic features in a single UNet [14]-based architecture. Tang et al. [15] extended a 2D UNet with an additional 3D convolution-based context-aware attention path and were able to achieve state of the art using a single HAN organ segmentation network.

In contrast to architectural changes of the network, adapted cost functions can also reduce the class imbalance problem of DL. Roth et al. [16] presented the first DL-based multi-organ segmentation approach of the abdominal area and applied a class weighted cross-entropy (CE) loss function. The CE is an information theoretical measurement for probability distribution differences and allows calculating the difference between the network’s voxel-wise class prediction and the ground truth. As the CE is the classical loss function for image classification, Milletari et al. [17] proposed the DSC as a volume-based overlap measurement to be used as a loss function for image segmentation. The Dice loss transforms the voxel-wise measurement into a semantic label overlap measurement and has become the state-of-the-art loss function of the field. By effectively reducing the number of measurements to the number of labels, the Dice loss also reduces the sensitivity of the loss to the class imbalance effect. However, the Dice loss is not able to eliminate the problem due to its intrinsic bias toward large volumes [2] as well as the remaining severe over-presence of the largest class during training. Consequently, Sudre et al. [18] introduced the generalized Dice score (GDSC), which adaptively weights the DSC by the current class size. However, in a previous work [4] we showed that, in the common patch-based training setting, the GDSC introduces noise in the learning curve through the adaptive weights and missing classes. Zhu et al. [19] investigated different loss functions specifically for the imbalanced HAN area and showed the combination of the Dice loss and the focal loss [20] to outperform the plain Dice loss. Isensee et al. [5] proposed to combine the CE and Dice loss to measure both the voxel-wise class predictions and the semantic label overlap and were able to show advancements in many different segmentation tasks using the combined loss function in their nnU-Net.

Another approach to analyzing the class imbalance in neural networks for image segmentation is presented by Li et al. [6]. The authors found that the network output of under-represented classes tends to shift toward the decision boundary during test time, whereas well-represented classes are unaffected. As a result, the authors claim that an overfitting of the small-sized classes occurs during the training. For their analysis of the class imbalance-induced overfitting, the authors suggest plotting the logit output of the training data against the test data, which we adapt and confirm for our given multi-class setting.

Method

Dataset

For our study of the class imbalance problem in the HAN area, we utilize the MICCAI 2015 HAN auto segmentation challenge dataset [12]. The CT images of the dataset are from the 0522 multi-institutional clinical study of the Radiation Therapy Oncology Group [21], which made the data publicly available. The study contained multiple images of 111 patients with HAN cancer of the oropharynx, the hypopharynx or the larynx. The challenge dataset includes 40 patient CT scans, with manual reference segmentations of nine structures: the left and right Parotid Gland (PG), the left and right Submandibular Gland (SG), the Optic Chiasm (OC), the Brainstem (BS), the Mandible (MA) and the left and right Optic Nerves (ON). Although the original images of the 0522 study contained OAR reference segmentations for the radiotherapy planning, no standardized segmentation protocols existed at the time of the study and the segmented structures showed considerable differences in contouring. Accordingly, the nine organs for the dataset creation are iteratively re-contoured according to current scientific standard protocols until all segmentation experts agree and the observer bias is eliminated. For the scope of the challenge, 25 specific images identified by their file names are released as training images, 10 as an offsite test set and the last 5 as an additional test set for the onsite event of the challenge. In our work, we follow the challenge protocol regarding the dataset splits and combine the off- and onsite test images to one test set for our result presentation. The Submandibular Glands are not considered in our work as not all 40 CT scans contain the corresponding reference data.

Segmentation network design

Our work is based on the 3D nnU-Net framework of Isensee et al. [5]. The authors claimed and showed that a well-parameterized UNet [14] is hard to beat for any segmentation task and accordingly defined a set of well-proven fixed parameters and additional dataset-dependent rule-based parameters for a dynamically deep UNet. The fixed parameters are the learning rate, the optimizer, the data augmentation, the number of training iterations, the patch sampling strategy, the loss function, the inference using a sliding window approach and the post-processing as a largest component analysis. The most relevant dataset-dependent parameters are the spacing and the patch size, which further define the UNet architecture. The spacing is evaluated as the median of the dataset in-plane spacing and the 10th percentile of the out-plane spacing, resulting in a spacing of $0.98\times 0.98\times 2.5$ mm. The patch size is initialized to the dataset median after resampling and iteratively enlarged, simultaneously with the depth of the UNet, to fill the available GPU memory using a fixed batch size of two, resulting in a patch size of $192\times 160\times 56$. The skeleton UNet is a basic UNet with two blocks of convolution, instance normalization [22] and nonlinearity in each resolution, starting with a channel size of 32, which is doubled (halved) with each downsampling (upsampling) operation. To inject gradients deeper into the network, deep supervision with auxiliary losses is used for the upsampling layers of the decoder. For further details regarding the original 3D version of the nnU-Net, we refer to the work of Isensee et al. [5].

Class imbalance measurement

As the nnU-Net is currently the most advanced general-purpose approach for medical image segmentation, we mainly follow the 3D nnU-Net framework, but adapt the loss function and the patch size based on our class imbalance measurement, as these are the parameters directly influencing the class imbalance while training. Figure 1 shows the average imbalance of the organ and background volume ratios of the dataset within a training epoch for different training strategies. For the ratio measurement, the dataset is rescaled following the spacing definition of the nnU-Net. Although the histograms visually show the difference of the organ volume ratios for the presented patch size strategies, we propose to use the standard deviation $\sigma $ of the class ratios as a single measurement for the class imbalance. The standard deviation of the averaged in-patch organ ratios is a single and easily interpretable value. The ratios sum up to one, so their mean equals the uniform ratio; accordingly, the standard deviation measures the (root-mean-square) distance to an ideally uniform distribution of in-patch organ ratios. Utilizing $\sigma $ as a cost function with the patch size as parameter allows us to find the training parameter with a minimal imbalance for the given dataset.
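The measurement can be illustrated with a minimal sketch. It assumes that $\sigma $ is the population standard deviation of the class ratios averaged over all sampled patches; the function names and the voxel counts are hypothetical and not taken from the published code. Under this assumption, eight classes (background plus seven organs) bound $\sigma $ from above at roughly 0.3307, reached when a single class fills every patch, which is in line with the reported whole-image value being close to that bound.

```python
from statistics import pstdev


def class_ratios(voxel_counts):
    """Turn the per-class voxel counts of one patch into ratios that sum to one."""
    total = sum(voxel_counts)
    return [n / total for n in voxel_counts]


def imbalance_sigma(patches):
    """Average the in-patch class ratios over all patches and return their
    population standard deviation (the population form is an assumption)."""
    ratios = [class_ratios(p) for p in patches]
    n_classes = len(ratios[0])
    mean_ratios = [sum(r[c] for r in ratios) / len(ratios) for c in range(n_classes)]
    return pstdev(mean_ratios)


# Hypothetical voxel counts (background first, then seven organs) of two patches.
patches = [
    [9000, 400, 300, 10, 200, 50, 20, 20],
    [9500, 100, 100, 0, 250, 30, 10, 10],
]
sigma = imbalance_sigma(patches)

# Reference points: a perfectly balanced patch and a patch holding one class only.
balanced = imbalance_sigma([[1] * 8])
single_class = imbalance_sigma([[1, 0, 0, 0, 0, 0, 0, 0]])
```

Evaluating such a function for several candidate patch sizes and keeping the size with the smallest value corresponds to using $\sigma $ as a cost function.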

Class adaptive dice loss

The loss function proposed by the nnU-Net is the CE+Dice loss combining probabilistic voxel-wise class predictions and label overlap measurements, which is also advised by the currently largest study of loss functions for medical image segmentation by Ma et al. [23]. The CE loss is used in its basic multi-class formulation as:

$$\begin{aligned} \text {CE}(P,G) = -\frac{1}{B} \sum _{b,c,v} G_{bcv} \log (P_{bcv}) \;, \end{aligned} \tag{1}$$

with P and G being the one-hot-encoded prediction and ground truth volumes, consisting of B batch samples, C classes and V voxels. The multi-class Dice loss using $\epsilon $ as a small value for numeric stability is defined as:

$$\begin{aligned} \text {Dice}(P,G)=\frac{1}{BC}\sum _{b,c} \frac{2 \sum _{v}P_{bcv}G_{bcv}+\epsilon }{\sum _{v}P_{bcv}+\sum _{v}G_{bcv}+\epsilon } \; . \end{aligned} \tag{2}$$
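Equations (1) and (2) can be checked on a toy example. The following is a minimal sketch in plain Python, assuming nested lists indexed as `P[b][c][v]` and a CE term that carries a leading minus sign; the function names and the tiny volumes are hypothetical. Note that Eq. (2) yields an overlap score in $[0,1]$; how it is turned into a quantity to be minimized (for example by negation) is not specified here and is left out of the sketch.

```python
import math


def cross_entropy(P, G):
    """Eq. (1): negative log-likelihood summed over classes and voxels,
    divided by the number of batch samples B."""
    total = 0.0
    for p_sample, g_sample in zip(P, G):
        for p_class, g_class in zip(p_sample, g_sample):
            for p, g in zip(p_class, g_class):
                if g:  # only the true class of a voxel contributes
                    total += g * math.log(p)
    return -total / len(P)


def dice(P, G, eps=1e-5):
    """Eq. (2): Dice overlap averaged over all batch samples and classes."""
    B, C = len(P), len(P[0])
    score = 0.0
    for b in range(B):
        for c in range(C):
            inter = sum(p * g for p, g in zip(P[b][c], G[b][c]))
            denom = sum(P[b][c]) + sum(G[b][c])
            score += (2 * inter + eps) / (denom + eps)
    return score / (B * C)


# One sample, two classes, four voxels (hypothetical values).
G = [[[1, 1, 0, 0], [0, 0, 1, 1]]]
P = [[[0.9, 0.8, 0.2, 0.1], [0.1, 0.2, 0.8, 0.9]]]

ce_value = cross_entropy(P, G)    # about 0.657
dice_value = dice(P, G)           # about 0.85
```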

The Dice loss formulation of the nnU-Net follows the batch Dice loss of Kodym et al. [24] with the adaptation of ignoring the background class. In contrast to the original Dice definition, Kodym et al. propose to evaluate the DSC with the batch as part of the volume instead of averaging the DSC over the batch samples. Accordingly, with the background indexed as $c=0$ and the foreground classes as $c=1,\dots ,C-1$, the Dice loss formulation used in the nnU-Net is given by:

$$\begin{aligned} \text {nnU-Dice}(P,G)=\frac{1}{C-1}\sum \limits _{c=1}^{C-1} \frac{2 \sum _{b,v}P_{bcv}G_{bcv}+\epsilon }{\sum _{b,v}P_{bcv}+\sum _{b,v}G_{bcv}+\epsilon } \;. \end{aligned} \tag{3}$$
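A minimal sketch of Eq. (3), again on nested lists indexed as `P[b][c][v]`, may help to see how pooling over the batch dimension works. The function name, the choice of index 0 for the background and the example values are assumptions made for illustration; the actual nnU-Net implementation operates on tensors.

```python
def nnu_dice(P, G, eps=1e-5, background=0):
    """Eq. (3): batch Dice. Intersections and volumes are pooled over all batch
    samples and voxels per class; the background class is ignored."""
    B, C = len(P), len(P[0])
    score = 0.0
    for c in range(C):
        if c == background:
            continue
        inter = sum(p * g for b in range(B) for p, g in zip(P[b][c], G[b][c]))
        denom = sum(sum(P[b][c]) + sum(G[b][c]) for b in range(B))
        score += (2 * inter + eps) / (denom + eps)
    return score / (C - 1)


# Two samples, three classes (background, organ A, organ B), four voxels each.
# Organ B is absent from the first sample but present in the second one.
G = [
    [[1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 0, 0]],
    [[1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
]
P = [
    [[1, 1, 0, 0], [0, 0, 1, 0.5], [0, 0, 0, 0.5]],
    [[1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
]

batch_score = nnu_dice(P, G)  # about 0.8545 (organ A: 0.909, organ B: 0.8)
```

Because organ B appears somewhere in the batch, its pooled ground truth volume is non-zero and the class is scored in a meaningful way even though one sample does not contain it.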

However, due to the applied patch-based training, we propose to use the class adaptive Dice loss formulation in combination with the basic CE loss. We define the class adaptive Dice loss as:

$$\begin{aligned} \text {ca-Dice}(P,G)&=\frac{1}{N}\sum \limits _{b,c} \frac{2 \sum _{v}P_{bcv}G_{bcv}}{\sum _{v}P_{bcv}+\sum _{v}G_{bcv}+\epsilon }, \\ N&=\sum \limits _{b,c} {\left\{ \begin{array}{ll} 0,& \text {if } \sum _{v}G_{bcv}=0\\ 1,& \text {else} \end{array}\right. } \end{aligned} \tag{4}$$

In contrast to the original Dice loss, our definition only involves the N classes present in the sampled patch and thus evaluates to the real DSC of the sampled patch instead of considering missing classes as perfectly segmented, which biases the loss toward incorrect scores.
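The effect of Eq. (4) can be illustrated by comparing it with Eq. (2) on a patch that lacks one class. The sketch below is an illustration in plain Python, not the published implementation; the function names, the nested-list layout `P[b][c][v]`, the example values and the return value of 0.0 for a patch without any class are assumptions.

```python
def dice(P, G, eps=1e-5):
    """Eq. (2): Dice averaged over all batch samples and all classes."""
    B, C = len(P), len(P[0])
    score = 0.0
    for b in range(B):
        for c in range(C):
            inter = sum(p * g for p, g in zip(P[b][c], G[b][c]))
            score += (2 * inter + eps) / (sum(P[b][c]) + sum(G[b][c]) + eps)
    return score / (B * C)


def ca_dice(P, G, eps=1e-5):
    """Eq. (4): Dice averaged only over the N (sample, class) pairs whose
    ground truth is non-empty in the patch."""
    score, n_present = 0.0, 0
    for p_sample, g_sample in zip(P, G):
        for p_class, g_class in zip(p_sample, g_sample):
            g_sum = sum(g_class)
            if g_sum == 0:
                continue  # absent class: numerator is zero and N is not increased
            inter = sum(p * g for p, g in zip(p_class, g_class))
            score += 2 * inter / (sum(p_class) + g_sum + eps)
            n_present += 1
    return score / n_present if n_present else 0.0  # empty-patch value is an assumption


# One patch with three classes; the third class is missing in the ground truth,
# but the network assigns it some probability in the last voxel.
G = [[[1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 0, 0]]]
P = [[[1, 1, 0, 0], [0, 0, 1, 0.5], [0, 0, 0, 0.5]]]

standard = dice(P, G)     # about 0.619, dominated by the missing class
adaptive = ca_dice(P, G)  # about 0.929, mean over the two present classes
```

In this example the missing class contributes a score close to zero to Eq. (2); had the network predicted nothing for it, it would have contributed a score of one. In both cases the value depends on the prediction for a class that is not in the patch, which the class adaptive form avoids.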

Results

The nnU-Net as a general-purpose segmentation framework is based on a fixed and a dataset-dependent set of parameters. The patch size defining rule of the network is based on the assumption that large windows have a more global context and hence improve the segmentation result. However, using the standard deviation $\sigma $ of the organ volume ratios as a cost function to optimize the class imbalance within the patches results in smaller patch sizes than the global context maximizing patch size assumption of the nnU-Net. Our measurement of the class ratio standard deviation $\sigma $ naturally shows that the class imbalance is maximal ($\sigma =0.3301$) if a whole image approach is used and minimal if the patch size is minimal ($\sigma =0.27146$ for a patch size of $8\times 8\times 8$). Figure 1 shows the organ volume ratios, including the background of the sampling process using four different patch sizes.

In our experiments, to investigate the effect of the patch size, and thus of the class imbalance, on the segmentation quality, we compare the patch size suggested by the nnU-Net framework with a smaller patch that halves the in-plane size and only slightly reduces the out-plane size, so that the network still receives enough context in the axial direction. This results in the small patch size $96\times 80\times 48$ ($\sigma =0.32337$). We omit the full volume strategy presented in Fig. 1, which is infeasible due to its GPU memory demands, as well as the minimal possible patch size, which only allows a shallow U-Net with one downsampling (upsampling) layer. Additionally, we include our class adaptive Dice loss formulation (“Class adaptive dice loss” section) into the nnU-Net loss, as a robust cost function for the patch-based training of datasets with sparse class distributions as given in the HAN area. Consequently, we conduct experiments with the original nnU-Net parameters (large patch size $192\times 160\times 56$, nnU-Dice+CE loss) and our introduced class imbalance optimized patch size and the class adaptive loss function (ca-Dice).

Table 1 shows the average results of the networks trained on the MICCAI 2015 HAN challenge dataset [12], according to the challenge protocol. The results on the test data are evaluated using the DSC, the 95HD as well as the surface Dice (SD) as introduced by Nikolov et al. [2], combining a volume and a surface-based measurement (with surface tolerance $\tau $ identified by the authors in their observer agreement study). The bold values indicate the best results for the given measurement, and values marked with stars indicate significance (Wilcoxon signed rank test with $p < 0.05$) over the baseline. Following the work of Li et al. [6], in order to analyze a potential overfitting of the small organs, we present in Fig. 2 a comparison of the output confidence distribution of the training and the test samples for the segmented organs of our experiments as a violin plot. The values in each plot indicate the distance of the average confidence from the training to the test data.
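For readers unfamiliar with the two main measures, the following sketch computes the DSC and a 95% Hausdorff distance on small voxel sets. It is an illustration under several assumptions: the masks are given as lists of voxel index tuples, all voxels are used instead of extracted surface voxels, and the 95HD is taken as the maximum of the two directed 95th percentile distances (other definitions pool both directions first). All names and values are hypothetical, and the evaluation code of the paper may differ.

```python
import math


def dsc(a, b):
    """Dice score of two voxel sets given as iterables of (x, y, z) index tuples."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty masks
    return 2 * len(a & b) / (len(a) + len(b))


def percentile(values, q):
    """Percentile with linear interpolation between closest ranks, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def hd95(a, b, spacing=(1.0, 1.0, 1.0)):
    """Symmetric 95% Hausdorff distance (in mm) of two non-empty voxel sets."""
    def to_mm(point):
        return tuple(i * s for i, s in zip(point, spacing))

    def directed(src, dst):
        return [min(math.dist(p, q) for q in dst) for p in src]

    pa, pb = [to_mm(p) for p in a], [to_mm(p) for p in b]
    return max(percentile(directed(pa, pb), 95), percentile(directed(pb, pa), 95))


# Two three-voxel masks shifted by one voxel along the first axis.
reference = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
prediction = [(1, 0, 0), (2, 0, 0), (3, 0, 0)]

overlap = dsc(reference, prediction)    # 2 * 2 / 6, about 0.667
distance = hd95(reference, prediction)  # 0.9 with unit spacing
```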

Table 1 Segmentation results on the combined on- and off-site test data of the MICCAI 2015 HAN challenge dataset [12], for the evaluated configurations in terms of DSC, 95HD and surface Dice (SD)

Implementation details

Our implementation is based on the MONAI DynUNet pipeline module (Note 1), a reimplementation of the dynamic UNet used in the nnU-Net framework [5], and is further adapted to follow the nnU-Net parameterization. MONAI is a PyTorch-based framework for deep learning in healthcare imaging (Note 2). Our models are trained on Nvidia Titan RTX GPUs with 24 GB of memory for an average of 67 hours.

Discussion

The results of our experiments in Table 1 reveal that the presented extensions to the nnU-Net framework, the patch size adjustment especially in conjunction with the class adaptive Dice loss, are favorable for the present class imbalance in the HAN area.

Reducing the patch size directly influences the class imbalance within the sampled patches. The standard deviation, introduced as a measurement for the volume ratio imbalance within a training image patch, decreases from $\sigma =0.32605$ for the GPU memory optimized large patch size of $192\times 160\times 56$ to $\sigma =0.32337$ for the suggested class imbalance optimized small patch size of $96\times 80\times 48$. As visible in Fig. 1, especially the ratio of the smaller classes increases within a patch. The improvement in the class imbalance therefore reduces the bias toward the large classes during the training and effectively results in an increase in performance of 2% in terms of the DSC and a significant increase of 2% regarding SD compared to the baseline nnU-Net framework. The 95HD is significantly reduced by 0.91 mm, yielding an improvement of 22% compared to the baseline.

The utilization of the class adaptive Dice loss in the loss formulation of the nnU-Net improves the segmentation results regarding the DSC and significantly the SD by another 1%. The average of the 95HD is not improved as the Optic Chiasm is not segmented in one test sample; however, all other single organ measurements show improvements over the baseline. Contrary to the standard multi-class Dice loss formulation, the class adaptive Dice loss only evaluates the classes available within each patch, whereas the standard Dice loss calculates the average over all classes, distorting the average DSC depending on the current network prediction of the missing classes. The nnU-Dice, which is based on the batch-Dice formulation [24], however, reduces the risk of missing classes by considering the batch dimension as part of the patch volume. The risk of missing classes within a patch depends on the volume size, the class distribution within the whole volume and, as adjustable training parameters, the patch size and the sampling strategy. As the nnU-Net framework uses a 33% random foreground oversampling strategy, the large patches and the batch-Dice formulation make the baseline nnU-Net already stable against missing classes. Nonetheless, we argue for using the class adaptive Dice, as it is robust against missing classes, especially if the patch size is smaller and the class distribution within the volume is sparse. By showing significantly improved segmentation results for all measures, our experiments support the usage of a combined small patch size and the class adaptive Dice for imbalanced segmentation problems.
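The foreground oversampling mentioned above can be sketched as follows. This is a simplified illustration that forces each patch center onto a foreground voxel with a probability of 0.33; the actual nnU-Net sampler is organized per batch and may differ in detail. The function name, the volume shape and the foreground coordinates are hypothetical.

```python
import random


def sample_patch_center(shape, foreground_voxels, rng, fg_prob=0.33):
    """Return a patch center. With probability `fg_prob` the center is forced
    onto a foreground voxel, otherwise it is drawn uniformly from the volume."""
    if foreground_voxels and rng.random() < fg_prob:
        return rng.choice(foreground_voxels)
    return tuple(rng.randrange(s) for s in shape)


rng = random.Random(0)  # fixed seed for a reproducible illustration
shape = (64, 64, 32)
foreground = [(10, 12, 5), (11, 12, 5), (40, 41, 20)]

centers = [sample_patch_center(shape, foreground, rng) for _ in range(1000)]
fg_fraction = sum(c in foreground for c in centers) / len(centers)
```

With sparse foreground, uniformly drawn centers rarely hit an organ, so the fraction of foreground-centered patches stays close to the chosen probability. Smaller patches around uniformly drawn centers are therefore more likely to miss classes entirely, which is the situation the class adaptive Dice is meant to handle.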

Deviating from the suggestion of the original work of Li et al. [6], Fig. 2 does not show the direct network output (the logits) of the segmented classes and its corresponding decision boundaries, which is only possible for up to three classes, but the confidence distribution after the softmax normalization of the eight HAN organs to be segmented. Although no decision boundary can be depicted for more than three classes, the presentation of the organ-wise normalized confidence values allows a direct comparison of the average confidence drift from train to test time and thus the identification of overfitting. The results in Fig. 2 confirm the findings of Li et al. [6] for the class-imbalanced HAN area and show that the small organs (the Optic Chiasm and the Optic Nerves) are subject to larger differences in training and test time confidence and are accordingly prone to overfitting. The measurements also indicate the overall performance enhancement of the ca-Dice loss over the baseline, visible in the increased average confidence values, but do not show a reduction in the overfitting of the small organs by the loss function adaptation. In contrast, the experiments with the small patch size optimized to reduce the class imbalance show a clear reduction in the average confidence difference for the Parotid Glands. The Parotid Glands can be considered as mid-sized organs, allowing the assumption that a further reduction in the class imbalance can reduce the confidence drift for the small organs too and hence increase their final segmentation results. The assumption is also supported by the constantly small average confidence drift of the Mandible and the Brainstem, being the largest organs with the largest patch ratio and consequently the least overfitting.
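The confidence drift used in this analysis can be sketched as the difference between the mean softmax confidence of a class on training and on test voxels. The sketch assumes that the confidence of a class is collected over the voxels whose ground truth label is that class; this selection, the function names and the logit values are assumptions for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over the class logits of one voxel."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_confidence(voxel_logits, labels, cls):
    """Mean softmax probability of class `cls` over the voxels labeled `cls`."""
    conf = [softmax(l)[cls] for l, y in zip(voxel_logits, labels) if y == cls]
    return sum(conf) / len(conf)


def confidence_drift(train, test, cls):
    """Average confidence on training data minus average confidence on test data."""
    return mean_confidence(*train, cls) - mean_confidence(*test, cls)


# Hypothetical two-class logits: confident on training voxels, less so at test time.
train = ([[0.0, 4.0], [0.0, 4.0]], [1, 1])
test = ([[0.0, 1.0], [0.0, 1.0]], [1, 1])

drift = confidence_drift(train, test, cls=1)  # about 0.982 - 0.731 = 0.251
```

A large positive drift for a class indicates that its test-time outputs moved toward the decision boundary, which is the overfitting signature described by Li et al. [6].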

Finally, in Table 2 we present the segmentation results combining the small patch size and the class adaptive Dice and chronologically compare them with the segmentation results of the most important works in the field also presenting their results on the MICCAI 2015 HAN challenge dataset. The table also indicates the number of organs and data samples used, as the original challenge protocol and its defined data splits are not followed in general.

Table 2 Average DSC and 95HD on the MICCAI HAN challenge dataset

Conclusion

In summary, in this work we present an intuitive measurement for the organ volume ratio difference, which is a central problem appearing in the DL-based segmentation of the HAN area. Based on the measurement, we optimize the patch size parameter regarding the class imbalance for a single network-based HAN segmentation architecture. Additionally, we utilize the class adaptive Dice as a robust loss function for missing classes within a training patch. Both adaptions are incorporated in the nnU-Net framework where we are able to increase the segmentation results by an additional 3% in terms of the DSC and the SD and by 22% regarding the 95HD, resulting in an average DSC of $0.8\pm 0.15$ and a 95HD of $3.17\pm 1.7$ mm for the segmented HAN organs, respectively.

The patch size optimization and the class adaptive Dice loss can both easily be integrated into current DL-based segmentation approaches. In future work, we want to improve the state-of-the-art performance of the recently presented hybrid 2D-3D, single-network approach of Chen et al. [13] by integrating our adaptations. Single-network approaches are end-to-end trainable, less complex and therefore of higher practical interest compared to complex multi-network solutions. As an addition to the overfitting analysis, we would like to investigate the asymmetric loss function terms proposed by Li et al. [6] and combine them with our ca-Dice loss to increase the distance to the decision boundaries of the small classes and thus further increase their test time performance.

Availability

The anonymized MICCAI 2015 HAN challenge dataset is publicly available (http://www.imagenglab.com/newsite/pddca/). The code of the work is available on GitHub (https://github.com/elitap/classimbalance).

Change history

03 September 2022
Missing Open Access funding information has been added in the Funding Note

Notes

1. https://github.com/Project-MONAI/tutorials/ (accessed 21-12-21).
2. https://monai.io/ (accessed 21-12-21).

References

Siegel RL, Miller KD, Fuchs HE, Jemal A (2021) Cancer statistics 2021. CA Cancer J Clin 71(1):7–33. https://doi.org/10.3322/caac.21660
Nikolov S, Blackwell S, Zverovitch A, Mendes R, Livne M, De Fauw J, Patel Y, Meyer C, Askham H, Romera-Paredes B, Kelly C, Karthikesalingam A, Chu C, Carnell D, Boon C, D’Souza D, Moinuddin SA, Consortium DR, Montgomery H, Rees G, Suleyman M, Back T, Hughes C, Ledsam JR, Ronneberger O (2021) Clinically applicable segmentation of head and neck anatomy for radiotherapy: deep learning algorithm development and validation study. J Med Internet Res 23(7):26151. https://doi.org/10.2196/26151
Vrtovec T, Močnik D, Strojan P, Pernuš F, Ibragimov B (2020) Auto-segmentation of organs at risk for head and neck radiotherapy planning: From atlas-based to deep learning methods. Med Phys 14(9):929–950. https://doi.org/10.1002/mp.14320
Tappeiner E, Pröll S, Hönig M, Raudaschl PF, Zaffino P, Spadea MF, Sharp GC, Schubert R, Fritscher K (2019) Multi-organ segmentation of the head and neck area: an efficient hierarchical neural networks approach. Int J Comput Assist Radiol Surg 14(5):745–754. https://doi.org/10.1007/s11548-019-01922-4
Isensee F, Jaeger PF, Kohl SAA, Petersen J, Maier-Hein KH (2021) nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods 18(2):203–211. https://doi.org/10.1038/s41592-020-01008-z
Li Z, Kamnitsas K, Glocker B (2021) Analyzing overfitting under class imbalance in neural networks for image segmentation. IEEE Trans Med Imaging 40(3):1065–1077. https://doi.org/10.1109/TMI.2020.3046692
Guo D, Jin D, Zhu Z, Ho TY, Harrison AP, Chao CH, Xiao J, Yuille A, Lin CY, Lu L (2020) Organ at risk segmentation for head and neck cancer using stratified learning and neural architecture search. Proc Conf Comput Vis Pattern Recogn 4222–4231. https://doi.org/10.1109/CVPR42600.2020.00428
Gao Y, Huang R, Chen M, Wang Z, Deng J, Chen Y, Yang Y, Zhang J, Tao C, Li H (2019) FocusNet: imbalanced large and small organ segmentation with an end-to-end deep neural network for head and neck CT images. Proc Conf Med Image Comput Comput Assist Interven 11766:829–838. https://doi.org/10.1007/978-3-030-32248-9_92
Gao Y, Huang R, Yang Y, Zhang J, Shao K, Tao C, Chen Y, Metaxas DN, Li H, Chen M (2020) FocusNetv2: imbalanced large and small organ segmentation with adversarial shape constraint for head and neck CT images. Med Image Anal 67:1–20. https://doi.org/10.1016/j.media.2020.101831
Oktay O, Ferrante E, Kamnitsas K, Heinrich M, Bai W, Caballero J, Cook SA, De Marvao A, Dawes T, O’Regan DP, Kainz B, Glocker B, Rueckert D (2018) Anatomically constrained neural networks (ACNNs): application to cardiac image enhancement and segmentation. Trans Med Imaging 37(2):384–395. https://doi.org/10.1109/TMI.2017.2743464
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. Adv Neural Inf Process Syst 27:2672–2680
Raudaschl PF, Zaffino P, Sharp GC, Spadea MF, Chen A, Dawant BM, Albrecht T, Gass T, Langguth C, Luthi M, Jung F, Knapp O, Wesarg S, Mannion-Haworth R, Bowes M, Ashman A, Guillard G, Brett A, Vincent G, Orbes-Arteaga M, Cardenas-Pena D, Castellanos-Dominguez G, Aghdasi N, Li Y, Berens A, Moe K, Hannaford B, Schubert R, Fritscher KD (2017) Evaluation of segmentation methods on head and neck CT: auto-segmentation challenge 2015. Med Phys 44(5):2020–2036. https://doi.org/10.1002/mp.12197
Chen Z, Li C, He J, Ye J, Song D, Wang S, Gu L, Qiao Y (2021) A novel hybrid convolutional neural network for accurate organ segmentation in 3d head and neck CT images. Proc Conf Med Image Comput Comput Assist Interven 569–578. https://doi.org/10.1007/978-3-030-87193-2
Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. Proc Conf Med Image Comput Comput Assist Interven 9351:234–241. https://doi.org/10.1007/978-3-319-24574-4_28
Tang H, Liu X, Han K, Xie X, Chen X, Qian H, Liu Y, Sun S, Bai N (2021) Spatial context-aware self-attention model for multi-organ segmentation. Proc Conf Appl Comput Vis 938–948. https://doi.org/10.1109/WACV48630.2021.00098
Roth HR, Oda H, Hayashi Y, Oda M, Shimizu N, Fujiwara M, Misawa K, Mori K (2017) Hierarchical 3D fully convolutional networks for multi-organ segmentation. ArXiv preprint arXiv:1704.06382
Milletari F, Navab N, Ahmadi SA (2016) V-Net: Fully convolutional neural networks for volumetric medical image segmentation. Proc Conf 3D Vis 565–571. https://doi.org/10.1109/3DV.2016.79
Sudre CH, Li W, Vercauteren T, Ourselin S, Jorge Cardoso M (2017) Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations. Proc Int Workshop Multimodal Learn Clin Decis Support 10553:240–248. https://doi.org/10.1007/978-3-319-67558-9_28
Zhu W, Huang Y, Zeng L, Chen X, Liu Y, Qian Z, Du N, Fan W, Xie X (2019) AnatomyNet: deep learning for fast and fully automated whole-volume segmentation of head and neck anatomy. Med Phys 46(2):576–589. https://doi.org/10.1002/mp.13300
Lin TY, Goyal P, Girshick R, He K, Dollar P (2020) Focal loss for dense object detection. IEEE Trans Pattern Anal Mach Intell 42(2):318–327. https://doi.org/10.1109/TPAMI.2018.2858826
Ang KK, Zhang Q, Rosenthal DI, Nguyen-Tan PF, Sherman EJ, Weber RS, Galvin JM, Bonner JA, Harris J, El-Naggar AK, Gillison ML, Jordan RC, Konski AA, Thorstad WL, Trotti A, Beitler JJ, Garden AS, Spanos WJ, Yom SS, Axelrod RS (2014) Randomized phase III trial of concurrent accelerated radiation plus cisplatin with or without cetuximab for stage III to IV head and neck carcinoma: RTOG 0522. J Clin Oncol 32(27):2940–2950. https://doi.org/10.1200/JCO.2013.53.5633
Ulyanov D, Vedaldi A, Lempitsky V (2016) Instance normalization: the missing ingredient for fast stylization. ArXiv preprint arXiv:1607.08022
Ma J, Chen J, Ng M, Huang R, Li Y, Li C, Yang X, Martel AL (2021) Loss odyssey in medical image segmentation. Med Image Anal 71. https://doi.org/10.1016/j.media.2021.102035
Kodym O, Španěl M, Herout A (2019) Segmentation of head and neck organs at risk using CNN with batch dice loss. German Conf Pattern Recogn 105–114. https://doi.org/10.1007/978-3-030-12939-2_8
Fritscher K, Raudaschl P, Zaffino P, Spadea MF, Sharp GC, Schubert R (2016) Deep neural networks for fast segmentation of 3D medical images. Proc Conf Med Image Comput Comput Assist Interven 158–165. https://doi.org/10.1007/978-3-319-46723-8_19
Tappeiner E, Pröll S, Fritscher K, Welk M, Schubert R (2020) Training of head and neck segmentation networks with shape prior on small datasets. Int J Comput Assist Radiol Surg 15(9):1417–1425. https://doi.org/10.1007/s11548-020-02175-2

Funding

Open access funding provided by UMIT TIROL-Private Universität für Gesundheitswissenschaften und -technologie GmbH.

Author information

Authors and Affiliations

Department for Biomedical Computer Science and Mechatronics, UMIT—Private University for Health Sciences, Medical Informatics and Technology, Eduard-Wallnöfer-Zentrum 1, 6060, Hall in Tyrol, Tyrol, Austria
Elias Tappeiner, Martin Welk & Rainer Schubert

Authors

Elias Tappeiner
Martin Welk
Rainer Schubert

Corresponding author

Correspondence to Elias Tappeiner.

Ethics declarations

Conflict of interest

Elias Tappeiner, Martin Welk, and Rainer Schubert declare that they have no conflict of interest.

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Informed Consent

Informed consent was obtained from the patients in the clinical study from which the data originate [21].

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Tappeiner, E., Welk, M. & Schubert, R. Tackling the class imbalance problem of deep learning-based head and neck organ segmentation. Int J CARS 17, 2103–2111 (2022). https://doi.org/10.1007/s11548-022-02649-5

Received: 05 January 2022
Accepted: 20 April 2022
Published: 16 May 2022
Issue Date: November 2022
DOI: https://doi.org/10.1007/s11548-022-02649-5
