Uncertainty-Driven Multi-loss Fully Convolutional Networks for Histopathology
Abstract
Several works have shown that combining multiple loss functions is beneficial when training deep neural networks for a variety of prediction tasks. Such multi-loss approaches are generally implemented via a weighted multi-loss objective function in which each term encodes a different desired inference criterion, and the importance of each term is often set using empirically tuned hyper-parameters. In this work, we analyze the importance of the relative weighting between the different terms of a multi-loss function and propose to leverage the model’s uncertainty with respect to each loss as an automatically learned weighting parameter. We consider the application of colon gland analysis from histopathology images, for which various multi-loss functions have been proposed, and show improvements in classification and segmentation accuracy when using the proposed uncertainty-driven multi-loss function.
1 Introduction
Although deep learning models have shown remarkable results on a variety of prediction tasks, recent works applied to medical image analysis have demonstrated improved performance by incorporating additional domain-specific information [1]. In fact, medical image analysis datasets are typically not large enough for learning robust features; however, there exists a variety of expert knowledge that can be leveraged to guide the underlying learning model. Such knowledge or cues are generally formulated as a set of auxiliary losses that serve to improve or guide the learning of a primary task (e.g. image classification or segmentation). Specifically, these cues are incorporated into the training of deep convolutional networks through a multi-loss objective function combining a variety of objectives learned from a shared image representation. The combination of multiple loss functions can be interpreted as a form of regularization, as it constrains the search space of candidate solutions for the primary task.
Different types of cues can be combined in a multi-loss objective function to improve the generalization of deep networks. Multi-loss functions have been proposed for a variety of medical applications: colon histology images, skin dermoscopy images or chest X-ray images. Chen et al. [2] proposed a multi-loss learning framework for gland segmentation from histology images in which features from different layers of a deep fully convolutional network were combined through auxiliary loss functions and added to a per-pixel classification loss. BenTaieb et al. [3] proposed a two-loss objective function combining gland classification (malignant vs benign) and segmentation (gland delineation) and showed that both tasks were mutually beneficial. The same authors also proposed a multi-loss objective function for gland segmentation that equips a fully convolutional network with topological and geometrical constraints [4], encouraging topologically plausible and smooth segmentations. Kawahara et al. [5] used auxiliary losses to train a multi-scale convolutional network to classify skin lesions. More recently, adversarial loss functions were also proposed as additional forms of supervision; Dai et al. [6] leveraged an adversarial loss to guide the segmentation of organs from chest X-ray images. While these previous works confirm the utility of training deep networks with a multi-loss objective function, they do not clearly explain how to set the contribution of each loss.
Most existing works use an empirical approach to combine different losses: generally, all losses are simply summed with equal contribution, or manually tuned hyper-parameters are used to control the trade-off among the terms. In this work, we investigate the importance of an appropriate weighting between the losses and propose a way to automate it. Specifically, we utilize concepts from Bayesian deep learning [7, 8] and introduce an uncertainty-based multi-loss objective function in which the importance of each term is learned from the model’s uncertainty with respect to that loss. Uncertainty has been leveraged in many medical image analysis applications (e.g. segmentation [9], registration [10]). However, to the best of our knowledge, in the context of deep learning models for medical images, uncertainty has only been explored for image registration: Yang et al. [11] proposed a CNN model for image registration and showed how uncertainty helps highlight misaligned regions. Previous works did not consider using uncertainty to automate or guide the training of multi-loss objective functions designed for medical image analysis.
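Concretely, the formulation of [7] (up to constant factors) weights each task loss \(\mathcal{L}_i\) by a learned log-variance \(s_i = \log\sigma_i^2\), minimizing \(\sum_i e^{-s_i}\mathcal{L}_i + s_i\). A minimal numpy sketch of this weighting (the function name is illustrative, not from any released code):

```python
import numpy as np

def uncertainty_weighted_loss(losses, log_vars):
    """Combine per-task losses with learned log-variances s_i = log(sigma_i^2).

    Each term exp(-s_i) * L_i scales the loss by 1/sigma_i^2; the +s_i
    penalty keeps the variances from growing so large that a task is ignored.
    """
    losses = np.asarray(losses, dtype=float)
    log_vars = np.asarray(log_vars, dtype=float)
    return float(np.sum(np.exp(-log_vars) * losses + log_vars))

# With s_i = 0 (i.e. sigma_i = 1) the combination reduces to a plain sum:
print(uncertainty_weighted_loss([1.0, 2.0], [0.0, 0.0]))  # 3.0

# Minimizing over s_i in closed form gives s_i* = log(L_i), so the effective
# weight exp(-s_i*) = 1/L_i automatically down-weights high-loss (noisy) terms.
print(uncertainty_weighted_loss([1.0, 2.0], np.log([1.0, 2.0])))
```

In practice the \(s_i\) are extra trainable parameters updated by backpropagation together with the network weights, which is what replaces the manual grid search over weighting coefficients.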
We illustrate our approach on the task of colon gland analysis, leveraging the multi-loss objective functions proposed in previous works [3, 4]. We extend these works by re-defining the proposed loss functions with an uncertainty-driven weighting: we linearly combine classification, segmentation, topology and geometry losses weighted by the model’s uncertainty for each of these terms. In the proposed uncertainty-driven multi-loss, the uncertainty captures how much variance there is in the model’s predictions. This variance or noise in the predictions differs for each term and thus reflects the uncertainty inherent to the classification, segmentation, topology or geometry loss.
2 Method
Our goal is to learn how to combine multiple terms relevant to gland image analysis into a single objective function. For instance, gland classification and gland segmentation can both benefit from a joint learning framework, and information about the geometry and topology of glands can facilitate learning plausible segmentations. Note that we refer to glands’ geometry and topology in terms of smooth boundaries as well as containment and exclusion properties between the different parts of objects (the lumen is generally contained within a thick epithelial border and surrounded by stroma cells that exclude both the lumen and the border; see Fig. 3 for an example of gland segmentation).
We train a fully convolutional network parameterized by \(\theta \) from a set of training images x, their corresponding ground truth segmentation masks S, and their tissue class label binary vectors C, represented by \(\{(x^{(n)}, S^{(n)}, C^{(n)}); n=1,2,\ldots ,N\}\). We drop the superscript (n) when referring to a single image x, class label C or segmentation mask S. We denote by K the total number of image class labels (e.g. \(K=2\) for malignant or benign tissue images of colon adenocarcinomas) and by L the total number of region labels in the segmentation mask (e.g. \(L=3\) for lumen, epithelial border and stroma). The network’s architecture is shown in Fig. 1. To predict class labels C, we use the network’s activations \(f_c^\theta (x)\) from the last layer of the encoder, as they correspond to a coarser representation of x. To obtain a crisp segmentation of a color image x, we use the activations \(f_s^\theta (x)\) from the last layer of the decoder and assign a vector \(S_p= ( S_p^1, S_p^2, \ldots ,S_p^{L} ) \in \{0,1\}^{L}\) to the p-th pixel \(x_p\) in x, where \(S_p^r\) indicates whether pixel \(x_p\) belongs to region r. Region labels r are not always mutually exclusive, so containment properties (e.g. the glands’ lumen is contained within the epithelial border) are valid label assignments.
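Because region labels are not mutually exclusive, the decoder's per-pixel outputs can be decoded with independent per-region sigmoids rather than a single softmax, so one pixel may carry several labels at once. A small sketch of this decoding under that assumption (function name and shapes are illustrative):

```python
import numpy as np

def decode_segmentation(logits, threshold=0.5):
    """Turn per-pixel logits of shape (H, W, L) into binary label vectors S_p.

    Independent sigmoids allow non-mutually-exclusive region labels, so
    containment (e.g. lumen inside the epithelial border) is a valid assignment.
    """
    probs = 1.0 / (1.0 + np.exp(-logits))         # per-region sigmoid
    return (probs >= threshold).astype(np.uint8)  # S_p in {0,1}^L

# A 1x1 image with L=3 regions: two regions active for the same pixel.
logits = np.array([[[2.0, 3.0, -4.0]]])
S = decode_segmentation(logits)
print(S[0, 0])  # [1 1 0] -- the pixel belongs to two regions simultaneously
```

A softmax would force exactly one label per pixel and could not express the containment property described above; the sigmoid formulation is what makes \(S_p \in \{0,1\}^L\) rather than a one-hot vector.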
3 Experiments and Discussion
Table 1. Performance of different loss functions combined with manually tuned loss weights and with uncertainty-guided weights. Results are reported on the Warwick-QU original test set.
| Loss | Weights (\(\mathcal{L}_c\), \(\mathcal{L}_s\), \(\mathcal{L}_t\), \(\mathcal{L}_g\)) | Classification accuracy | Pixel accuracy | Object Dice | Hausdorff distance |
|---|---|---|---|---|---|
| \(\mathcal{L}_c\) | 1, 0, 0, 0 | 0.87 | – | – | – |
| \(\mathcal{L}_s\) | 0, 1, 0, 0 | – | 0.79 | 0.81 | 8.2 |
| \(\mathcal{L}_t\) | 0, 0, 1, 0 | – | 0.75 | 0.77 | 8.6 |
| \(\mathcal{L}_s+\mathcal{L}_t+\mathcal{L}_g\) | 0, 1, 1, 1 | – | 0.83 | 0.84 | 7.3 |
| \(\mathcal{L}_c + \mathcal{L}_s\) | 0.5, 0.5, 0, 0 | 0.90 | 0.79 | 0.80 | 8.4 |
| \(\mathcal{L}_c + \mathcal{L}_s + \mathcal{L}_t\) | 0.33, 0.33, 0.33, 0 | 0.94 | 0.78 | 0.80 | 8.4 |
| \(\mathcal{L}_c + \mathcal{L}_s + \mathcal{L}_t + \mathcal{L}_g\) | 0.25, 0.25, 0.25, 0.25 | 0.91 | 0.81 | 0.83 | 7.6 |
| \(\mathcal{L}_c + \mathcal{L}_s + \mathcal{L}_t + \mathcal{L}_g\) | 0.1, 0.6, 0.22, 0.08 | 0.95 | 0.86 | 0.85 | 7.1 |
| \(\mathcal{L}_c + \mathcal{L}_s\) | learned (uncertainty) | 0.95 | 0.78 | 0.80 | 8.4 |
| \(\mathcal{L}_c + \mathcal{L}_s + \mathcal{L}_t\) | learned (uncertainty) | 0.94 | 0.79 | 0.81 | 8.2 |
| \(\mathcal{L}_c + \mathcal{L}_s + \mathcal{L}_t + \mathcal{L}_g\) | learned (uncertainty) | 0.95 | 0.85 | 0.87 | 7.0 |
Multi-loss vs single-loss: We first tested whether the combination of different loss functions, without uncertainty guidance, influences classification and segmentation accuracy. We used \(\mathcal{L}_{\text {total}} = \lambda \mathcal{L}_c + (1-\lambda ) \mathcal{L}_s\) and explored different values of \(\lambda \in [0,1]\). Figure 2 shows the classification and per-pixel accuracy on the Warwick-QU original test set of 80 images for different values of \(\lambda \). Overall, we observed that learning with multiple losses improved both segmentation and classification performance. In fact, we observed up to a 3% increase in classification accuracy (at \(\lambda =\{0.5, 0.6, 0.7\}\)) when using a combination of \(\mathcal{L}_c\) and \(\mathcal{L}_s\) compared to using \(\mathcal{L}_c\) only (\(\lambda = 1\)). Similarly, for segmentation, pixel accuracy improved by up to 6% (at \(\lambda =0.3\)) when combining both losses compared to using \(\mathcal{L}_s\) only (\(\lambda =0\)). A similar result is shown in Table 1 when comparing \(\mathcal{L}_c\) vs \(\mathcal{L}_c + \mathcal{L}_s\) with equal weights.
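This manual trade-off amounts to a grid search over the convex combination \(\lambda \mathcal{L}_c + (1-\lambda)\mathcal{L}_s\), retraining the network for each \(\lambda\). A sketch of that sweep (the loss values are placeholders, not the paper's numbers):

```python
import numpy as np

def combined_loss(loss_c, loss_s, lam):
    """Convex combination of classification and segmentation losses."""
    return lam * loss_c + (1.0 - lam) * loss_s

# Grid search over lambda in [0, 1]; lam = 1 recovers L_c only, lam = 0 L_s only.
# In the actual experiments each lambda requires a full training run,
# which is exactly the cost the uncertainty-driven weighting avoids.
loss_c, loss_s = 0.6, 0.4  # placeholder loss values for illustration
for lam in np.linspace(0.0, 1.0, 11):
    print(f"lambda={lam:.1f}  total={combined_loss(loss_c, loss_s, lam):.3f}")
```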
Penalty terms trade-off: We also tested the trade-off between the topology and geometry soft constraints when combined with the segmentation loss. We used different weighting coefficients \(\lambda \) and trained the network with \(\mathcal{L}_{\text {total}} = \mathcal{L}_s + \lambda \mathcal{L}_t + (1-\lambda ) \mathcal{L}_g\), varying only the importance of the soft constraints. It is interesting to note that there is a wide range of weighting coefficients for which the network produces similar (or almost identical) results. In fact, we observed a minimal change (\(\le 10^{-2}\)) when varying the importance of each term by ±20% around \(\lambda = 0.5\), which reflects the flexibility of deep networks to adapt to different regularization terms. We also observed that the sigmoid cross-entropy loss \(\mathcal{L}_s\) was generally more stable than \(\mathcal{L}_t\) or \(\mathcal{L}_g\) and outperformed each of them when they were used alone (see Table 1, \(\mathcal{L}_s\) only vs \(\mathcal{L}_t\) only). However, for certain weighting configurations of the penalty terms, we observed improved performance of up to 5% in pixel accuracy and object Dice (e.g. \(\lambda = 0.1\) vs. \(\lambda = 0.5\); see Fig. 2).
4 Conclusion
We showed that the combination of different loss terms with appropriate weighting can improve model generalization in the context of deep neural networks. We proposed to use uncertainty as a way to combine multiple loss functions that were shown useful for the analysis of glands in colon adenocarcinoma and we observed that this strategy helps improve classification and segmentation performance and can thus bypass the need for extensive grid-search over different weighting configurations. An interesting extension to our work could be to introduce per-instance uncertainty (as opposed to per-loss) which may be useful in situations where the data or labels are noisy.
References
- 1. Litjens, G., et al.: A survey on deep learning in medical image analysis. arXiv preprint arXiv:1702.05747 (2017)
- 2. Chen, H., Qi, X., Yu, L., Heng, P.-A.: DCAN: deep contour-aware networks for accurate gland segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2487–2496 (2016)
- 3. BenTaieb, A., Kawahara, J., Hamarneh, G.: Multi-loss convolutional networks for gland analysis in microscopy. In: IEEE 13th International Symposium on Biomedical Imaging, pp. 642–645 (2016)
- 4. BenTaieb, A., Hamarneh, G.: Topology aware fully convolutional networks for histology gland segmentation. In: Ourselin, S., Joskowicz, L., Sabuncu, M.R., Unal, G., Wells, W. (eds.) MICCAI 2016. LNCS, vol. 9901, pp. 460–468. Springer, Cham (2016). doi:10.1007/978-3-319-46723-8_53
- 5. Kawahara, J., Hamarneh, G.: Multi-resolution-tract CNN with hybrid pretrained and skin-lesion trained layers. In: Wang, L., Adeli, E., Wang, Q., Shi, Y., Suk, H.-I. (eds.) MLMI 2016. LNCS, vol. 10019, pp. 164–171. Springer, Cham (2016). doi:10.1007/978-3-319-47157-0_20
- 6. Dai, W., et al.: SCAN: structure correcting adversarial network for chest X-rays organ segmentation. arXiv preprint arXiv:1703.08770 (2017)
- 7. Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. arXiv preprint arXiv:1705.07115 (2017)
- 8. Gal, Y.: Uncertainty in deep learning. Ph.D. dissertation (2016)
- 9. Saad, A., Möller, T., Hamarneh, G.: ProbExplorer: uncertainty-guided exploration and editing of probabilistic medical image segmentation. Comput. Graph. Forum 29(3), 1113–1122 (2010)
- 10. Marsland, S., Shardlow, T.: Langevin equations for landmark image registration with uncertainty. SIAM J. Imaging Sci. 10(2), 782–807 (2017)
- 11. Yang, X., Kwitt, R., Niethammer, M.: Fast predictive image registration. In: Carneiro, G., et al. (eds.) LABELS/DLMIA 2016. LNCS, vol. 10008, pp. 48–57. Springer, Cham (2016). doi:10.1007/978-3-319-46976-8_6
- 12. Badrinarayanan, V., Kendall, A., Cipolla, R.: SegNet: a deep convolutional encoder-decoder architecture for scene segmentation. IEEE Trans. Pattern Anal. Mach. Intell. (2017)
- 13. Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467 (2016)
- 14. Sirinukunwattana, K., et al.: Gland segmentation in colon histology images: the GlaS challenge contest. Med. Image Anal. 35, 489–502 (2017)