Momentum Contrastive Voxel-Wise Representation Learning for Semi-supervised Volumetric Medical Image Segmentation

  • Conference paper
Medical Image Computing and Computer Assisted Intervention – MICCAI 2022 (MICCAI 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13434))

Abstract

Contrastive learning (CL) aims to learn useful representations without relying on expert annotations in the context of medical image segmentation. Existing approaches mainly contrast a single positive vector (i.e., an augmentation of the same image) against a set of negatives drawn from the rest of the batch, which can end up mapping all input features onto the same constant vector. Despite their impressive empirical performance, these methods have two shortcomings: (1) preventing collapse to trivial solutions remains a formidable challenge; and (2) we argue that not all voxels within the same image are equally positive, since dissimilar anatomical structures exist within the same image. In this work, we present a novel Contrastive Voxel-wise Representation Learning (CVRL) method that effectively learns low-level and high-level features by capturing 3D spatial context and rich anatomical information along both the feature and the batch dimensions. Specifically, we first introduce a novel CL strategy that promotes feature diversity across the 3D representation dimensions. We train the framework through bi-level (i.e., low-level and high-level) contrastive optimization on 3D images. Experiments on two benchmark datasets under different labeled-data settings demonstrate the superiority of the proposed framework. More importantly, we also prove that our method inherits the hardness-aware property of standard CL approaches.

Notes

  1. http://atriaseg2018.cardiacatlas.org/.

References

  1. Bai, W., Chen, C., Tarroni, G., Duan, J., Guitton, F., Petersen, S.E., Guo, Y., Matthews, P.M., Rueckert, D.: Self-supervised learning for cardiac MR image segmentation by anatomical position prediction. In: Shen, D., et al. (eds.) MICCAI 2019. LNCS, vol. 11765, pp. 541–549. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32245-8_60

  2. Bortsova, G., Dubost, F., Hogeweg, L., Katramados, I., de Bruijne, M.: Semi-supervised medical image segmentation via learning consistency under transformations. In: Shen, D., et al. (eds.) MICCAI 2019. LNCS, vol. 11769, pp. 810–818. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32226-7_90

  3. Chaitanya, K., Erdil, E., Karani, N., Konukoglu, E.: Contrastive learning of global and local features for medical image segmentation with limited annotations. In: NeurIPS (2020)

  4. Chen, S., Bortsova, G., García-Uceda Juárez, A., van Tulder, G., de Bruijne, M.: Multi-task attention-based semi-supervised learning for medical image segmentation. In: Shen, D., et al. (eds.) MICCAI 2019. LNCS, vol. 11766, pp. 457–465. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32248-9_51

  5. Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: ICML, pp. 1597–1607. PMLR (2020)

  6. Hadsell, R., Chopra, S., LeCun, Y.: Dimensionality reduction by learning an invariant mapping. In: CVPR, vol. 2, pp. 1735–1742. IEEE (2006)

  7. Hang, W., et al.: Local and global structure-aware entropy regularized mean teacher model for 3D left atrium segmentation. In: Martel, A.L., et al. (eds.) MICCAI 2020. LNCS, vol. 12261, pp. 562–571. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59710-8_55

  8. Hu, X., Zeng, D., Xu, X., Shi, Y.: Semi-supervised contrastive learning for label-efficient medical image segmentation. In: de Bruijne, M., et al. (eds.) MICCAI 2021. LNCS, vol. 12902, pp. 481–490. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-87196-3_45

  9. Hua, T., Wang, W., Xue, Z., Ren, S., Wang, Y., Zhao, H.: On feature decorrelation in self-supervised learning. In: ICCV, pp. 9598–9608 (2021)

  10. Jing, L., Vincent, P., LeCun, Y., Tian, Y.: Understanding dimensional collapse in contrastive self-supervised learning. arXiv preprint arXiv:2110.09348 (2021)

  11. Laine, S., Aila, T.: Temporal ensembling for semi-supervised learning. arXiv preprint arXiv:1610.02242 (2016)

  12. Li, S., Zhang, C., He, X.: Shape-aware semi-supervised 3D semantic segmentation for medical images. In: Martel, A.L., et al. (eds.) MICCAI 2020. LNCS, vol. 12261, pp. 552–561. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59710-8_54

  13. Li, X., Yu, L., Chen, H., Fu, C.W., Heng, P.A.: Semi-supervised skin lesion segmentation via transformation consistent self-ensembling model. arXiv preprint arXiv:1808.03887 (2018)

  14. Liu, F., et al.: Graph-in-graph network for automatic gene ontology description generation. arXiv preprint arXiv:2206.05311 (2022)

  15. Liu, F., You, C., Wu, X., Ge, S., Sun, X., et al.: Auto-encoding knowledge graph for unsupervised medical report generation. In: NeurIPS (2021)

  16. Luo, X., Chen, J., Song, T., Wang, G.: Semi-supervised medical image segmentation through dual-task consistency. In: AAAI (2021)

  17. Milletari, F., Navab, N., Ahmadi, S.A.: V-Net: fully convolutional neural networks for volumetric medical image segmentation. In: 3DV, pp. 565–571. IEEE (2016)

  18. Misra, I., van der Maaten, L.: Self-supervised learning of pretext-invariant representations. In: CVPR, pp. 6707–6717 (2020)

  19. Nie, D., Gao, Y., Wang, L., Shen, D.: ASDNet: attention-based semi-supervised deep networks for medical image segmentation. In: MICCAI, pp. 370–378. Springer (2018)

  20. van den Oord, A., Li, Y., Vinyals, O.: Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748 (2018)

  21. Roth, H.R., Farag, A., Turkbey, E., Lu, L., Liu, J., Summers, R.M.: Data from Pancreas-CT. The Cancer Imaging Archive (2016)

  22. Sun, S., Han, K., Kong, D., You, C., Xie, X.: MIRNF: medical image registration via neural fields. arXiv preprint arXiv:2206.03111 (2022)

  23. Taleb, A., et al.: 3D self-supervised methods for medical imaging. In: NeurIPS, pp. 18158–18172 (2020)

  24. Tarvainen, A., Valpola, H.: Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results. In: NeurIPS, pp. 1195–1204 (2017)

  25. Tian, Y., Krishnan, D., Isola, P.: Contrastive multiview coding. arXiv preprint arXiv:1906.05849 (2019)

  26. Wang, F., Liu, H.: Understanding the behaviour of contrastive loss. In: CVPR, pp. 2495–2504 (2021)

  27. Wang, X., Gupta, A.: Unsupervised learning of visual representations using videos. In: ICCV, pp. 2794–2802 (2015)

  28. Wu, Z., Xiong, Y., Yu, S.X., Lin, D.: Unsupervised feature learning via non-parametric instance discrimination. In: CVPR (2018)

  29. You, C., Chen, N., Zou, Y.: Self-supervised contrastive cross-modality representation learning for spoken question answering. arXiv preprint arXiv:2109.03381 (2021)

  30. You, C., Dai, W., Staib, L., Duncan, J.S.: Bootstrapping semi-supervised medical image segmentation with anatomical-aware contrastive distillation. arXiv preprint arXiv:2206.02307 (2022)

  31. You, C., et al.: Incremental learning meets transfer learning: application to multi-site prostate MRI segmentation. arXiv preprint arXiv:2206.01369 (2022)

  32. You, C., Yang, J., Chapiro, J., Duncan, J.S.: Unsupervised Wasserstein distance guided domain adaptation for 3D multi-domain liver segmentation. In: Cardoso, J., et al. (eds.) IMIMIC/MIL3ID/LABELS 2020. LNCS, vol. 12446, pp. 155–163. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61166-8_17

  33. You, C., Zhao, R., Liu, F., Chinchali, S., Topcu, U., Staib, L., Duncan, J.S.: Class-aware generative adversarial transformers for medical image segmentation. arXiv preprint arXiv:2201.10737 (2022)

  34. You, C., Zhou, Y., Zhao, R., Staib, L., Duncan, J.S.: SimCVD: simple contrastive voxel-wise representation distillation for semi-supervised medical image segmentation. IEEE Trans. Med. Imaging (2022)

  35. Yu, L., Wang, S., Li, X., Fu, C.-W., Heng, P.-A.: Uncertainty-aware self-ensembling model for semi-supervised 3D left atrium segmentation. In: Shen, D., et al. (eds.) MICCAI 2019. LNCS, vol. 11765, pp. 605–613. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32245-8_67

  36. Zhang, Y., Yang, L., Chen, J., Fredericksen, M., Hughes, D.P., Chen, D.Z.: Deep adversarial networks for biomedical image segmentation utilizing unannotated images. In: Descoteaux, M., et al. (eds.) MICCAI 2017. LNCS, vol. 10435, pp. 408–416. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66179-7_47

  37. Zheng, H., et al.: Semi-supervised segmentation of liver using adversarial learning with deep atlas prior. In: Shen, D., et al. (eds.) MICCAI 2019. LNCS, vol. 11769, pp. 148–156. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32226-7_17

  38. Zhou, Y., et al.: Prior-aware neural network for partially-supervised multi-organ segmentation. In: ICCV, pp. 10672–10681 (2019)

Author information

Corresponding author

Correspondence to Chenyu You.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 201 KB)

Appendix

Table 3. Quantitative segmentation results on the pancreas dataset. The backbone network of all evaluated methods is V-Net.
Table 4. Further ablation study on the LA dataset for the voxel-wise contrastive objective \(\mathcal {L}_v\) and the dimensional contrastive objective \(\mathcal {L}_d\) (8 labeled and 72 unlabeled scans). The dimensional contrastive objective is complementary to the voxel-wise contrastive objective, which demonstrates its importance and yields clear performance gains.

A Hardness-aware Property of the Contrastive Losses

Recent work [26] has studied the properties of the standard contrastive objective (i.e., InfoNCE). For better illustration, we reproduce the contrastive objective defined in Sect. 1.2 as follows:

$$\begin{aligned} \mathcal {L}_q = -\log \frac{\exp (q \cdot k_+/\tau )}{\exp (q \cdot k_+/\tau ) + \sum _{k \in \mathcal {K}_-}\exp (q \cdot k/\tau )} \end{aligned}$$

Given a query vector q and a set of key vectors \(\mathcal {K} = \{k_+\} \cup \mathcal {K}_-\), the InfoNCE loss can be understood as maximizing the positive score \(\exp (q \cdot k_+/\tau )\) while minimizing the negative scores \(\exp (q \cdot k/\tau )\) for \(k \in \mathcal {K}_-\). Concretely, \(\mathcal {L}_q = -\log P_+\), where \(P_i\) is the probability that q is matched to \(k_i\):

$$\begin{aligned} P_i = \frac{\exp (q \cdot k_i/\tau )}{\sum _{k \in \mathcal {K}}\exp (q \cdot k/\tau )} \end{aligned}$$
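For concreteness, the following is a minimal NumPy sketch of \(\mathcal {L}_q\) and the match probabilities \(P_i\); the function name, vector shapes, and temperature value are illustrative assumptions rather than details from the paper:

```python
import numpy as np

def info_nce(q, keys, pos_idx=0, tau=0.07):
    """InfoNCE loss for a single query (illustrative sketch).

    q       : (d,) query vector
    keys    : (n, d) key vectors; row `pos_idx` plays the role of k+
    tau     : temperature
    Returns L_q = -log P_+ and the full match distribution over keys.
    """
    logits = keys @ q / tau    # q . k_i / tau for every key k_i
    logits -= logits.max()     # stabilise the softmax numerically
    p = np.exp(logits)
    p /= p.sum()               # P_i = exp(q.k_i/tau) / sum_k exp(q.k/tau)
    return -np.log(p[pos_idx]), p
```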

Next, we show that InfoNCE has the hardness-aware property compared to a more naïve contrastive loss \(\mathcal {L}'_q\), whose objective is also to maximize the similarity between the query q and the positive key \(k_+\):

$$\begin{aligned} \mathcal {L}'_q = -q \cdot k_+ + \lambda \sum _{k \in \mathcal {K}_-}q \cdot k \end{aligned}$$

We analyze the derivative with respect to the positive key \(k_+\) and any negative key \(k_-\):

$$\begin{aligned} \frac{\partial \mathcal {L}_q}{\partial k_+}&= -\frac{[\sum _{k \in \mathcal {K}}\exp (q \cdot k/\tau )] \exp (q \cdot k_+/\tau ) - [\exp (q \cdot k_+/\tau )]^2}{P_+ \, [\sum _{k \in \mathcal {K}}\exp (q \cdot k/\tau )]^2} \cdot \frac{q}{\tau }\\&= -\frac{\sum _{k \in \mathcal {K}_-}\exp (q \cdot k/\tau )}{\sum _{k \in \mathcal {K}}\exp (q \cdot k/\tau )} \cdot \frac{q}{\tau } = -\frac{q}{\tau }\sum _{k\ne +}P_k,\\ \frac{\partial \mathcal {L}_q}{\partial k_-}&= -\frac{-\exp (q \cdot k_+/\tau )\exp (q \cdot k_-/\tau )}{P_+ \, [\sum _{k \in \mathcal {K}}\exp (q \cdot k/\tau )]^2} \cdot \frac{q}{\tau } =\frac{q}{\tau }P_{-},\\ \frac{\partial \mathcal {L}'_q}{\partial k_+}&= -q, \quad \frac{\partial \mathcal {L}'_q}{\partial k_-} = \lambda q. \end{aligned}$$

We can observe that the derivative of \(\mathcal {L}_q\) with respect to any negative key \(k_-\) is proportional to \(P_-\), and hence to the exponential term \(\exp (q \cdot k_{-}/\tau )\): harder negatives (those more similar to the query) receive larger gradients, which is precisely the hardness-aware property. In contrast, \(\mathcal {L}'_q\) is not hardness-aware because every negative key is weighted equally (Table 4).
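Reusing the hypothetical info_nce sketch above, a small numeric check with random toy vectors makes this weighting visible: since \(\Vert q\Vert = 1\) here, the gradient norm on each negative key equals \(P_-/\tau\), so more similar (harder) negatives receive proportionally larger repulsive updates.

```python
import numpy as np

rng = np.random.default_rng(0)
q = rng.normal(size=8)
q /= np.linalg.norm(q)                               # unit-norm query
keys = rng.normal(size=(5, 8))
keys /= np.linalg.norm(keys, axis=1, keepdims=True)  # unit-norm keys
tau = 0.07

_, p = info_nce(q, keys, pos_idx=0, tau=tau)

# dL_q/dk_- = (q / tau) * P_- : the repulsive gradient on each negative
# key scales with its match probability, i.e. with exp(q . k_- / tau).
for i in range(1, len(keys)):
    grad = (q / tau) * p[i]
    print(f"negative {i}: q.k = {keys[i] @ q:+.3f}, "
          f"P_i = {p[i]:.4f}, |grad| = {np.linalg.norm(grad):.4f}")
```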

In Sect. 1.2, both the voxel-wise and the dimensional contrastive objectives use the InfoNCE loss and thus benefit from the hardness-aware property. (1) Voxel-wise contrastive objective: we define the queries and keys as:

$$\begin{aligned} q = f(t(x))_i,\quad k_i = g(t'(x))_i,\quad k_j = g(t'(x))_j, \end{aligned}$$

where i and j denote voxel indices: \(k_i\) is the positive key for the query q, while \(k_j\), a feature voxel at a different location, is treated as a negative key. \(\mathcal {L}_v\) pushes \(k_j\) away from q more strongly when the two are similar, which encourages the representations to contain unique local information. (2) Dimensional contrastive objective: for brevity, here i and j denote dimension indices. \(\mathcal {L}_d\) encourages each feature dimension to encode dissimilar information and prevents dimensional collapse [25]. A sketch relating the two objectives follows below.
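To make the relationship between the two objectives concrete, the sketch below instantiates both as InfoNCE losses over the same pair of feature maps. It assumes the features of the two augmented views have been flattened to shape (V, D) (V voxels, D feature dimensions); this is a minimal illustration of the idea under those assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def voxel_and_dimensional_nce(f, g, tau=0.1):
    """Illustrative bi-level InfoNCE over two views f, g of shape (V, D).

    L_v: voxel v in one view is the positive for voxel v in the other
         view; all other voxels serve as negatives.
    L_d: the features are transposed so each of the D dimensions acts
         as a sample, pushing dissimilar dimensions apart and thereby
         discouraging dimensional collapse.
    """
    def nce(a, b):
        a = F.normalize(a, dim=1)                           # unit-norm rows
        b = F.normalize(b, dim=1)
        logits = a @ b.t() / tau                            # pairwise similarities
        targets = torch.arange(a.size(0), device=a.device)  # diagonal = positives
        return F.cross_entropy(logits, targets)             # mean of -log P_+

    l_v = nce(f, g)          # voxel-wise objective: rows are voxels
    l_d = nce(f.t(), g.t())  # dimensional objective: rows are dimensions
    return l_v, l_d

# Toy usage with hypothetical sizes (64 voxels, 32 feature dimensions):
l_v, l_d = voxel_and_dimensional_nce(torch.randn(64, 32), torch.randn(64, 32))
```

Because both losses are instances of InfoNCE, the hardness-aware gradient analysis above applies to each of them unchanged.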

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

You, C., Zhao, R., Staib, L.H., Duncan, J.S. (2022). Momentum Contrastive Voxel-Wise Representation Learning for Semi-supervised Volumetric Medical Image Segmentation. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. MICCAI 2022. Lecture Notes in Computer Science, vol 13434. Springer, Cham. https://doi.org/10.1007/978-3-031-16440-8_61

  • DOI: https://doi.org/10.1007/978-3-031-16440-8_61

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16439-2

  • Online ISBN: 978-3-031-16440-8

  • eBook Packages: Computer Science (R0)
