Momentum Contrastive Voxel-Wise Representation Learning for Semi-supervised Volumetric Medical Image Segmentation

You, Chenyu; Zhao, Ruihan; Staib, Lawrence H.; Duncan, James S.

doi:10.1007/978-3-031-16440-8_61


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13434)

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention


Abstract

Contrastive learning (CL) aims to learn useful representations without relying on expert annotations in the context of medical image segmentation. Existing approaches mainly contrast a single positive vector (i.e., an augmentation of the same image) against a set of negatives drawn from the remainder of the batch by simply mapping all input features into the same constant vector. Despite their impressive empirical performance, these methods have the following shortcomings: (1) preventing collapse to trivial solutions remains a formidable challenge; and (2) we argue that not all voxels within the same image are equally positive, since dissimilar anatomical structures exist within the same image. In this work, we present a novel Contrastive Voxel-wise Representation Learning (CVRL) method to effectively learn low-level and high-level features by capturing 3D spatial context and rich anatomical information along both the feature and the batch dimensions. Specifically, we first introduce a novel CL strategy that promotes feature diversity among the 3D representation dimensions. We train the framework through bi-level contrastive optimization (i.e., low-level and high-level) on 3D images. Experiments on two benchmark datasets and different labeled settings demonstrate the superiority of our proposed framework. More importantly, we also prove that our method inherits the hardness-aware property of standard CL approaches.


Notes

1. http://atriaseg2018.cardiacatlas.org/

References

Bai, W., Chen, C., Tarroni, G., Duan, J., Guitton, F., Petersen, S.E., Guo, Y., Matthews, P.M., Rueckert, D.: Self-supervised learning for cardiac MR image segmentation by anatomical position prediction. In: Shen, D., Liu, T., Peters, T.M., Staib, L.H., Essert, C., Zhou, S., Yap, P.-T., Khan, A. (eds.) MICCAI 2019. LNCS, vol. 11765, pp. 541–549. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32245-8_60
Bortsova, G., Dubost, F., Hogeweg, L., Katramados, I., de Bruijne, M.: Semi-supervised medical image segmentation via learning consistency under transformations. In: Shen, D., Liu, T., Peters, T.M., Staib, L.H., Essert, C., Zhou, S., Yap, P.-T., Khan, A. (eds.) MICCAI 2019. LNCS, vol. 11769, pp. 810–818. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32226-7_90
Chaitanya, K., Erdil, E., Karani, N., Konukoglu, E.: Contrastive learning of global and local features for medical image segmentation with limited annotations. In: NeurIPS (2020)
Chen, S., Bortsova, G., García-Uceda Juárez, A., van Tulder, G., de Bruijne, M.: Multi-task attention-based semi-supervised learning for medical image segmentation. In: Shen, D., et al. (eds.) MICCAI 2019. LNCS, vol. 11766, pp. 457–465. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32248-9_51
Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: ICML, pp. 1597–1607. PMLR (2020)
Hadsell, R., Chopra, S., LeCun, Y.: Dimensionality reduction by learning an invariant mapping. In: CVPR, vol. 2, pp. 1735–1742. IEEE (2006)
Hang, W., et al.: Local and global structure-aware entropy regularized mean teacher model for 3D left atrium segmentation. In: Martel, A.L., et al. (eds.) MICCAI 2020. LNCS, vol. 12261, pp. 562–571. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59710-8_55
Hu, X., Zeng, D., Xu, X., Shi, Y.: Semi-supervised contrastive learning for label-efficient medical image segmentation. In: de Bruijne, M., Cattin, P.C., Cotin, S., Padoy, N., Speidel, S., Zheng, Y., Essert, C. (eds.) MICCAI 2021. LNCS, vol. 12902, pp. 481–490. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-87196-3_45
Hua, T., Wang, W., Xue, Z., Ren, S., Wang, Y., Zhao, H.: On feature decorrelation in self-supervised learning. In: ICCV, pp. 9598–9608 (2021)
Jing, L., Vincent, P., LeCun, Y., Tian, Y.: Understanding dimensional collapse in contrastive self-supervised learning. arXiv preprint arXiv:2110.09348 (2021)
Laine, S., Aila, T.: Temporal ensembling for semi-supervised learning. arXiv preprint arXiv:1610.02242 (2016)
Li, S., Zhang, C., He, X.: Shape-aware semi-supervised 3D semantic segmentation for medical images. In: Martel, A.L., Abolmaesumi, P., Stoyanov, D., Mateus, D., Zuluaga, M.A., Zhou, S.K., Racoceanu, D., Joskowicz, L. (eds.) MICCAI 2020. LNCS, vol. 12261, pp. 552–561. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59710-8_54
Li, X., Yu, L., Chen, H., Fu, C.W., Heng, P.A.: Semi-supervised skin lesion segmentation via transformation consistent self-ensembling model. arXiv preprint arXiv:1808.03887 (2018)
Liu, F., et al.: Graph-in-graph network for automatic gene ontology description generation. arXiv preprint arXiv:2206.05311 (2022)
Liu, F., You, C., Wu, X., Ge, S., Sun, X., et al.: Auto-encoding knowledge graph for unsupervised medical report generation. In: Advances in Neural Information Processing Systems (NeurIPS) (2021)
Luo, X., Chen, J., Song, T., Wang, G.: Semi-supervised medical image segmentation through dual-task consistency. In: AAAI (2020)
Milletari, F., Navab, N., Ahmadi, S.A.: V-net: fully convolutional neural networks for volumetric medical image segmentation. In: 3DV, pp. 565–571. IEEE (2016)
Misra, I., Maaten, L.v.d.: Self-supervised learning of pretext-invariant representations. In: CVPR, pp. 6707–6717 (2020)
Nie, D., Gao, Y., Wang, L., Shen, D.: ASDNet: attention based semi-supervised deep networks for medical image segmentation. In: MICCAI, pp. 370–378. Springer (2018)
Oord, A.v.d., Li, Y., Vinyals, O.: Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748 (2018)
Roth, H.R., Farag, A., Turkbey, E., Lu, L., Liu, J., Summers, R.M.: Data from Pancreas-CT. The Cancer Imaging Archive (2016)
Sun, S., Han, K., Kong, D., You, C., Xie, X.: Mirnf: medical image registration via neural fields. arXiv preprint arXiv:2206.03111 (2022)
Taleb, A., et al.: 3d self-supervised methods for medical imaging. In: NeurIPS, pp. 18158–18172 (2020)
Tarvainen, A., Valpola, H.: Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results. In: NeurIPS, pp. 1195–1204 (2017)
Tian, Y., Krishnan, D., Isola, P.: Contrastive multiview coding. arXiv preprint arXiv:1906.05849 (2019)
Wang, F., Liu, H.: Understanding the behaviour of contrastive loss. In: CVPR, pp. 2495–2504 (2021)
Wang, X., Gupta, A.: Unsupervised learning of visual representations using videos. In: ICCV, pp. 2794–2802 (2015)
Wu, Z., Xiong, Y., Yu, S.X., Lin, D.: Unsupervised feature learning via non-parametric instance discrimination. In: CVPR (2018)
You, C., Chen, N., Zou, Y.: Self-supervised contrastive cross-modality representation learning for spoken question answering. arXiv preprint arXiv:2109.03381 (2021)
You, C., Dai, W., Staib, L., Duncan, J.S.: Bootstrapping semi-supervised medical image segmentation with anatomical-aware contrastive distillation. arXiv preprint arXiv:2206.02307 (2022)
You, C., et al.: Incremental learning meets transfer learning: application to multi-site prostate MRI segmentation. arXiv preprint arXiv:2206.01369 (2022)
You, C., Yang, J., Chapiro, J., Duncan, J.S.: Unsupervised wasserstein distance guided domain adaptation for 3D multi-domain liver segmentation. In: Cardoso, J., et al. (eds.) IMIMIC/MIL3ID/LABELS -2020. LNCS, vol. 12446, pp. 155–163. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61166-8_17
You, C., Zhao, R., Liu, F., Chinchali, S., Topcu, U., Staib, L., Duncan, J.S.: Class-aware generative adversarial transformers for medical image segmentation. arXiv preprint arXiv:2201.10737 (2022)
You, C., Zhou, Y., Zhao, R., Staib, L., Duncan, J.S.: SimCVD: simple contrastive voxel-wise representation distillation for semi-supervised medical image segmentation. IEEE Trans. Med. Imaging (2022)
Yu, L., Wang, S., Li, X., Fu, C.-W., Heng, P.-A.: Uncertainty-aware self-ensembling model for semi-supervised 3D left atrium segmentation. In: Shen, D., Liu, T., Peters, T.M., Staib, L.H., Essert, C., Zhou, S., Yap, P.-T., Khan, A. (eds.) MICCAI 2019. LNCS, vol. 11765, pp. 605–613. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32245-8_67
Zhang, Y., Yang, L., Chen, J., Fredericksen, M., Hughes, D.P., Chen, D.Z.: Deep adversarial networks for biomedical image segmentation utilizing unannotated images. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (eds.) MICCAI 2017. LNCS, vol. 10435, pp. 408–416. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66179-7_47
Zheng, H., Lin, L., Hu, H., Zhang, Q., Chen, Q., Iwamoto, Y., Han, X., Chen, Y.-W., Tong, R., Wu, J.: Semi-supervised segmentation of liver using adversarial learning with deep atlas prior. In: Shen, D., Liu, T., Peters, T.M., Staib, L.H., Essert, C., Zhou, S., Yap, P.-T., Khan, A. (eds.) MICCAI 2019. LNCS, vol. 11769, pp. 148–156. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32226-7_17
Zhou, Y., et al.: Prior-aware neural network for partially-supervised multi-organ segmentation. In: ICCV, pp. 10672–10681 (2019)

Author information

Authors and Affiliations

Electrical Engineering, Yale University, New Haven, CT, USA
Chenyu You, Lawrence H. Staib & James S. Duncan
Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, USA
Ruihan Zhao
Radiology and Biomedical Imaging, Yale School of Medicine, New Haven, CT, USA
Lawrence H. Staib & James S. Duncan
Biomedical Engineering, Yale University, New Haven, CT, USA
Lawrence H. Staib & James S. Duncan


Corresponding author

Correspondence to Chenyu You.

Editor information

Editors and Affiliations

Rochester Institute of Technology, Rochester, NY, USA
Linwei Wang
Chinese University of Hong Kong, Hong Kong, Hong Kong
Qi Dou
University of Virginia, Charlottesville, VA, USA
P. Thomas Fletcher
National Center for Tumor Diseases (NCT/UCC), Dresden, Germany
Stefanie Speidel
Case Western Reserve University, Cleveland, OH, USA
Shuo Li


Appendices

Appendix

Table 3. Quantitative segmentation results on the pancreas dataset. The backbone network of all evaluated methods is V-Net.


Table 4. Additional ablation study on the LA dataset for the voxel-wise contrastive objective $\mathcal {L}_v$ and the dimensional contrastive objective $\mathcal {L}_d$ (8 labeled and 72 unlabeled). The dimensional contrastive objective is complementary to the voxel-wise contrastive objective and provides additional performance gains, which demonstrates its importance.


A Hardness-aware Property of the Contrastive Losses

Recent work [26] has studied the properties of the standard contrastive objective (i.e., InfoNCE). For better illustration, we reproduce the contrastive objective defined in Sect. 1.2 as follows:

$$\begin{aligned} \mathcal {L}_q = -\log \frac{\exp (q \cdot k_+/\tau )}{\exp (q \cdot k_+/\tau ) + \sum _{k \in \mathcal {K}_-}\exp (q \cdot k/\tau )} \end{aligned}$$

Given a query vector q and a set of key vectors $\mathcal {K} = \{k_+\} \cup \mathcal {K}_-$, the InfoNCE loss can be understood as maximizing the positive score $\exp (q \cdot k_+/\tau )$ while minimizing the negative scores $\exp (q \cdot k/\tau )$ for $k \in \mathcal {K}_-$. Concretely, $\mathcal {L}_q = -\log P_+$, where $P_i$ is the probability that q is matched to $k_i$:

$$\begin{aligned} P_i = \frac{\exp (q \cdot k_i/\tau )}{\sum _{k \in \mathcal {K}}\exp (q \cdot k/\tau )} \end{aligned}$$
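The two formulas above can be sketched numerically. The following is a minimal illustration in plain Python, not the authors' implementation: the helper names (`dot`, `match_probabilities`, `info_nce`), the toy vectors, and the temperature value are all assumptions chosen for the example.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def match_probabilities(q, keys, tau):
    """P_i: softmax of q . k_i / tau over all keys (positive and negatives)."""
    logits = [dot(q, k) / tau for k in keys]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def info_nce(q, k_pos, k_negs, tau=0.1):
    """L_q = -log P_+, with the positive key placed at index 0."""
    probs = match_probabilities(q, [k_pos] + list(k_negs), tau)
    return -math.log(probs[0])

# Toy example (hypothetical values): one positive key, two negative keys.
q = [1.0, 0.0]
k_pos = [0.9, 0.1]
k_negs = [[0.0, 1.0], [-1.0, 0.0]]
loss = info_nce(q, k_pos, k_negs, tau=0.5)
print(f"InfoNCE loss: {loss:.4f}")
```

Adding a negative key that is close to q enlarges the denominator and therefore increases the loss, which is the behaviour the analysis that follows makes precise.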

Next, we show that InfoNCE has the hardness-aware property, in contrast to a more naïve contrastive loss $\mathcal {L}'_q$ whose objective is also to maximize the similarity between the query q and the positive key $k_+$:

$$\begin{aligned} \mathcal {L}'_q = -q \cdot k_+ + \lambda \sum _{k \in \mathcal {K}_-}q \cdot k \end{aligned}$$

We analyze the derivative with respect to the positive key $k_+$ and any negative key $k_-$:

$$\begin{aligned} \frac{\partial \mathcal {L}_q}{\partial k_+}&= -\frac{[\sum _{k \in \mathcal {K}}\exp (q \cdot k/\tau )] \exp (q \cdot k_+/\tau ) - [\exp (q \cdot k_+/\tau )]^2}{P_+ \, [\sum _{k \in \mathcal {K}}\exp (q \cdot k/\tau )]^2} \cdot \frac{q}{\tau },\\&= -\frac{\sum _{k \in \mathcal {K}_-}\exp (q \cdot k/\tau )}{\sum _{k \in \mathcal {K}}\exp (q \cdot k/\tau )} \cdot \frac{q}{\tau } = -\frac{q}{\tau }\sum _{k\ne +}P_k,\\ \frac{\partial \mathcal {L}_q}{\partial k_-}&= -\frac{-\exp (q \cdot k_+/\tau )\exp (q \cdot k_-/\tau )}{P_+ \, [\sum _{k \in \mathcal {K}}\exp (q \cdot k/\tau )]^2} \cdot \frac{q}{\tau } =\frac{q}{\tau }P_{-},\\ \frac{\partial \mathcal {L'}_q}{\partial k_+}&= -q, \quad \frac{\partial \mathcal {L'}_q}{\partial k_-} = \lambda q. \end{aligned}$$

We can observe that the derivative of $\mathcal {L}_q$ with respect to any negative sample $k_-$ is proportional to the exponential term $\exp (q \cdot k_{-}/\tau )$, which indicates the hardness-aware property: negatives that are more similar to q receive larger gradients. On the other hand, $\mathcal {L}'_q$ is not hardness-aware because each negative key is weighted equally by $\lambda$ (Table 4).
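The gradient expression $\partial \mathcal {L}_q / \partial k_- = \frac{q}{\tau } P_-$ can be checked with finite differences. The sketch below is illustrative only: the function names, the toy keys, and the values of $\tau$ and $\lambda$ are assumptions, and the "hard" and "easy" negatives are hypothetical vectors chosen to be close to and far from q.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce(q, keys, tau):
    """InfoNCE loss; keys[0] is the positive key, the rest are negatives."""
    logits = [dot(q, k) / tau for k in keys]
    m = max(logits)  # log-sum-exp with max subtraction for stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]

def match_probability(q, keys, idx, tau):
    """P_idx: probability that q is matched to keys[idx]."""
    logits = [dot(q, k) / tau for k in keys]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    return exps[idx] / sum(exps)

def analytic_grad_negative(q, keys, idx, tau):
    """Closed form from the derivation: dL_q/dk_- = (q / tau) * P_-."""
    p = match_probability(q, keys, idx, tau)
    return [p * x / tau for x in q]

def numeric_grad(q, keys, idx, tau, eps=1e-6):
    """Central finite-difference gradient of L_q with respect to keys[idx]."""
    grad = []
    for d in range(len(q)):
        plus = [list(k) for k in keys]
        minus = [list(k) for k in keys]
        plus[idx][d] += eps
        minus[idx][d] -= eps
        grad.append((info_nce(q, plus, tau) - info_nce(q, minus, tau)) / (2 * eps))
    return grad

def naive_grad_negative(q, lam):
    """Gradient of the naive loss L'_q: lambda * q, identical for every negative."""
    return [lam * x for x in q]

q = [1.0, 0.0]
tau = 0.5
# keys[0]: positive, keys[1]: hard negative (close to q), keys[2]: easy negative
keys = [[0.9, 0.1], [0.8, 0.2], [-0.5, 0.5]]
grad_hard = analytic_grad_negative(q, keys, 1, tau)
grad_easy = analytic_grad_negative(q, keys, 2, tau)
print("InfoNCE gradient, hard negative:", grad_hard)
print("InfoNCE gradient, easy negative:", grad_easy)
print("Naive gradient, any negative:   ", naive_grad_negative(q, lam=0.5))
```

Under these assumptions the hard negative receives a much larger InfoNCE gradient than the easy one, whereas the naive loss assigns both the same gradient.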

In Sect. 1.2, both the voxel-wise and dimensional contrastive objectives use the InfoNCE loss and thus benefit from the hardness-aware property. (1) Voxel-wise Contrastive Objective: we define queries and keys as:

$$\begin{aligned} q = f(t(x))_i,\quad k_i = g(t'(x))_i,\quad k_j = g(t'(x))_j, \end{aligned}$$

where i and j denote voxel indices. $k_j$ is a feature voxel at a different location and is treated as a negative key. $\mathcal {L}_v$ essentially pushes $k_j$ away from q more strongly when they are close, which encourages the representations to contain unique local information. (2) Dimensional Contrastive Objective: for brevity, here i and j denote dimension indices. $\mathcal {L}_d$ encourages each dimension of the features to encode dissimilar information and prevents dimensional collapse [25].
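One way to picture the two objectives is to apply the same InfoNCE computation along different axes of a feature matrix with one row per voxel and one column per feature dimension. The sketch below is a simplified illustration under several assumptions that this appendix does not specify: features are L2-normalized, all other voxels (or dimensions) of the second view serve as negatives, and the function names and temperature are hypothetical. It should not be read as the exact definition of $\mathcal {L}_v$ or $\mathcal {L}_d$.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def info_nce_rows(queries, keys, tau):
    """Mean InfoNCE where row i of `queries` is matched to row i of `keys`
    and every other row of `keys` acts as a negative."""
    queries = [l2_normalize(r) for r in queries]
    keys = [l2_normalize(r) for r in keys]
    total = 0.0
    for i, q in enumerate(queries):
        logits = [sum(a * b for a, b in zip(q, k)) / tau for k in keys]
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[i]
    return total / len(queries)

def transpose(mat):
    return [list(col) for col in zip(*mat)]

def voxel_wise_loss(feat_q, feat_k, tau=0.1):
    """Contrast voxels: each row (voxel) against the other rows."""
    return info_nce_rows(feat_q, feat_k, tau)

def dimensional_loss(feat_q, feat_k, tau=0.1):
    """Contrast feature dimensions: each column against the other columns."""
    return info_nce_rows(transpose(feat_q), transpose(feat_k), tau)

# Toy features (hypothetical): 4 voxels x 3 dimensions from two augmented views.
feat_q = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.1], [0.3, 0.2, 1.0], [0.9, 0.1, 0.0]]
feat_k = [[0.9, 0.1, 0.2], [0.1, 1.0, 0.0], [0.2, 0.2, 0.9], [1.0, 0.0, 0.1]]
print("voxel-wise loss: ", voxel_wise_loss(feat_q, feat_k))
print("dimensional loss:", dimensional_loss(feat_q, feat_k))
```

In this sketch, fully collapsed features (every voxel mapped to the same vector) give a voxel-wise loss of log N and a dimensional loss of log D, the values for uniform matching probabilities, which is why lower values indicate more distinct voxels and dimensions.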


Cite this paper

You, C., Zhao, R., Staib, L.H., Duncan, J.S. (2022). Momentum Contrastive Voxel-Wise Representation Learning for Semi-supervised Volumetric Medical Image Segmentation. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. MICCAI 2022. Lecture Notes in Computer Science, vol 13434. Springer, Cham. https://doi.org/10.1007/978-3-031-16440-8_61


DOI: https://doi.org/10.1007/978-3-031-16440-8_61
Published: 16 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16439-2
Online ISBN: 978-3-031-16440-8
eBook Packages: Computer Science, Computer Science (R0)
