Parts2Whole: Self-supervised Contrastive Learning via Reconstruction

Part of the Lecture Notes in Computer Science book series (LNIP, volume 12444)


Contrastive representation learning is the state of the art in computer vision, but it requires huge mini-batch sizes, special network designs, or memory banks, making it unappealing for 3D medical imaging. In 3D medical imaging, meanwhile, reconstruction-based self-supervised learning has reached a new height in performance but lacks mechanisms for learning contrastive representations. This paper therefore proposes a new framework for self-supervised contrastive learning via reconstruction, called Parts2Whole, which exploits the universal and intrinsic part-whole relationship to learn contrastive representations without using a contrastive loss. Reconstructing an image (the whole) from its own parts compels the model to learn similar latent features for all of those parts, while reconstructing different images (wholes) from their respective parts simultaneously forces the model to push parts belonging to different wholes farther apart in the latent space; the trained model is thereby capable of distinguishing images. We evaluated Parts2Whole on five distinct imaging tasks covering both classification and segmentation and compared it with four competing publicly available 3D pretrained models. Parts2Whole significantly outperforms them on two of the five tasks and achieves competitive performance on the remaining three. This superior performance is attributable to the contrastive representations learned with Parts2Whole. Code and pretrained models are available at
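The part-whole pretext task described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the crop-size range, the nearest-neighbour resize, the mean-squared-error loss, and the function names `random_part`, `resize_nearest`, and `reconstruction_loss` are all hypothetical choices made for the example.

```python
import numpy as np

def random_part(whole, min_frac=0.25, max_frac=0.5, rng=None):
    """Crop a random sub-volume (a "part") from a 3D volume (the "whole").

    The fraction range is an assumption made for this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    size = [max(1, int(d * rng.uniform(min_frac, max_frac))) for d in whole.shape]
    start = [int(rng.integers(0, d - s + 1)) for d, s in zip(whole.shape, size)]
    window = tuple(slice(a, a + s) for a, s in zip(start, size))
    return whole[window]

def resize_nearest(part, out_shape):
    """Nearest-neighbour resize so that every part has the same input size."""
    idx = [np.arange(o) * s // o for o, s in zip(out_shape, part.shape)]
    return part[np.ix_(*idx)]

def reconstruction_loss(prediction, whole):
    """Mean squared error between the predicted whole and the true whole."""
    return float(np.mean((prediction - whole) ** 2))

rng = np.random.default_rng(0)
whole = rng.random((16, 16, 16))            # stand-in for a 3D scan
part = random_part(whole, rng=rng)          # a random crop of that scan
net_input = resize_nearest(part, whole.shape)
# An identity "model" is used here only so the sketch runs end to end;
# a real encoder-decoder would map net_input to a predicted whole.
loss = reconstruction_loss(net_input, whole)
```

Because every part of one volume shares the same reconstruction target, minimizing this loss over many volumes is what would encourage parts of the same whole to map to similar latent features and parts of different wholes to map to different ones.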


  • 3D Self-supervised Learning
  • Contrastive representation learning
  • Transfer learning

DOI: 10.1007/978-3-030-60548-3_9


  1. If we consider each whole image itself as a “label”, the training process of Parts2Whole is equivalent to predicting the correct “label” given a part of one image as input, or discriminating each image from its parts.

  2. 3D U-Net:

  3. Denote the \(l_2\)-normalized features of a positive pair and a negative pair as \(\{\mathcal{F}_E(p_i), \mathcal{F}_E(p'_i)\}\) and \(\{\mathcal{F}_E(p_i), \mathcal{F}_E(p_j)\}\), respectively. The contrastive loss is calculated as \(-\log \frac{\exp(\mathcal{F}_E(p_i) \cdot \mathcal{F}_E(p'_i) / \tau)}{\exp(\mathcal{F}_E(p_i) \cdot \mathcal{F}_E(p'_i) / \tau) + \sum_{j=1}^{5000} \exp(\mathcal{F}_E(p_i) \cdot \mathcal{F}_E(p_j) / \tau)}\), where \(\tau = 0.7\).
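The contrastive loss in note 3 can be computed as in the following sketch. It assumes the encoder features are already available as NumPy arrays; the function names `l2_normalize` and `contrastive_loss` are hypothetical, and the number of negatives is whatever the caller passes in (the note uses 5000).

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale feature vectors to unit l2 norm along the given axis."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)

def contrastive_loss(anchor, positive, negatives, tau=0.7):
    """Loss for one anchor feature.

    anchor, positive: arrays of shape (d,); negatives: array of shape (n, d).
    All features are assumed to be l2-normalized already.
    """
    pos = np.dot(anchor, positive) / tau      # similarity of the positive pair
    neg = negatives @ anchor / tau            # similarities of the negative pairs
    logits = np.concatenate(([pos], neg))
    m = logits.max()                          # subtract the max for numerical stability
    log_denominator = m + np.log(np.sum(np.exp(logits - m)))
    return float(log_denominator - pos)       # equals -log(exp(pos) / sum(exp(logits)))

rng = np.random.default_rng(0)
features = l2_normalize(rng.normal(size=(102, 32)))   # random stand-in features
example = contrastive_loss(features[0], features[1], features[2:])
```

Working in log space avoids overflow when the similarities are divided by a small temperature; with \(\tau = 0.7\) and unit-norm features this is not strictly necessary, but it keeps the sketch safe for other settings.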




This research has been supported partially by ASU and Mayo Clinic through a Seed Grant and an Innovation Grant, and partially by the NIH under Award Number R01HL128785. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. This work has utilized the GPUs provided partially by the ASU Research Computing and partially by the Extreme Science and Engineering Discovery Environment (XSEDE) funded by the National Science Foundation (NSF) under grant number ACI-1548562. We would like to thank Jiaxuan Pang, Md Mahfuzur Rahman Siddiquee, and Zuwei Guo for evaluating I3D, NiftyNet, and MedicalNet, respectively. The content of this paper is covered by patents pending.

Author information


Corresponding authors

Correspondence to Ruibin Feng or Jianming Liang.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Feng, R., Zhou, Z., Gotway, M.B., Liang, J. (2020). Parts2Whole: Self-supervised Contrastive Learning via Reconstruction. In: , et al. Domain Adaptation and Representation Transfer, and Distributed and Collaborative Learning. DART 2020, DCL 2020. Lecture Notes in Computer Science, vol 12444. Springer, Cham.

  • DOI:

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60547-6

  • Online ISBN: 978-3-030-60548-3

  • eBook Packages: Computer Science, Computer Science (R0)