A significant proportion of patients scanned in a clinical setting have follow-up scans. We show in this work that such longitudinal scans alone can be used as a form of “free” self-supervision for training a deep network. We demonstrate this self-supervised learning for the case of T2-weighted sagittal lumbar Magnetic Resonance Images (MRIs). A Siamese convolutional neural network (CNN) is trained using two losses: (i) a contrastive loss on whether the scan is of the same person (i.e. longitudinal) or not, together with (ii) a classification loss on predicting the level of vertebral bodies. The performance of this pre-trained network is then assessed on a grading classification task. We experiment on a dataset of 1016 subjects, 423 possessing follow-up scans, with the end goal of learning the disc degeneration radiological gradings attached to the intervertebral discs. We show that the performance of the pre-trained CNN on the supervised classification task is (i) superior to that of a network trained from scratch; and (ii) requires far fewer annotated training samples to reach an equivalent performance to that of the network trained from scratch.



We are grateful for discussions with Prof. Jeremy Fairbank, Dr. Jill Urban, and Dr. Frances Williams. This work was supported by the RCUK CDT in Healthcare Innovation (EP/G036861/1) and the EPSRC Programme Grant Seebibyte (EP/M013774/1). TwinsUK is funded by the Wellcome Trust; European Communitys Seventh Framework Programme (FP7/2007–2013). The study also receives support from the National Institute for Health Research (NIHR) Clinical Research Facility at Guys & St Thomas NHS Foundation Trust and NIHR Biomedical Research Centre based at Guy’s and St Thomas’ NHS Foundation Trust and King’s College London.


  1. 1.
    Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the details: delving deep into convolutional nets. In: Proceedings of BMVC (2014)Google Scholar
  2. 2.
    Chopra, S., Hadsell, R., LeCun, Y.: Learning a similarity metric discriminatively, with application to face verification. In: Proceedings of CVPR (2005)Google Scholar
  3. 3.
    Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: Proceedings of CVPR (2009)Google Scholar
  4. 4.
    Doersch, C., Gupta, A., Efros, A.A.: Unsupervised visual representation learning by context prediction. In: Proceedings of ICCV (2015)Google Scholar
  5. 5.
    Jamaludin, A., Kadir, T., Zisserman, A.: SpineNet: automatically pinpointing classification evidence in spinal MRIs. In: Ourselin, S., Joskowicz, L., Sabuncu, M.R., Unal, G., Wells, W. (eds.) MICCAI 2016. LNCS, vol. 9901, pp. 166–175. Springer, Cham (2016). doi: 10.1007/978-3-319-46723-8_20 CrossRefGoogle Scholar
  6. 6.
    Lootus, M., Kadir, T., Zisserman, A.: Vertebrae detection and labelling in lumbar MR images. In: MICCAI Workshop: CSI (2013)Google Scholar
  7. 7.
    Mobahi, H., Collobert, R., Weston, J.: Deep learning from temporal coherence in video. In: ICML (2009)Google Scholar
  8. 8.
    Shin, H.C., Roth, H.R., Gao, M., Lu, L., Xu, Z., Nogues, I., Yao, J., Mollura, D., Summers, R.M.: Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans. Med. Imaging 35(5), 1285–1298 (2016)CrossRefGoogle Scholar
  9. 9.
    Vedaldi, A., Lenc, K.: MatConvNet - convolutional neural networks for MATLAB. CoRR abs/1412.4564 (2014)Google Scholar
  10. 10.
    Wang, X., Gupta, A.: Unsupervised learning of visual representations using videos. In: ICCV (2015)Google Scholar
  11. 11.
    Wiskott, L., Sejnowski, T.J.: Slow feature analysis: unsupervised learning of invariances. Neural Comput. 14(4), 715–770 (2002)CrossRefMATHGoogle Scholar

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. 1.VGG, Department of Engineering ScienceUniversity of OxfordOxfordUK
  2. 2.OptellumOxfordUK

Personalised recommendations