Classifiers for medical image analysis are often trained with a single consensus label, obtained by combining labels given by experts or crowds. However, disagreement between annotators may be informative, so removing it may not be the best strategy. As a proof of concept, we predict whether a skin lesion from the ISIC 2017 dataset is a melanoma, based on crowd annotations of visual characteristics of that lesion. We compare the mean of the annotations, which reflects consensus, with their standard deviation and other distribution moments, which reflect disagreement. We show that the mean annotations perform best, but that the disagreement measures are still informative. We also make the crowd annotations used in this paper available at
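To make the distinction between consensus and disagreement features concrete, the following is a minimal sketch of turning several annotators' scores for one lesion into per-characteristic features. It assumes each visual characteristic is rated on a numeric scale by several annotators; the characteristic names, the function names `moments` and `lesion_features`, and the choice of population standard deviation, moment-based skewness and excess kurtosis are illustrative assumptions, not the paper's exact pipeline or estimators.

```python
from statistics import mean, pstdev


def moments(scores):
    """Return (mean, std, skewness, excess kurtosis) of one characteristic's scores.

    Hypothetical helper: uses population (biased) moment estimators.
    """
    m = mean(scores)
    s = pstdev(scores)
    if s == 0:
        # All annotators gave the same score: no spread, so the
        # higher moments are undefined; we set them to 0 by convention.
        return m, 0.0, 0.0, 0.0
    n = len(scores)
    skew = sum((x - m) ** 3 for x in scores) / (n * s ** 3)
    kurt = sum((x - m) ** 4 for x in scores) / (n * s ** 4) - 3.0
    return m, s, skew, kurt


def lesion_features(annotations):
    """Map {characteristic: [annotator scores]} to a flat feature dict.

    The mean captures consensus; std, skewness and kurtosis capture
    different aspects of disagreement between annotators.
    """
    features = {}
    for name in sorted(annotations):
        m, s, skew, kurt = moments(annotations[name])
        features[f"{name}_mean"] = m
        features[f"{name}_std"] = s
        features[f"{name}_skew"] = skew
        features[f"{name}_kurt"] = kurt
    return features


# Hypothetical example: four annotators rate two characteristics of one lesion.
lesion = {"asymmetry": [1, 2, 2, 3], "border": [2, 2, 2, 2]}
print(lesion_features(lesion))
```

A consensus-only classifier would be trained on the `*_mean` columns, and a disagreement-based one on the `*_std` (or higher-moment) columns; in this example the unanimous "border" scores yield a standard deviation of 0, while the spread-out "asymmetry" scores do not.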



We thank the students of the 8QA01 2017–2018 course for their participation in gathering the annotations.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Medical Image Analysis, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
  2. Image Sciences Institute, University Medical Center Utrecht, Utrecht, The Netherlands
