Crowdsourcing Labels for Pathological Patterns in CT Lung Scans: Can Non-experts Contribute Expert-Quality Ground Truth?
This paper investigates the quality of ground truth that can be obtained by crowdsourcing specialist medical imaging annotations from non-experts. Following basic tuition, 34 volunteer participants independently delineated regions belonging to 7 pathological patterns in 20 scans, guided by expert-provided pattern labels. Participants’ annotations were compared to a set of reference annotations using the Dice similarity coefficient (DSC), yielding scores between 0.41 and 0.77; the reference repeatability was 0.81. Analysis of prior imaging experience, annotation behaviour, scan ordering and time spent showed that only time spent correlated with annotation quality. Combining multiple observers by voxelwise majority vote outperformed any single observer, matching the reference repeatability for 5 of the 7 patterns. In conclusion, crowdsourcing from non-experts can yield ground truth of acceptable quality, given sufficient expert supervision of the task and a sufficient number of observers per scan.
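The two quantitative ingredients named above, the Dice similarity coefficient and voxelwise majority voting, can be sketched as follows. This is an illustrative NumPy sketch under our own naming (`dice`, `majority_vote` are hypothetical helpers), not the authors' implementation:

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary masks.

    DSC = 2 |A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1.
    """
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        # Both masks empty: treat as perfect agreement by convention.
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / denom

def majority_vote(masks):
    """Combine binary masks from several observers by voxelwise majority vote.

    A voxel is foreground in the output if more than half of the
    observers marked it as foreground.
    """
    stack = np.stack([np.asarray(m, dtype=bool) for m in masks])
    return stack.sum(axis=0) > (stack.shape[0] / 2.0)
```

For example, two observers who each label two voxels but agree on only one score a DSC of 0.5, while a third observer can break ties in the combined vote.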
Many thanks to Phil Tolland, who developed the software for the ground truth collection tool, and to all of the employees at Toshiba Medical Visualization Systems who took part in this study: Allan Barklie, Erin Beveridge, Antony Brown, Gerald Chau, Alasdair Corbett, Ross Davies, Matt Daykin, Ben Docherty, Venkatesh Gaddam, Keith Goatman, Marta Guarisco, Joseph Henry, Corné Hoogendoorn, Pia Kullik, Aneta Lisowska, Steve Magness, Craig Matear, James Matthews, Chris McGough, Haritha Miryala, Brian Mohr, Costas Plakas, Ian Poole, Marco Razeto, Faye Riley, Matt Shepherd, Simeon Skopalik, Andy Smout, Ken Sutherland, Paul Thomson, Phil Tolland, John Tough, Aidan Wellington and Gavin Wheeler.