DeepScope: Nonintrusive Whole Slide Saliency Annotation and Prediction from Pathologists at the Microscope

  • Andrew J. Schaumberg
  • S. Joseph Sirintrapun
  • Hikmat A. Al-Ahmadie
  • Peter J. Schüffler
  • Thomas J. Fuchs
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10477)


Modern digital pathology departments have grown to produce whole-slide image data at petabyte scale, an unprecedented treasure chest for medical machine learning tasks. Unfortunately, most digital slides are not annotated at the image level, hindering large-scale application of supervised learning. Manual labeling is prohibitive, requiring pathologists with decades of training and outstanding clinical service responsibilities. This problem is further aggravated by the United States Food and Drug Administration’s ruling that primary diagnosis must come from a glass slide rather than a digital image. We present the first end-to-end framework to overcome this problem, gathering annotations in a nonintrusive manner during a pathologist’s routine clinical work: (i) microscope-specific 3D-printed commodity camera mounts are used to video record the glass-slide-based clinical diagnosis process; (ii) after routine scanning of the whole slide, the video frames are registered to the digital slide; (iii) motion and observation time are estimated to generate a spatial and temporal saliency map of the whole slide. Demonstrating the utility of these annotations, we train a convolutional neural network that detects diagnosis-relevant salient regions, then report accuracy of 85.15% in bladder and 91.40% in prostate, with 75.00% accuracy when training on prostate but predicting in bladder, despite different pathologists examining the different tissues. When training on one patient but testing on another, AUROC in bladder is 0.79 ± 0.11 and in prostate is 0.96 ± 0.04. Our tool is available at
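Step (iii) above, turning motion and observation time into a saliency map, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a hypothetical registration step has already mapped each video frame to a center position on the slide. The function name `saliency_grid` and its parameters (`cell_size`, `max_speed`) are illustrative assumptions, as is the simple speed threshold used to separate observation from navigation.

```python
from math import hypot


def saliency_grid(positions, fps, grid_shape, cell_size, max_speed):
    """Accumulate per-cell observation time (seconds) over a slide grid.

    positions  -- list of (x, y) slide coordinates, one per video frame
                  (assumed output of a frame-to-slide registration step)
    fps        -- video frame rate in frames per second
    grid_shape -- (rows, cols) of the saliency grid
    cell_size  -- side length of one grid cell, in slide coordinates
    max_speed  -- assumed threshold: a frame counts as observation only if
                  the view moved slower than this (slide units per second)
    """
    rows, cols = grid_shape
    grid = [[0.0] * cols for _ in range(rows)]
    dt = 1.0 / fps  # time represented by one frame

    # Compare each frame with its predecessor to estimate motion.
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        speed = hypot(x1 - x0, y1 - y0) / dt
        if speed > max_speed:
            continue  # fast panning is treated as navigation, not observation
        r, c = int(y1 // cell_size), int(x1 // cell_size)
        if 0 <= r < rows and 0 <= c < cols:
            grid[r][c] += dt  # credit this cell with one frame of dwell time
    return grid


# Hypothetical trace: dwell at the top-left, pan quickly, dwell at bottom-right.
trace = [(5, 5), (5, 5), (5, 5), (95, 95), (95, 95)]
grid = saliency_grid(trace, fps=1, grid_shape=(2, 2), cell_size=50, max_speed=10)
```

Cells with longer accumulated dwell time would be treated as more salient. A real pipeline would also have to handle registration failures and changes of objective magnification, which this sketch ignores.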



AJS was supported by NIH/NCI grant F31CA214029 and the Tri-Institutional Training Program in Computational Biology and Medicine (via NIH training grant T32GM083937). This research was funded in part through the NIH/NCI Cancer Center Support Grant P30CA008748. AJS thanks Terrie Wheeler, Du Cheng, and the Medical Student Executive Committee of Weill Cornell Medical College for free 3D printing access, instruction, and support. AJS thanks Mariam Aly for taking the photo of the camera on the orange 3D-printed mount in Fig. 1, and attention discussion. We acknowledge fair use of part of a doctor stick figure image in Fig. 1 from AJS thanks Mark Rubin for helpful pathology discussion. AJS thanks Paul Tatarsky and Juan Perin for Caffe install help on the Memorial Sloan Kettering supercomputer. We gratefully acknowledge NVIDIA Corporation for providing us a GPU as part of the GPU Research Center award to TJF, and for their support with other GPUs.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Memorial Sloan Kettering Cancer Center and the Tri-Institutional Training Program in Computational Biology and Medicine, New York, USA
  2. Weill Cornell Graduate School of Medical Sciences, New York, USA
  3. Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, USA
  4. Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, USA
