
Large Scale Semi-Automated Labeling of Routine Free-Text Clinical Records for Deep Learning

Journal of Digital Imaging


Breast cancer is a leading cause of cancer death among women in the USA. Screening mammography is effective in reducing mortality but has a high rate of unnecessary recalls and biopsies. Deep learning can be applied to mammography, but it requires large-scale labeled datasets, which are difficult to obtain. We aim to remove many barriers to dataset development by automatically harvesting data from existing clinical records using a hybrid framework combining traditional NLP and IBM Watson. An expert reviewer manually annotated 3521 breast pathology reports with one of four outcomes: left positive, right positive, bilateral positive, or negative. Traditional NLP techniques using seven different machine learning classifiers were compared to IBM Watson’s automated natural language classifier (NLC). Techniques were evaluated using precision, recall, and F-measure. Logistic regression outperformed all other traditional machine learning classifiers and was used for subsequent comparisons. Both traditional NLP and Watson’s NLC performed well for cases under 1024 characters, with weighted average F-measures above 0.96 across all classes. Performance of traditional NLP was lower for cases over 1024 characters, with an F-measure of 0.83. We demonstrate a hybrid framework using traditional NLP techniques combined with IBM Watson to annotate over 10,000 breast pathology reports for development of a large-scale database to be used for deep learning in mammography. Our work shows that traditional NLP and IBM Watson perform extremely well for cases under 1024 characters and can accelerate the rate of data annotation.
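To make the evaluation described in the abstract concrete, the following is a minimal sketch of how per-class precision, recall, and F-measure can be combined into a support-weighted average across the four outcome classes, and how reports might be grouped by the 1024-character boundary. It is not the authors' code. The function names (`per_class_scores`, `weighted_f_measure`, `split_by_length`) are hypothetical, and placing reports of exactly 1024 characters in the longer group is an assumption, since the abstract only speaks of cases "under" and "over" 1024 characters.

```python
from collections import Counter

# The four outcome labels named in the abstract.
LABELS = ["left positive", "right positive", "bilateral positive", "negative"]

# Character boundary mentioned in the abstract. Treating a report of exactly
# 1024 characters as "long" is an assumption made for this sketch.
LENGTH_THRESHOLD = 1024


def per_class_scores(y_true, y_pred, labels=LABELS):
    """Return {label: (precision, recall, f_measure, support)}."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        support = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / support if support else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f_measure, support)
    return scores


def weighted_f_measure(y_true, y_pred, labels=LABELS):
    """Average the per-class F-measures, weighted by each class's support."""
    scores = per_class_scores(y_true, y_pred, labels)
    total = sum(s[3] for s in scores.values())
    if total == 0:
        return 0.0
    return sum(s[2] * s[3] for s in scores.values()) / total


def split_by_length(reports, threshold=LENGTH_THRESHOLD):
    """Split report texts into (short, long) groups by character count."""
    short = [r for r in reports if len(r) < threshold]
    long_ = [r for r in reports if len(r) >= threshold]
    return short, long_
```

Under these assumptions, one would call `split_by_length` on the report texts first and then compute `weighted_f_measure` separately for each group, which mirrors the way the abstract reports results for shorter and longer cases.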





DH and MP are supported by the National Cancer Institute of the NIH under Award Number UH2CA203792, and the National Library of Medicine of the NIH under Award Number 1U01LM012675. HT was supported by NIH T32 Grant 5T32EB001631-10. AL was supported by the UCSF RAPtr fund.

Author information


Corresponding author

Correspondence to Hari M. Trivedi.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.


About this article


Cite this article

Trivedi, H.M., Panahiazar, M., Liang, A. et al. Large Scale Semi-Automated Labeling of Routine Free-Text Clinical Records for Deep Learning. J Digit Imaging 32, 30–37 (2019).
