Breast cancer is a leading cause of cancer death among women in the USA. Screening mammography is effective in reducing mortality, but has a high rate of unnecessary recalls and biopsies. Deep learning can be applied to mammography, but it requires large-scale labeled datasets, which are difficult to obtain. We aim to remove many barriers to dataset development by automatically harvesting data from existing clinical records using a hybrid framework combining traditional natural language processing (NLP) and IBM Watson. An expert reviewer manually annotated 3521 breast pathology reports with one of four outcomes: left positive, right positive, bilateral positive, or negative. Traditional NLP techniques using seven different machine learning classifiers were compared to IBM Watson’s automated natural language classifier (NLC). Techniques were evaluated using precision, recall, and F-measure. Logistic regression outperformed all other traditional machine learning classifiers and was used for subsequent comparisons. Both traditional NLP and Watson’s NLC performed well for cases under 1024 characters, with weighted average F-measures above 0.96 across all classes. Performance of traditional NLP was lower for cases over 1024 characters, with an F-measure of 0.83. We demonstrate a hybrid framework using traditional NLP techniques combined with IBM Watson to annotate over 10,000 breast pathology reports for development of a large-scale database to be used for deep learning in mammography. Our work shows that traditional NLP and IBM Watson perform extremely well for cases under 1024 characters and can accelerate the rate of data annotation.
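As an illustration of the evaluation metrics named above, the following is a minimal sketch of how per-class precision, recall, and F-measure can be combined into a support-weighted average over the four outcome classes. It is not the authors' code: the function names `per_class_scores` and `weighted_f_measure` and the toy labels in the usage note are assumptions made for illustration only.

```python
from collections import Counter

# The four outcome classes named in the abstract.
LABELS = ["left positive", "right positive", "bilateral positive", "negative"]


def per_class_scores(y_true, y_pred, label):
    """Return (precision, recall, F-measure) for one class (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def weighted_f_measure(y_true, y_pred, labels=LABELS):
    """Average the per-class F-measures, weighting each class by its support
    (the number of true examples of that class)."""
    total = len(y_true)
    if total == 0:
        return 0.0
    support = Counter(y_true)
    return sum(
        per_class_scores(y_true, y_pred, label)[2] * support[label]
        for label in labels
    ) / total
```

For example, with the made-up labels `y_true = ["negative", "negative", "left positive", "right positive"]` and `y_pred = ["negative", "left positive", "left positive", "right positive"]`, calling `weighted_f_measure(y_true, y_pred)` weights the "negative" class twice as heavily as each of the other classes present, because it has two true examples.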
DH and MP are supported by the National Cancer Institute of the NIH under Award Number UH2CA203792, and the National Library of Medicine of the NIH under Award Number 1U01LM012675. HT was supported by NIH T32 Grant 5T32EB001631-10. AL was supported by the UCSF RAPtr fund.
Conflict of Interest
The authors declare that they have no conflict of interest.
Cite this article
Trivedi, H.M., Panahiazar, M., Liang, A. et al. Large Scale Semi-Automated Labeling of Routine Free-Text Clinical Records for Deep Learning. J Digit Imaging 32, 30–37 (2019). https://doi.org/10.1007/s10278-018-0105-8