Automated Detection of Radiology Reports that Require Follow-up Imaging Using Natural Language Processing Feature Engineering and Machine Learning Classification
Radiologists regularly issue follow-up recommendations, yet our preliminary research has shown that 35 to 50% of patients who receive a follow-up recommendation for findings of possible cancer on abdominopelvic imaging do not return for follow-up. These patients remain at risk for adverse outcomes related to missed or delayed cancer diagnosis. In this study, we develop an algorithm that automatically detects free-text radiology reports containing a follow-up recommendation, using natural language processing (NLP) techniques and machine learning models. The data set consists of 6000 free-text reports from the authors’ institution. NLP techniques are used to engineer 1500 features, which comprise the most informative unigrams, bigrams, and trigrams in the training corpus after tokenization and Porter stemming. On this data set, we train naive Bayes, decision tree, and maximum entropy models. The decision tree model, with an F1 score of 0.458 and accuracy of 0.862, outperforms both the naive Bayes (F1 score of 0.381) and maximum entropy (F1 score of 0.387) models. We analyzed the models to determine predictive features; term frequencies of stemmed n-grams such as “renal neoplasm” and “evalu with enhanc” were most predictive of a follow-up recommendation. Key to maximizing performance were feature engineering that extracts predictive information and selection of machine learning algorithms appropriate to the feature set.
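The feature engineering and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`tokenize`, `ngrams`, `build_vocabulary`, `featurize`, `f1_score`) are hypothetical, the regex tokenizer is a simple stand-in, Porter stemming is omitted to keep the example dependency-free, and ranking n-grams by raw frequency is an assumed substitute for the paper's "most informative" selection criterion, which the abstract does not specify.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphabetic tokens (stand-in tokenizer, no stemming)."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, max_n=3):
    """Return all unigrams, bigrams, and trigrams as space-joined strings."""
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def build_vocabulary(reports, size):
    """Keep the `size` most frequent n-grams in the corpus (assumed selection rule)."""
    counts = Counter()
    for report in reports:
        counts.update(ngrams(tokenize(report)))
    return [gram for gram, _ in counts.most_common(size)]


def featurize(report, vocabulary):
    """Map one report to term-frequency counts over a fixed vocabulary."""
    counts = Counter(ngrams(tokenize(report)))
    return [counts[gram] for gram in vocabulary]


def f1_score(y_true, y_pred):
    """F1 for the positive class (1 = report contains a follow-up recommendation)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In a fuller pipeline, each token would be stemmed before n-grams are formed (which is how a feature like “evalu with enhanc” arises), the vocabulary would hold 1500 entries, and the resulting vectors would be passed to a classifier such as a decision tree. Because F1 ignores true negatives, it can be much lower than accuracy when reports with a follow-up recommendation are a minority class.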
Keywords: Artificial intelligence; Binary classification; Follow-up; Machine learning; Natural language processing; Structured reporting