Abstract
Introduce the notion of cross-sectional relatedness as an informational dependence relation between sentences in the conclusion section of a breast radiology report and sentences in the findings section of the same report. Assess inter-rater agreement of breast radiologists. Develop and evaluate a support vector machine (SVM) classifier for automatically detecting cross-sectional relatedness. A standard reference is manually created from 444 breast radiology reports by the first author. A subset of 37 reports is annotated by five breast radiologists. Inter-rater agreement is computed among their annotations and the standard reference. Thirteen numerical features are developed to characterize pairs of sentences; the optimal feature set is sought through forward selection. Inter-rater agreement is F-measure 0.623. The SVM classifier has an F-measure of 0.699 in the 12-fold cross-validation protocol against the standard reference. Report length does not correlate with the classifier’s performance (correlation coefficient = −0.073). The SVM classifier has an average F-measure of 0.505 against the annotations by breast radiologists. The mediocre inter-rater agreement is possibly caused by: (1) a definition that is insufficiently actionable, (2) the fine-grained nature of cross-sectional relatedness on sentence level, instead of, for instance, on paragraph level, and (3) the higher-than-average complexity of the 37-report sample. The SVM classifier performs better against the standard reference than against the breast radiologists’ annotations, which supports (3). The SVM’s performance on the standard reference is satisfactory. Since the optimal feature set is not breast specific, the results may transfer to non-breast anatomies. Applications include a smart report viewing environment and data mining.
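To make the agreement figures above concrete, the following is a minimal sketch of how an F-measure between two sets of annotations could be computed. It assumes each annotation is a set of (conclusion sentence index, findings sentence index) pairs marked as related, and that agreement among several raters is summarized as the mean over all rater pairs. Both the data representation and the averaging scheme are assumptions for illustration; the abstract does not state how the reported values were aggregated, and the names `f_measure` and `mean_pairwise_f` are hypothetical.

```python
from itertools import combinations

# Assumed representation: (conclusion sentence index, findings sentence index)
Pair = tuple[int, int]


def f_measure(predicted: set[Pair], reference: set[Pair]) -> float:
    """Balanced F-measure of the predicted related pairs against a reference."""
    if not predicted and not reference:
        return 1.0  # assumption: two empty annotations agree perfectly
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


def mean_pairwise_f(annotations: list[set[Pair]]) -> float:
    """Mean F-measure over all unordered pairs of annotators (assumed scheme)."""
    scores = [f_measure(a, b) for a, b in combinations(annotations, 2)]
    return sum(scores) / len(scores)


# Toy data for one report: three hypothetical raters.
rater_a = {(0, 1), (0, 2)}
rater_b = {(0, 1), (1, 3)}
rater_c = {(0, 1), (0, 2)}

print(f_measure(rater_a, rater_b))
print(mean_pairwise_f([rater_a, rater_b, rater_c]))
```

Because the balanced F-measure is symmetric in its two arguments, the same function serves both for comparing a classifier against a reference and for comparing two raters with each other.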
Acknowledgments
The authors gratefully acknowledge Yassine Benajiba, Steffen Pauws, and the anonymous referees for valuable comments on an earlier version of this paper.
Cite this article
Sevenster, M., Qian, Y., Abe, H. et al. Cross-Sectional Relatedness Between Sentences in Breast Radiology Reports: Development of an SVM Classifier and Evaluation Against Annotations of Five Breast Radiologists. J Digit Imaging 26, 977–988 (2013). https://doi.org/10.1007/s10278-013-9612-9
DOI: https://doi.org/10.1007/s10278-013-9612-9