Abstract
Objectives
To create an advanced image retrieval and data-mining system based on in-house radiology reports.
Methods
Radiology reports are semantically analysed using natural language processing (NLP) techniques and indexed in a search engine. Images referenced by sequence and image number in the reports are retrieved from the picture archiving and communication system (PACS) and stored for later viewing. A web-based front end serves as the interface for querying images and displays the results together with the retrieved images and report text. Because the underlying terminology is drawn from a comprehensive radiological lexicon, the search algorithm also finds results for synonyms, abbreviations and related topics.
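As an illustration of the image-reference step, the following minimal sketch pulls sequence and image numbers out of free report text. The notation it matches ("series 4, image 17" or "Se 2 Im 45"), the `ImageRef` type and the `extract_image_refs` function are assumptions made for this example; the abstract does not specify how references are written in the reports or how the system parses them.

```python
import re
from typing import List, NamedTuple


class ImageRef(NamedTuple):
    """A (series, image) pair that could serve as a lookup key for PACS retrieval."""
    series: int
    image: int


# Assumed notation, not taken from the source: "series 4, image 17" or "Se 4 Im 17".
IMAGE_REF = re.compile(
    r"\b(?:series|se)\.?\s*(\d+)\s*[,/;]?\s*(?:image|im)\.?\s*(\d+)\b",
    re.IGNORECASE,
)


def extract_image_refs(report_text: str) -> List[ImageRef]:
    """Return every series/image reference found in the report, in text order."""
    return [ImageRef(int(s), int(i)) for s, i in IMAGE_REF.findall(report_text)]


report = "Hypodense liver lesion (series 4, image 17). Small pleural effusion, Se 2 Im 45."
print(extract_image_refs(report))
```

Each extracted pair would then identify one image to fetch from the PACS and store as a preview; the retrieval call itself depends on the local PACS interface and is left out here.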
Results
The test set comprised 108 manually annotated reports, which were analysed with different system configurations. The best results were achieved using full syntactic and semantic analysis, with a precision of 0.929 and a recall of 0.952. The system has been operating successfully since October 2010; 258,824 reports have been indexed and a total of 405,146 preview images are stored in the database.
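For readers less familiar with the two metrics, the sketch below shows how precision and recall are computed from annotation counts. The counts in the usage line are hypothetical and chosen only to make the arithmetic easy to follow; they are not the counts behind the values reported above, which the abstract does not give.

```python
from typing import Tuple


def precision_recall(
    true_positives: int, false_positives: int, false_negatives: int
) -> Tuple[float, float]:
    """Return (precision, recall) from raw annotation counts."""
    retrieved = true_positives + false_positives  # everything the system returned
    relevant = true_positives + false_negatives   # everything the annotators marked
    precision = true_positives / retrieved if retrieved else 0.0
    recall = true_positives / relevant if relevant else 0.0
    return precision, recall


# Hypothetical counts for illustration only; not taken from the study.
precision, recall = precision_recall(
    true_positives=90, false_positives=10, false_negatives=30
)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Precision is the share of returned annotations that are correct, and recall is the share of manually marked annotations that the system found.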
Conclusions
Data-mining and NLP techniques provide quick access to a vast repository of images and radiology reports with high precision and recall. Consequently, the system has become a valuable tool in daily clinical routine, education and research.
Key Points
• Radiology reports can now be analysed using sophisticated natural language processing techniques.
• Semantic text analysis is backed by terminology of a radiological lexicon.
• The search engine includes results for synonyms, abbreviations and compositions.
• Key images are automatically extracted from radiology reports and fetched from PACS.
• Such systems help to find diagnoses, improve report quality and save time.
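The synonym and abbreviation handling named in the key points can be pictured with a small query-expansion sketch. The `LEXICON` entries, `build_index` and `expand_query` are hypothetical names and toy data invented for this example; the radiological lexicon and search algorithm actually used are not described at this level of detail, and related topics and compositions are not covered here.

```python
from typing import Dict, FrozenSet, List, Set

# Toy stand-in for a radiological lexicon; entries are illustrative only.
LEXICON: Dict[str, Set[str]] = {
    "computed tomography": {"ct"},
    "magnetic resonance imaging": {"mri", "mr imaging"},
    "picture archiving and communication system": {"pacs"},
}


def build_index(lexicon: Dict[str, Set[str]]) -> Dict[str, FrozenSet[str]]:
    """Map every preferred term and variant to its full group of equivalents."""
    index: Dict[str, FrozenSet[str]] = {}
    for preferred, variants in lexicon.items():
        group = frozenset({preferred, *variants})
        for term in group:
            index[term] = group
    return index


INDEX = build_index(LEXICON)


def expand_query(term: str) -> List[str]:
    """Return the query term plus all known synonyms and abbreviations, sorted."""
    key = term.strip().lower()
    return sorted(INDEX.get(key, frozenset({key})))


print(expand_query("MRI"))
print(expand_query("Pneumothorax"))
```

A search for any one form then matches reports that use any other form of the same concept, while terms missing from the lexicon fall back to a plain lookup.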
Abbreviations
- CT: Computed tomography
- DICOM: Digital Imaging and Communications in Medicine
- HIS: Hospital information system
- HL7: Health Level 7
- HTTP: Hypertext transfer protocol
- JPEG: Joint Photographic Experts Group
- JSON: JavaScript Object Notation
- MRI: Magnetic resonance imaging
- NLP: Natural language processing
- PACS: Picture archiving and communication system
- RDF: Resource description framework
- RIS: Radiology information system
Acknowledgments
This work was supported by the Theseus Program of the German Federal Ministry of Economics and Technology. Philipp Daumke and Kai Simon work for Averbis GmbH, Freiburg, Germany. We thank Sören Holste for his assistance in evaluating the NLP pipeline.
Cite this article
Gerstmair, A., Daumke, P., Simon, K. et al. Intelligent image retrieval based on radiology reports. Eur Radiol 22, 2750–2758 (2012). https://doi.org/10.1007/s00330-012-2608-x
DOI: https://doi.org/10.1007/s00330-012-2608-x