Intelligent image retrieval based on radiology reports

  • Computer Applications
European Radiology

Abstract

Objectives

To create an advanced image retrieval and data-mining system based on in-house radiology reports.

Methods

Radiology reports are semantically analysed using natural language processing (NLP) techniques and stored in a state-of-the-art search engine. Images referenced by sequence and image number in the reports are retrieved from the picture archiving and communication system (PACS) and stored for later viewing. A web-based front end is used as an interface to query for images and show the results with the retrieved images and report text. Using a comprehensive radiological lexicon for the underlying terminology, the search algorithm also finds results for synonyms, abbreviations and related topics.
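The lexicon-backed search described above can be illustrated with a minimal sketch of query expansion. This is not the authors' implementation: the toy `LEXICON`, its concept IDs, and the function names `build_term_index`, `expand_query` and `search` are all hypothetical, and the sketch covers only synonyms and abbreviations, not related topics or the full NLP pipeline.

```python
import re
from typing import Dict, List, Set

# Hypothetical toy lexicon: each concept ID maps to its surface forms
# (preferred term, synonyms, abbreviations). Illustrative only.
LEXICON: Dict[str, Set[str]] = {
    "C001": {"computed tomography", "ct", "cat scan"},
    "C002": {"magnetic resonance imaging", "mri", "mr imaging"},
    "C003": {"pulmonary embolism", "pe", "lung embolism"},
}


def build_term_index(lexicon: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map every lower-cased surface form to the concepts it can denote."""
    index: Dict[str, Set[str]] = {}
    for concept_id, forms in lexicon.items():
        for form in forms:
            index.setdefault(form.lower(), set()).add(concept_id)
    return index


def expand_query(term: str, lexicon: Dict[str, Set[str]],
                 index: Dict[str, Set[str]]) -> List[str]:
    """Return the query term plus all surface forms of its concepts."""
    key = term.strip().lower()
    expanded = {key}
    for concept_id in index.get(key, set()):
        expanded |= {form.lower() for form in lexicon[concept_id]}
    return sorted(expanded)


def search(term: str, reports: List[str],
           lexicon: Dict[str, Set[str]]) -> List[int]:
    """Return indices of reports containing any expanded form as a whole word."""
    index = build_term_index(lexicon)
    patterns = [re.compile(r"\b" + re.escape(t) + r"\b")
                for t in expand_query(term, lexicon, index)]
    return [i for i, text in enumerate(reports)
            if any(p.search(text.lower()) for p in patterns)]


reports = [
    "Computed tomography of the chest.",
    "MRI of the knee.",
    "CT shows PE in the right lower lobe.",
]
ct_hits = search("CT", reports, LEXICON)
pe_hits = search("lung embolism", reports, LEXICON)
```

A query for the abbreviation "CT" also matches the report that spells out "computed tomography", which is the behaviour the abstract attributes to the lexicon. A production system would typically perform this expansion at index time inside the search engine rather than by scanning reports.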

Results

The test set comprised 108 manually annotated reports, which were analysed using different system configurations. The best results were achieved with full syntactic and semantic analysis, yielding a precision of 0.929 and a recall of 0.952. The system has been operating successfully since October 2010; 258,824 reports have been indexed and a total of 405,146 preview images are stored in the database.
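For readers less familiar with these metrics, the following sketch shows how precision and recall are computed from annotation counts. The counts used here are hypothetical and are not taken from the study. The F1 value at the end is derived from the reported precision and recall for illustration only; it is not a figure stated in the abstract.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of system-marked items that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of gold-standard items the system found: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts, for illustration only (not from the study).
tp, fp, fn = 8, 2, 2
example_precision = precision(tp, fp)  # 8 / 10
example_recall = recall(tp, fn)        # 8 / 10

# Derived from the reported precision (0.929) and recall (0.952);
# this works out to roughly 0.94.
derived_f1 = f1_score(0.929, 0.952)
```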

Conclusions

Data-mining and NLP techniques provide quick access to a vast repository of images and radiology reports with both high precision and recall values. Consequently, the system has become a valuable tool in daily clinical routine, education and research.

Key Points

Radiology reports can now be analysed using sophisticated natural language processing techniques.

Semantic text analysis is backed by terminology of a radiological lexicon.

The search engine includes results for synonyms, abbreviations and compositions.

Key images are automatically extracted from radiology reports and fetched from PACS.

Such systems help to find diagnoses, improve report quality and save time.
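The extraction of key-image references mentioned in the key points can be sketched with a regular expression. The abstract states only that images are referenced by sequence and image number; the phrasings matched here ("series 4, image 23", "Se 2 Im 117") are assumptions, as are the names `IMAGE_REF` and `extract_image_refs`. The PACS retrieval step is deliberately left out.

```python
import re
from typing import List, Tuple

# Assumed reference phrasings; real reports may use other wording.
IMAGE_REF = re.compile(
    r"\b(?:series|sequence|se)\.?\s*(\d+)\s*[,/;]?\s*"
    r"(?:image|im|img)\.?\s*(\d+)\b",
    re.IGNORECASE,
)


def extract_image_refs(report_text: str) -> List[Tuple[int, int]]:
    """Return unique (series number, image number) pairs in order of appearance."""
    refs: List[Tuple[int, int]] = []
    for match in IMAGE_REF.finditer(report_text):
        ref = (int(match.group(1)), int(match.group(2)))
        if ref not in refs:  # skip repeated mentions of the same image
            refs.append(ref)
    return refs


sample_report = (
    "Nodule in the right upper lobe (series 4, image 23). "
    "Stable lesion, Se 2 Im 117. See also series 4, image 23."
)
image_refs = extract_image_refs(sample_report)
```

Each extracted pair would then identify one image within the study that the report belongs to, which is the information needed to request it from the PACS.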

Abbreviations

CT: Computed tomography
DICOM: Digital Imaging and Communications in Medicine
HIS: Hospital information system
HL7: Health Level 7
HTTP: Hypertext transfer protocol
JPEG: Joint Photographic Experts Group
JSON: JavaScript Object Notation
MRI: Magnetic resonance imaging
NLP: Natural language processing
PACS: Picture archiving and communication system
RDF: Resource description framework
RIS: Radiology information system

Acknowledgments

This work was supported by the Theseus Program of the German Federal Ministry of Economics and Technology. Philipp Daumke and Kai Simon work for Averbis GmbH, Freiburg, Germany. We thank Sören Holste for his assistance in evaluating the NLP pipeline.

Author information

Corresponding author

Correspondence to Axel Gerstmair.

About this article

Cite this article

Gerstmair, A., Daumke, P., Simon, K. et al. Intelligent image retrieval based on radiology reports. Eur Radiol 22, 2750–2758 (2012). https://doi.org/10.1007/s00330-012-2608-x

  • DOI: https://doi.org/10.1007/s00330-012-2608-x
