The past year has seen the most visible advances in what is known as natural language processing (NLP), the branch of artificial intelligence (AI) that focuses on how machines can understand and generate language like humans do. Much of the current hype around NLP is due to the recent emergence of large language models (LLMs), which now allow anyone without technical expertise to engage with the most advanced NLP algorithms through a simple browser-based interface. Over the last decade, the progress of NLP has been greatly accelerated by the increasing computing power of hardware and the associated development of deep learning techniques, such as the transformer architecture used in current generative pretrained transformer (GPT)–based LLMs [1]. Given the rapid growth of digital data and the increasing need for automated language processing, NLP has become an indispensable technology in various industries—not only in healthcare, but also in finance, education and marketing.

However, it is important to note that LLMs, as the latest generation of language processing models, are only one pillar of an interdisciplinary research field dedicated to the development and application of NLP algorithms [2]. The field offers invaluable tools for processing, aggregating and simplifying text corpora; depending on the underlying NLP problem, a suitable method can be selected from this broad toolbox and applied for the benefit of radiology. Radiologists who understand the potential and limitations of NLP will be better equipped to evaluate NLP models, understand how they can improve clinical workflow, and facilitate research efforts involving large amounts of textual data.

Current NLP methods used for text processing and analysis in radiology range from traditional rule-based systems (e.g. string matching [3]) to feature-rich learners (e.g. conditional random fields, support vector machines [4]) and deep learning methods such as convolutional and transformer-based neural networks [5], including cutting-edge LLMs [6]. A well-established NLP technique is Word2Vec, developed by Google in 2013, which uses a two-layer neural network to learn word associations from a large corpus of text without additional user input [7]. Once trained, such a model can detect synonyms or suggest additional words for a partial sentence. The vectors produced by Word2Vec capture the semantic and syntactic qualities of words, allowing for the measurement of their semantic similarity.
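To illustrate how such vector representations enable similarity measurement, the following sketch compares word vectors by cosine similarity. The three-dimensional vectors here are hypothetical, hand-assigned stand-ins for trained Word2Vec embeddings; a real model would learn vectors with hundreds of dimensions from a large corpus.

```python
from math import sqrt

# Hypothetical low-dimensional embeddings standing in for trained
# Word2Vec vectors (illustrative values only).
embeddings = {
    "tumour":   [0.9, 0.1, 0.3],
    "lesion":   [0.8, 0.2, 0.4],
    "protocol": [0.1, 0.9, 0.7],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: values near 1
    indicate vectors pointing in nearly the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Semantically related terms end up with nearby vectors,
# while unrelated terms do not.
print(cosine_similarity(embeddings["tumour"], embeddings["lesion"]))
print(cosine_similarity(embeddings["tumour"], embeddings["protocol"]))
```

In a trained model, this simple geometric comparison is what allows synonym detection: the nearest neighbours of a word's vector tend to be semantically related terms.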

In their study, Vosshenrich et al [8] used Doc2Vec, an NLP algorithm that generalises Word2Vec to encode whole documents rather than individual words. The resulting vector representations encapsulate various relationships between documents and may prove useful in tasks such as quantifying similarities or differences between text corpora. The authors conducted a retrospective analysis of a large cohort of nearly 750,000 institutional radiology reports over a 10-year period, which were processed by Doc2Vec and converted into multidimensional vectors. The dimensionality of the vectors was then reduced to 2D using a non-linear dimensionality reduction method known as t-distributed stochastic neighbour embedding (t-SNE) to facilitate visualisation and statistical comparison of two different types of radiology reports: free-text and (semi)structured reports. Based on the observed differences in the spread and centroids of the document vectors, the authors found that structured reports had higher linguistic similarity and better linguistic discrimination compared to free-text reports.
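The final comparison step can be sketched as follows. The 2-D coordinates below are made-up stand-ins for t-SNE-reduced Doc2Vec vectors, and the centroid and mean-spread computations are an illustrative simplification, not the authors' exact statistical methodology.

```python
from math import dist  # Euclidean distance (Python 3.8+)

# Made-up 2-D points standing in for t-SNE-reduced report vectors.
structured = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (1.1, 1.2)]
free_text  = [(3.0, 0.5), (1.5, 3.2), (4.1, 2.8), (2.2, 1.0)]

def centroid(points):
    """Mean position of a group of document vectors."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def spread(points):
    """Mean distance of each point to the group centroid; a smaller
    value indicates more linguistically homogeneous reports."""
    c = centroid(points)
    return sum(dist(p, c) for p in points) / len(points)

print("structured spread:", spread(structured))
print("free-text spread:", spread(free_text))

# The distance between the two centroids quantifies how well the two
# reporting styles can be discriminated in the embedded space.
print("centroid distance:",
      dist(centroid(structured), centroid(free_text)))
```

Under this toy setup, the tighter cluster of structured-report vectors yields the smaller spread, mirroring the study's finding of higher linguistic similarity among structured reports.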

While these results may not be surprising given the underlying transition from prose text to consistent reporting organisation and terminology, they do provide quantifiable evidence that structured reporting can improve the standardisation and distinguishability of reporting language in radiology, which could also facilitate automated data postprocessing.

The choice of reporting style has major implications, as it conveys the essence of the radiologist’s interpretation of medical images and thus contributes key diagnostic information to the therapeutic decision-making process. The clarity and completeness of the radiology report, as the primary means of communication between clinicians and between clinicians and patients, can therefore have a significant impact on both the transmission of diagnostic information and the quality of report translation into machine-readable data. Transforming radiology data into a digital stream of structured diagnostic information and quantitative image data, supported by tailored NLP extraction methods, would facilitate integration with other diagnostic modalities and promote data-driven health care, integrated diagnostics and visual grounding approaches in radiology at scale. Moreover, developing reliable NLP pipelines to retrieve key information and feed downstream processing, such as weakly supervised learning frameworks for AI model building, integration with aggregated clinical data or correlation with imaging biomarkers, will become easier as we structure radiology reporting and procedures during primary data collection. By moving towards structured reporting in clinical practice, and leveraging rapid advances in NLP technology, we will be able to unlock the full potential of data in our field. This will ultimately lead to system improvements for patients, clinicians and institutions, improving the quality of care.