Longitudinal Detection of Radiological Abnormalities with Time-Modulated LSTM

Santeramo, Ruggiero; Withey, Samuel; Montana, Giovanni

doi:10.1007/978-3-030-00889-5_37


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11045)


Abstract

Convolutional neural networks (CNNs) have been successfully employed in recent years for the detection of radiological abnormalities in medical images such as plain x-rays. To date, most studies apply CNNs to individual examinations in isolation and discard previously available clinical information. In this study we explore whether Long Short-Term Memory networks (LSTMs) can improve classification performance when modelling the entire sequence of radiographs available for a given patient, including their reports. A limitation of traditional LSTMs, however, is that they implicitly assume equally spaced observations, whereas radiological exams are event-based and therefore irregularly sampled. Using both a simulated dataset and a large-scale chest x-ray dataset, we demonstrate that a simple modification of the LSTM architecture, which explicitly takes into account the time lag between consecutive observations, can boost classification performance. Our empirical results show improved detection of commonly reported abnormalities on chest x-rays, such as cardiomegaly, consolidation, pleural effusion and hiatus hernia.


1 Introduction

Deep learning approaches have exhibited impressive performance in medical imaging applications in recent years [2, 7, 19]. For instance, convolutional neural networks (CNNs) have had some success in detecting and classifying radiological abnormalities on chest x-rays, a particularly complex task [2, 12, 15, 21]. Most of these studies have been designed for cross-sectional analyses, viewing each image in isolation and ignoring the fact that a patient may have had previous imaging examinations whose radiological reports are also available. It is standard practice for radiologists to take the clinical history into account and to add context to their report by comparison with previous imaging. Some abnormalities are long-standing, whereas others change over time, with varying clinical relevance. In elderly patients or those with a history of smoking, for instance, the baseline x-ray appearance, i.e. when the patient is “well”, can still be abnormal. If individual films are viewed in isolation, it can be difficult to tell with certainty whether there are acute findings. If previous imaging is available, it is possible to determine whether there has been interval change, for example acute consolidation (indicating infection). As with human readers, we expect that a neural network can learn from previous patient-specific information, in this case all prior chest radiographs of that patient and their corresponding reports.

The motivation for this work is to assess the potential of recurrent neural networks (RNNs) for the real-time detection of radiological abnormalities when modelling the entire series of past exams available for any given patient. In particular, we set out to explore the performance of Long Short-Term Memory (LSTM) networks [8, 10], which have lately become the method of choice in sequential modelling, especially when combined with CNNs for visual feature extraction [6, 20]. The technical challenge in our context is that sequential medical exams are event-based observations: they are collected at times of clinical need, so they are not equally spaced, and the number of historical exams available for each patient can vary greatly. Figure 1 shows four longitudinal chest x-rays acquired from the same patient over a period of time. The figure also illustrates other challenges of modelling this type of longitudinal data: the images may be acquired using different x-ray devices (resulting in different image quality, e.g. resolution and brightness), there may be differences in patient positioning (e.g. supine, erect, rotated, degree of inspiration) and in projection (postero-anterior and antero-posterior), and not all images are equally centred (e.g. there can be rotations and translations).

Because LSTMs are typically applied to regularly sampled data [9, 16, 17], they are ill-suited to irregular time gaps between consecutive observations, as previously noted [3, 13]. This is a particularly important limitation in our context, as certain radiological abnormalities tend to be observed over long periods of time, whereas others are short-lived. In this article we demonstrate that an architecture combining a CNN with a simple modification of the standard LSTM can handle irregularly sampled data and learn the temporal dynamics of certain visual features, resulting in improved pattern detection. Using both simulated and real x-ray datasets, we show that this capability yields better image classification performance than an LSTM baseline.

2 Motivating Dataset and Problem Formulation

The dataset used in this study was collected from the historical archives of the PACS (Picture Archiving and Communication System) at Guy’s and St. Thomas’ NHS Foundation Trust, London, between January 2005 and March 2016. It has previously been used for the detection of lung nodules [14] and for multi-label metric learning [1]. It consists of $745\,480$ chest radiographs representative of an adult population, acquired using 40 different x-ray systems. Each associated radiological report was parsed using a natural language processing system for the automated extraction of radiological labels [5, 14]. For this study, we extracted a subset of $80\,737$ patients with a history of at least two exams, which resulted in $337\,575$ images ($232\,610$ used for training and $104\,965$ for testing). Each image was scaled to a standard format of $299 \times 299$ pixels. The resulting dataset has an average of 4.18 examinations per patient, with an average interval of 180.29 days between consecutive exams of the same patient.
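The reported figures can be cross-checked with simple arithmetic: $232\,610 + 104\,965 = 337\,575$ images, and $337\,575 / 80\,737 \approx 4.18$ exams per patient. The sketch below shows how such statistics could be computed from (patient, exam date) pairs. The function `exam_statistics` and the toy records are hypothetical. Pooling every consecutive-exam gap into a single mean is also an assumption, because the text does not say whether the 180.29-day figure is averaged per patient first.

```python
from collections import defaultdict
from datetime import date


def exam_statistics(records):
    """Summarise (patient_id, exam_date) records (hypothetical helper).

    Keeps only patients with at least two exams, then returns
    (number of patients, exams per patient, mean gap in days).
    Assumption: the mean gap is pooled over all consecutive-exam pairs.
    """
    by_patient = defaultdict(list)
    for patient_id, exam_date in records:
        by_patient[patient_id].append(exam_date)

    # Mirror the subset selection: a history of at least two exams.
    kept = {p: sorted(d) for p, d in by_patient.items() if len(d) >= 2}

    n_exams = sum(len(dates) for dates in kept.values())
    gaps = [
        (later - earlier).days
        for dates in kept.values()
        for earlier, later in zip(dates, dates[1:])
    ]
    return len(kept), n_exams / len(kept), sum(gaps) / len(gaps)


# Toy records: patient "c" has a single exam and is therefore dropped.
records = [
    ("a", date(2010, 1, 1)), ("a", date(2010, 3, 2)), ("a", date(2010, 5, 1)),
    ("b", date(2012, 6, 1)), ("b", date(2012, 6, 11)),
    ("c", date(2014, 1, 1)),
]
print(exam_statistics(records))

# Consistency check of the figures reported in the text.
assert 232_610 + 104_965 == 337_575
print(round(337_575 / 80_737, 2))  # 4.18
```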

In what follows, each individual sequence of longitudinal chest x-rays, together with its associated vectors of radiological labels, is denoted as $\{X_{i}^t, l_{i}^t\}$, where $i=1,\ldots ,N$ is the patient index and $t=1, \ldots ,T_i$ is the time index. Typical chest x-ray datasets are characterised by relatively few examinations per patient (e.g. $T_i$ is around 4–5) and highly irregular sampling. Our task is to predict the label vector $l_{i}^{T_i}$ given the entire history of exams (images and labels) up to time $T_i-1$ plus the current image $X_i^{T_i}$.
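A minimal sketch of how one patient's exams could be turned into the model inputs described in Sect. 3: each step pairs the previous labels $l_i^{t-1}$ with the current image features and the time lapse $\delta_i^t$, with $l_i^0$ set to zeros and $\delta_i^1 = 0$. The helper `build_sequence`, the (date, features, labels) tuple layout and measuring $\delta_i^t$ in days are illustrative assumptions. The short feature lists stand in for CNN outputs.

```python
from datetime import date


def build_sequence(exams, n_labels):
    """Turn one patient's exams into per-step inputs (hypothetical helper).

    `exams`: list of (exam_date, features, labels) tuples in any order,
    where `labels` is a 0/1 list of length n_labels.
    Returns (steps, target): each step is (previous_labels, features, delta_days);
    `target` is the label vector of the last exam, which is never fed as input.
    """
    exams = sorted(exams, key=lambda e: e[0])  # chronological order
    steps = []
    prev_labels = [0] * n_labels  # l^0: an array of zeros
    prev_date = None
    for exam_date, features, labels in exams:
        # delta^1 = 0; afterwards, days elapsed since the previous exam (assumed unit)
        delta = 0 if prev_date is None else (exam_date - prev_date).days
        steps.append((prev_labels, features, delta))
        prev_labels, prev_date = labels, exam_date
    target = exams[-1][2]
    return steps, target


exams = [
    (date(2011, 5, 20), [0.3, 0.1], [0, 1, 0, 0]),
    (date(2011, 1, 10), [0.2, 0.4], [1, 0, 0, 0]),
    (date(2011, 6, 1), [0.5, 0.9], [1, 1, 0, 0]),
]
steps, target = build_sequence(exams, n_labels=4)
for prev, feats, delta in steps:
    print(prev, feats, delta)
print("target:", target)
```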

3 Time-Modulated LSTM

LSTMs are a particular type of RNN able to classify, process and predict time series [8, 10]. The internal state of an LSTM (a.k.a. the cell state or memory) gives the architecture its ability to 'remember'. A standard LSTM contains memory blocks, and blocks contain memory cells. A typical memory block is made of three main components: an input gate controlling the flow of input activations into the memory cell, an output gate controlling the output flow of cell activations, and a forget gate scaling the internal state of the cell. The forget gate modulates how much information is retained from the internal state at the previous time step. However, standard LSTMs are ill-suited to our task, where the time between consecutive exams is variable, because they have no mechanism for explicitly modelling the arrival time of each observation. In fact, it has been shown that LSTMs, and more generally RNNs, underperform on irregularly sampled data or time series with missing values [4, 13]. Previous attempts to adapt LSTMs to irregularly sampled data points have mostly focused on speeding up convergence in settings with high-resolution sampled data [13] or on discounting short-term memory [3].

To address these issues, we introduce two simple modifications of the standard LSTM architecture, called time-modulated LSTM (tLSTM), both of which make explicit use of the time indices associated with the inputs. In the proposed architecture, all the images for a given patient are first processed by a CNN, which extracts a set of imaging features, denoted by $\widehat{X}_i^t$, at each time step. The LSTM takes as inputs $l_i^{t-1}$, i.e. the radiological labels describing the image acquired at the previous time step, the current image features $\widehat{X}_i^t$, and the time lapse between $X_i^{t-1}$ and $X_i^{t}$, which we denote as $\delta_i^t$. For the last image in the sequence, the LSTM predicts the image labels $l_i^t$; we denote this prediction by $y_i^t$. Figure 2 provides a high-level overview of this model, and the equations below define the tLSTM unit:

$$\begin{aligned}
f_t &= \sigma (W_{fl}*l^{t-1} + W_{fx}*\widehat{X}^t + W_{fj}*\delta ^t + b_f) ,\\
i_t &= \sigma (W_{il}*l^{t-1} + W_{ix}*\widehat{X}^t + W_{ij}*\delta ^t + b_i) ,\\
o_t &= \sigma (W_{ol}*l^{t-1} + W_{ox}*\widehat{X}^t + W_{oj}*\delta ^t + b_o) ,\\
c_t &= \tanh (W_{cl}*l^{t-1} + W_{cx}*\widehat{X}^t + W_{cj}*\delta ^t + b_c) ,\\
h_t &= f_t * h_{t-1} + i_t * c_t ,\\
y^t &= o_t * \tanh (h_t)
\end{aligned} \tag{1}$$

Here, $h_t$ denotes the internal state at time step $t$, while $f_t$, $i_t$ and $o_t$ denote the forget, input and output gates at time step $t$, respectively. Each gate is computed as a linear combination of the vectors $l^{t-1}$ and $\widehat{X}^t$ and the scalar $\delta^t$, passed through a sigmoid function $\sigma(\cdot)$; the candidate cell state $c_t$ uses the same kind of linear combination passed through $\tanh$. The matrices denoted by $W$ contain learnable weights indexed by two letters (e.g. $W_{fl}$ contains the weights of the forget gate $f$ for the labels $l$, and so on); since $\delta^t$ is a scalar, the weights $W_{fj}$, $W_{ij}$, $W_{oj}$ and $W_{cj}$ are vectors. At time $t = 1$, we initialise $l_i^{0} = (0, \dots, 0)$ (an array of zeros) and $\delta_i^1 = 0$. In this first variant (tLSTMv1), the time lapses $\delta_i^t$ enter linearly into the pre-activations of the candidate cell state and of the forget, input and output gates, and so modulate all of them.
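The NumPy sketch below transcribes Eq. (1) literally for a single sequence; it is not taken from the authors' released code. Assumptions: the helpers `init_params`, `tlstm_v1_step` and `run_sequence` are hypothetical, weights are drawn at random, the initial internal state $h_0$ is zero (not stated in the text), and $y^t$ is returned as-is, without any readout layer mapping it to label probabilities. As written in Eq. (1), the gates receive $l^{t-1}$, $\widehat{X}^t$ and $\delta^t$, so the recurrence runs only through $h_t$.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_labels, n_feat, n_hidden, seed=0):
    """Random weights for Eq. (1) (hypothetical initialisation).

    Keys follow W_{gate,input}: gate in f (forget), i (input), o (output),
    c (candidate cell); input in l (labels), x (image features), j (time lapse).
    """
    rng = np.random.default_rng(seed)
    p = {}
    for g in "fioc":
        p[f"W_{g}l"] = rng.normal(scale=0.1, size=(n_hidden, n_labels))
        p[f"W_{g}x"] = rng.normal(scale=0.1, size=(n_hidden, n_feat))
        p[f"W_{g}j"] = rng.normal(scale=0.1, size=n_hidden)  # delta is a scalar
        p[f"b_{g}"] = np.zeros(n_hidden)
    return p


def tlstm_v1_step(p, l_prev, x_hat, delta, h_prev):
    """One step of Eq. (1): delta enters every gate and the candidate c_t."""
    def pre(g):
        return (p[f"W_{g}l"] @ l_prev + p[f"W_{g}x"] @ x_hat
                + p[f"W_{g}j"] * delta + p[f"b_{g}"])

    f_t, i_t, o_t = sigmoid(pre("f")), sigmoid(pre("i")), sigmoid(pre("o"))
    c_t = np.tanh(pre("c"))
    h_t = f_t * h_prev + i_t * c_t  # internal state
    y_t = o_t * np.tanh(h_t)
    return h_t, y_t


def run_sequence(p, steps, n_hidden):
    """Feed (l^{t-1}, X_hat^t, delta^t) steps in order; return y at the last step."""
    h = np.zeros(n_hidden)  # assumption: zero initial state
    y = None
    for l_prev, x_hat, delta in steps:
        h, y = tlstm_v1_step(p, np.asarray(l_prev, dtype=float),
                             np.asarray(x_hat, dtype=float), delta, h)
    return y


p = init_params(n_labels=4, n_feat=2048, n_hidden=4)
steps = [
    ([0, 0, 0, 0], np.zeros(2048), 0.0),  # t = 1: l^0 = zeros, delta^1 = 0
    ([1, 0, 0, 0], np.zeros(2048), 0.5),  # hypothetical rescaled time lapse
]
y_last = run_sequence(p, steps, n_hidden=4)
```

Because $\delta^t$ enters the pre-activations linearly, raw gaps measured in days (about 180 on average in Sect. 2) can saturate the sigmoids. Rescaling $\delta^t$ is one practical option, although the text does not say whether or how the time lapses were normalised.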

A second variant (tLSTMv2) uses the time lapse only to modulate the internal state $h_t$. In this case, each $\delta_i^t$ contributes directly to the update of $h_t$ and, implicitly, to the estimate of the label vector $y^t$:

$$\begin{aligned}
h_t &= f_t * h_{t-1} + i_t * c_t + W_{tj}*\delta ^t ,\\
y^t &= o_t * \tanh (h_t) .
\end{aligned} \tag{2}$$

The remaining update equations, i.e. those for $f_t$, $i_t$, $o_t$ and $c_t$, take the same form as in Eq. (1) but without the $W_{\cdot j} * \delta^t$ terms.
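Eq. (2) changes only the state update. The self-contained sketch below is again hypothetical: `init_params_v2`, `tlstm_v2_step` and the random weight initialisation are illustrative assumptions. It drops the $W_{\cdot j} * \delta^t$ terms from the gates and the candidate, and adds $W_{tj} * \delta^t$ to $h_t$.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params_v2(n_labels, n_feat, n_hidden, seed=0):
    """Random weights for Eq. (2) (hypothetical initialisation)."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in "fioc":  # forget, input, output, candidate cell
        p[f"W_{g}l"] = rng.normal(scale=0.1, size=(n_hidden, n_labels))
        p[f"W_{g}x"] = rng.normal(scale=0.1, size=(n_hidden, n_feat))
        p[f"b_{g}"] = np.zeros(n_hidden)
    p["W_tj"] = rng.normal(scale=0.1, size=n_hidden)  # the only time-lapse weights
    return p


def tlstm_v2_step(p, l_prev, x_hat, delta, h_prev):
    def pre(g):
        # Same form as Eq. (1) but without any W_{.j} * delta term.
        return p[f"W_{g}l"] @ l_prev + p[f"W_{g}x"] @ x_hat + p[f"b_{g}"]

    f_t, i_t, o_t = sigmoid(pre("f")), sigmoid(pre("i")), sigmoid(pre("o"))
    c_t = np.tanh(pre("c"))
    # Eq. (2): the time lapse is added directly to the internal state.
    h_t = f_t * h_prev + i_t * c_t + p["W_tj"] * delta
    y_t = o_t * np.tanh(h_t)
    return h_t, y_t


p = init_params_v2(n_labels=3, n_feat=5, n_hidden=4)
l_prev, x_hat, h0 = np.array([1.0, 0.0, 0.0]), np.ones(5), np.zeros(4)
h_a, y_a = tlstm_v2_step(p, l_prev, x_hat, 0.0, h0)
h_b, y_b = tlstm_v2_step(p, l_prev, x_hat, 2.0, h0)
print(h_b - h_a)  # equals 2.0 * p["W_tj"] up to rounding
```

Because the gates no longer depend on $\delta^t$, two calls that differ only in the time lapse yield internal states that differ by exactly $W_{tj}$ times the difference in $\delta^t$. The time lapse then reaches $y^t$ only through $\tanh(h_t)$.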

4 Simulated Data

To better assess the potential advantages of the time-modulated LSTM in settings where observations are event-driven and the underlying patterns to be detected are time-varying, we generated simulated data as a complement to the real chest x-ray dataset of Sect. 2. Simulating images lets us precisely control how often the relevant visual patterns appear and disappear over time, as well as the signal-to-noise ratio. For this study, we simulated a population of image sequences of varying lengths. Within a sequence, each image consisted of a noisy background containing one or more randomly placed digits drawn from the set $\{0, 3, 6, 8, 9\}$. We simulated three kinds of patterns inspired by those seen in real radiological images: (i) rare patterns, consisting of digits appearing with low probability; (ii) common patterns, consisting of digits that rapidly appear and resolve; (iii) persistent patterns, consisting of digits observed for extended periods of time. By analogy with medical images, each digit represents a radiological abnormality to be detected; hence multiple (and possibly overlapping) digits are allowed to coexist within an image. The time lapse $\delta^t$ was modelled as a uniform random variable taking values in the interval [1, 10]. Examples of simulated images can be found in the Supplementary Material.
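The sketch below generates label timelines only, not the images themselves; rendering noisy backgrounds with digit glyphs is omitted. Only the digit set and the Uniform[1, 10] time lapse come from the text, and continuous sampling of $\delta^t$ is assumed. The mapping of digits to pattern kinds, the onset probabilities, the durations and the sequence lengths are hypothetical values chosen for illustration. Because a digit stays active for a fixed duration in continuous time, whether it is still visible at the next exam depends on $\delta^t$, which is the kind of dependence the time-modulated LSTM is designed to capture.

```python
import random

DIGITS = (0, 3, 6, 8, 9)

# Hypothetical assignment of digits to pattern kinds, with illustrative
# onset probabilities (per exam) and durations (in simulated time units).
PATTERNS = {
    0: ("rare", {"p_onset": 0.02, "duration": 5.0}),
    3: ("common", {"p_onset": 0.3, "duration": 3.0}),
    6: ("common", {"p_onset": 0.3, "duration": 3.0}),
    8: ("persistent", {"p_onset": 0.1, "duration": 40.0}),
    9: ("persistent", {"p_onset": 0.1, "duration": 40.0}),
}


def simulate_sequence(rng, min_len=2, max_len=8):
    """Return (times, deltas, labels) for one simulated sequence.

    deltas[0] = 0 and deltas[t] ~ Uniform[1, 10] for t >= 1; labels[t] is the
    set of digits that would be drawn in image t. An inactive digit switches on
    with probability p_onset at an exam and stays on for `duration` time units.
    """
    length = rng.randint(min_len, max_len)  # inclusive bounds
    deltas = [0.0] + [rng.uniform(1, 10) for _ in range(length - 1)]
    times, labels, active_until = [], [], {}
    now = 0.0
    for delta in deltas:
        now += delta
        times.append(now)
        present = set()
        for digit in DIGITS:
            _, params = PATTERNS[digit]
            inactive = active_until.get(digit, -1.0) < now
            if inactive and rng.random() < params["p_onset"]:
                active_until[digit] = now + params["duration"]
            if active_until.get(digit, -1.0) >= now:
                present.add(digit)
        labels.append(present)
    return times, deltas, labels


rng = random.Random(42)
times, deltas, labels = simulate_sequence(rng)
for t, d, s in zip(times, deltas, labels):
    print(f"t={t:6.2f}  delta={d:5.2f}  digits={sorted(s)}")
```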

5 Experimental Results

In our experiments with the real x-ray dataset, the CNN component of our architecture consists of a pre-trained Inception v3 [18] without the classification layer. The imaging features $\widehat{X}_i^t$ (a 2048-element vector) extracted by the CNN are used as inputs to the LSTM component, along with the image labels. We considered four radiological labels: cardiomegaly, consolidation, pleural effusion and hiatus hernia. The performance of the models is assessed by the PPV (positive predictive value) and NPV (negative predictive value), along with the F-score, i.e. the harmonic mean of precision and recall.
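For reference, the three metrics follow directly from the confusion-matrix counts of one binary label: PPV = TP/(TP+FP), NPV = TN/(TN+FN) and F = 2·PPV·recall/(PPV + recall), with recall = TP/(TP+FN). The sketch assumes predictions have already been thresholded to 0/1 and reports per-label values. The threshold and the way the four labels are aggregated are not specified in the text, and returning 0.0 on an empty denominator is an assumed convention.

```python
def ppv_npv_f1(y_true, y_pred):
    """PPV, NPV and F-score for one binary label (1 = abnormality present).

    Convention (assumption): a metric with an empty denominator is 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    ppv = tp / (tp + fp) if tp + fp else 0.0  # precision
    npv = tn / (tn + fn) if tn + fn else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * ppv * recall / (ppv + recall) if ppv + recall else 0.0
    return ppv, npv, f1


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
ppv, npv, f1 = ppv_npv_f1(y_true, y_pred)  # TP=2, FP=1, TN=4, FN=1
print(ppv, npv, f1)
```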

Table 1. Results on real data$^{*}$


We compared the performance of four models: the baseline CNN classifier (Inception v3), which uses only the current image to predict the labels and does not exploit the patient's historical exams, and three variations of the architecture illustrated in Fig. 2: one using the standard LSTM and the two versions of the time-modulated LSTM introduced in Sect. 3. Both tLSTM versions yielded noticeable performance improvements (Table 1). In particular, tLSTMv1 yields an increase of $\sim$7% in F-measure over the baseline and $\sim$8% over a standard LSTM. Moreover, tLSTMv1 achieves a $\sim$9% improvement in PPV over the baseline. Overall, tLSTM outperforms the standard LSTM, which we attribute to its ability to handle irregularly sampled data.

For the simulated dataset, we used a pre-trained AlexNet [11] as the feature extractor, in combination with three versions of the LSTM for modelling sequences of images. A full table of results can be found in the Supplementary Material. We purposely introduced a high level of noise in the visual patterns to make classification from individual images particularly difficult; accordingly, the single-image classifier did not achieve acceptable classification results. Likewise, the architecture using a standard LSTM did not yield significant improvements, owing to the irregularly sampled observations. In contrast, the time-modulated LSTM units achieved larger classification improvements, as they were able to decode the sequential patterns by explicitly taking into account the time gaps between consecutive observations.

6 Conclusions

Our experimental results suggest that the modified LSTM architectures, combined with CNNs, are suitable for modelling sequences of event-based imaging observations. By explicitly modelling the individual time lapses between consecutive events, these architectures better capture the evolution of visual patterns over time, which improves classification performance. The full potential of these models is best demonstrated on simulated datasets, where we control the exact nature of the temporal patterns and the image labels are perfectly known. In real radiological datasets, some image labels are often wrong because of typographical errors, interpretive errors or ambiguous language, and in some cases long-standing findings are not mentioned at all. This can cause problems in both CNN training and testing. Despite these challenges, we have demonstrated that the time-modulated LSTM components also improve classification results on a large chest x-ray dataset. We have thus shown empirically that a patient’s imaging history can be used to improve automated radiological reporting. In future work, we plan more extensive testing of a system trained end-to-end on a much larger number of radiological classes. The code for the networks used in our experiments is available online: https://github.com/WMGDataScience/tLSTM.

References

1. Annarumma, M., Montana, G.: Deep metric learning for multi-labelled radiographs. In: 33rd Annual ACM SAC 2018, pp. 34–37. ACM (2018)
2. Bar, Y., Diamant, I., Wolf, L., Lieberman, S., Konen, E., Greenspan, H.: Chest pathology detection using deep learning with non-medical training. In: 2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI), pp. 294–297 (2015)
3. Baytas, I.M., Xiao, C., Zhang, X., Wang, F., Jain, A.K., Zhou, J.: Patient subtyping via time-aware LSTM networks. In: 23rd ACM SIGKDD (2017)
4. Che, Z., Purushotham, S., Cho, K., Sontag, D., Liu, Y.: Recurrent neural networks for multivariate time series with missing values. Scientific Reports (2018)
5. Cornegruta, S., Bakewell, R., Withey, S., Montana, G.: Modelling radiological language with bidirectional long short-term memory networks. In: 7th Workshop on Health Text Mining and Information Analysis (2016)
6. Donahue, J., et al.: Long-term Recurrent Convolutional Networks for Visual Recognition and Description. ArXiv e-prints, November 2014
7. Esteva, A., et al.: Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115 (2017)
8. Gers, F.A., Schmidhuber, J., Cummins, F.: Learning to forget: continual prediction with LSTM. Neural Comput. 12, 2451–2471 (1999)
9. Graves, A., Mohamed, A., Hinton, G.E.: Speech recognition with deep recurrent neural networks. CoRR (2013)
10. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
11. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, vol. 25, pp. 1097–1105. Curran Associates Inc (2012)
12. Litjens, G., et al.: A Survey on Deep Learning in Medical Image Analysis. ArXiv e-prints, February 2017
13. Neil, D., Pfeiffer, M., Liu, S.-C.: Phased LSTM: accelerating recurrent network training for long or event-based sequences. ArXiv e-prints, October 2016
14. Pesce, E., Ypsilantis, P.-P., Withey, S., Bakewell, R., Goh, V., Montana, G.: Learning to detect chest radiographs containing lung nodules using visual attention networks. ArXiv e-prints, December 2017
15. Rajpurkar, P., et al.: CheXNet: radiologist-level pneumonia detection on Chest X-rays with deep learning. ArXiv e-prints, November 2017
16. Shi, X., Chen, Z., Wang, H., Yeung, D., Wong, W., Woo, W.: Convolutional LSTM network: a machine learning approach for precipitation nowcasting. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 28, pp. 802–810. Curran Associates Inc (2015)
17. Srivastava, N., Mansimov, E., Salakhutdinov, R.: Unsupervised learning of video representations using LSTMs. CoRR, abs/1502.04681 (2015)
18. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)
19. Gulshan, V., Peng, L., Coram, M., et al.: Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316(22), 2402–2410 (2016)
20. Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: a neural image caption generator. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3156–3164. IEEE (2015)
21. Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., Summers, R.M.: ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. ArXiv e-prints, May 2017


Author information

Authors and Affiliations

Department of Biomedical Engineering, King’s College London, London, UK
Ruggiero Santeramo & Giovanni Montana
WMG, University of Warwick, Coventry, UK
Ruggiero Santeramo & Giovanni Montana
Department of Radiology, Guy’s and St Thomas’ NHS Foundation Trust, London, UK
Samuel Withey


Corresponding author

Correspondence to Giovanni Montana.

Editor information

Editors and Affiliations

University College London, London, UK
Danail Stoyanov
University of Leeds, Leeds, UK
Zeike Taylor
University of Adelaide, Adelaide, SA, Australia
Gustavo Carneiro
IBM Research – Almaden, San Jose, CA, USA
Tanveer Syeda-Mahmood
Sunnybrook Health Science Centre, Toronto, ON, Canada
Anne Martel
Deutsches Krebsforschungszentrum (DKFZ), Heidelberg, Germany
Lena Maier-Hein
University of Porto, Porto, Portugal
João Manuel R.S. Tavares
Queensland University of Technology, Brisbane, QLD, Australia
Andrew Bradley
Universidade Estadual Paulista, Bauru, São Paulo, Brazil
João Paulo Papa
OSRAM (Germany), Garching b. München, Germany
Vasileios Belagiannis
University of Lisbon, Lisboa, Portugal
Jacinto C. Nascimento
ReFUEL4, Singapore, Singapore
Zhi Lu
German Center for Neurodegenerative Diseases (DZNE), Munich, Germany
Sailesh Conjeti
IBM Research – Almaden, San Jose, CA, USA
Mehdi Moradi
Tel Aviv University, Tel Aviv, Israel
Hayit Greenspan
Case Western Reserve University, Cleveland, OH, USA
Anant Madabhushi

1 Electronic supplementary material


Supplementary material 1 (pdf 232 KB)


About this paper

Cite this paper

Santeramo, R., Withey, S., Montana, G. (2018). Longitudinal Detection of Radiological Abnormalities with Time-Modulated LSTM. In: Stoyanov, D., et al. Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. DLMIA ML-CDS 2018. Lecture Notes in Computer Science, vol. 11045. Springer, Cham. https://doi.org/10.1007/978-3-030-00889-5_37


DOI: https://doi.org/10.1007/978-3-030-00889-5_37
Published: 20 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00888-8
Online ISBN: 978-3-030-00889-5
eBook Packages: Computer Science, Computer Science (R0)
