1 Introduction

Brain tumors are among the most fatal cancers. In the United States, an estimated 700,000 people are living with primary brain and central nervous system tumors. Nearly 80,000 new cases of primary brain tumors are diagnosed each year, and approximately one-third are malignant [1]. Many different types of brain tumors exist; the most prevalent types in adults are gliomas and meningiomas.

Medical imaging plays a central role in diagnosing brain tumors. Many imaging modalities can provide information about brain tissue non-invasively, such as Magnetic Resonance Imaging (MRI), Computed Tomography (CT), and Positron Emission Tomography (PET). MRI in particular is used frequently for brain tumor detection and identification, owing to its high soft-tissue contrast, high spatial resolution, and freedom from ionizing radiation. Despite these advantages, brain tumor diagnosis remains a challenging task. Detection relies heavily on the experience of radiologists, and reading large volumes of images is time-consuming and sometimes non-reproducible.

Computer-Aided Diagnosis (CAD) can provide tremendous help in brain tumor diagnosis, prognosis, and surgery. A typical brain tumor CAD system consists of three main phases: tumor region of interest (ROI) segmentation, feature extraction, and classification based on the extracted features [4,5,6]. Brain tumor segmentation, whether manual or automatic, is perhaps the most important and time-consuming phase of such a system. A great deal of effort has been devoted to this problem, e.g., releasing publicly available benchmark datasets and organizing challenges [10]. Many algorithms have been proposed for brain tumor segmentation, such as Deep Neural Networks [7] and SVM with Conditional Random Field [3]. Classifiers based on SVMs and/or ANNs are then applied to distinguish different types of brain tumors using the features extracted from the ROIs. An obvious limitation of such frameworks is the need to trace ROIs, which causes several problems. Firstly, since brain tumors vary dramatically in shape, size, and location, tracing ROIs can be quite challenging and is often not fully automatic. This may introduce significant segmentation errors that accumulate through the subsequent phases, leading to inaccurate classification. Secondly, the tissues surrounding a tumor have been suggested to be discriminative between tumor categories [5], yet they are discarded once the analysis is restricted to the ROI. Thirdly, relying solely on ROI features means completely ignoring the location of the tumor, which can affect the classification considerably.

The aforementioned problems motivate us to propose an alternative approach for brain tumor screening and classification that eliminates the segmentation phase completely. In particular, we propose to use the holistic 3D images directly, without detailed annotation at the pixel or slice level. Our approach models the 3D holistic images as sequences of 2D slices. It first adopts an auto-encoder, based on a deep DenseNet, to extract features from each 2D slice; this allows us to avoid working with the original noisy and high-dimensional data. Once the slice features are extracted, it is natural to apply a Recurrent Neural Network (RNN), specifically the Long Short-Term Memory (LSTM) model, to handle the sequential data for classification. We also apply a purely convolutional model to the sequential data by stacking the 2D slice features into a single tensor that is treated as an image. This is inspired by recent work using a purely convolutional auto-encoder for sequence representation learning [12].

Our contributions in this work are three-fold:

  • The proposed models need only a holistic label per patient, rather than pixel-wise or slice-wise labeling. Holistic labels are much easier to obtain in clinical routine.

  • We have collected a dataset of 422 MRI scans, containing normal control images as well as three types of brain tumors (i.e., meningioma, glioma, and metastatic tumor).

  • Our deep neural network implements a novel architecture, treating 3D data as sequences of 2D slices and using an RNN or a CNN to learn the sequence-to-label mapping, with a DenseNet-based auto-encoder for feature extraction. The two proposed models, DenseNet-LSTM and DenseNet-DenseNet, are evaluated on two tasks, tumor screening and tumor type classification, using both public and proprietary datasets.

Fig. 1. An MR image sequence of a glioma patient.

2 Preliminaries

2.1 Brain-Tumor Image Representations

Brain tumors are usually diagnosed with MRI or CT images, where patient i is represented by a sequence of 2-D images, denoted as \({\mathbf{X}}_i = \{{\varvec{x}}_1^{(i)}, \cdots , {\varvec{x}}_T^{(i)} \}\) with \({\varvec{x}}_t^{(i)} \in \mathbb {R}^{\ell _1\times \ell _2}\) being the t-th frame. Unlike existing label-exhaustive datasets, where each 2-D image is associated with a label, in our dataset each sequence of images \({\mathbf{X}}_i\) is associated with a single label \(y_i\in \{0, 1, \cdots , P\}\), where P is the number of tumor types. As a result, our dataset is represented as \(\mathcal {D} \triangleq \{({\mathbf{X}}_i, y_i)\}_{i=1}^N\), with N being the total number of image sequences (covering both patients and normal controls). Figure 1 illustrates an example sequence of MRI images from a glioma patient in our proprietary dataset. Note that only a few frames show the presence of the glioma.
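To make the notation concrete, the following minimal Python sketch shows how such a dataset could be laid out; all dimensions and names here (T, l1, l2, the label value) are hypothetical illustrations, not the configuration used in our experiments.

```python
import numpy as np

# Hypothetical dimensions: T frames per scan, each frame l1 x l2 pixels.
T, l1, l2 = 20, 256, 256

X_i = np.zeros((T, l1, l2), dtype=np.float32)  # one patient's 2-D slice sequence
y_i = 2                                        # holistic label: 0 = normal, 1..P = tumor types

# The dataset D is a collection of N (sequence, label) pairs.
D = [(X_i, y_i)]
```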

2.2 DenseNet

DenseNet [9] is a recently proposed type of convolutional neural network in which each layer is connected to all of its preceding layers. This structure offers several advantages over existing architectures: it alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse, and reduces the number of parameters. A deep DenseNet is defined as a set of DenseNets (called dense blocks) connected sequentially, with additional convolutional and pooling operations between consecutive dense blocks. With such a construction, we can build a deep neural network flexible enough to represent complicated transformations. An example of a deep DenseNet is illustrated in Fig. 2.

Fig. 2. A deep DenseNet with 3 dense blocks. In each dense block, the input for a particular layer is the concatenation of all outputs from its previous layers; the output is obtained by convolving the input with some kernels to be learned.
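The connectivity pattern can be summarized in a short sketch. The following tf.keras code is a minimal illustration of a dense block and a small deep DenseNet with a transition layer between blocks; the layer counts, growth rate, and input size are hypothetical and do not reproduce the exact architecture of [9] or of our experiments.

```python
import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, num_layers=5, growth_rate=12):
    # Each new layer consumes the concatenation of all previous outputs,
    # which is the dense connectivity pattern depicted in Fig. 2.
    for _ in range(num_layers):
        h = layers.Conv2D(growth_rate, 3, padding='same', activation='relu')(x)
        x = layers.Concatenate()([x, h])  # carry all previous feature maps forward
    return x

# A tiny "deep DenseNet": two dense blocks with a conv + pooling transition.
inp = layers.Input(shape=(64, 64, 1))   # hypothetical input size
z = dense_block(inp)
z = layers.Conv2D(32, 1, padding='same')(z)
z = layers.AveragePooling2D()(z)
z = dense_block(z)
model = tf.keras.Model(inp, z)
```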

Fig. 3. The RNN structure.

2.3 Recurrent Neural Network (RNN)

The RNN is a powerful framework for modeling sequential data. In our brain tumor application, the input sequence corresponds to the features of the MRI images, extracted with the DenseNet described above; the output sequence degenerates to a single label, indicating whether the input sequence is diagnosed as tumor or not. Specifically, consider an input sequence \({\mathbf{X}}= \{{\varvec{x}}_1, \cdots , {\varvec{x}}_T \}\), where \({\varvec{x}}_t\) is the input data vector at time t. The corresponding hidden state vector \({\varvec{h}}_t\) at each time t is recursively calculated by applying a transition function \({\varvec{h}}_t = \mathcal {H}({\varvec{h}}_{t-1}, {\varvec{x}}_t)\) (specified below). Finally, the output y is calculated by mapping the final state \({\varvec{h}}_T\) to the label space. Figure 3 illustrates the RNN structure in our setting.

Long Short-Term Memory (LSTM). The vanilla RNN defines \(\mathcal {H}\) as a linear transformation followed by an activation function. This simple structure is unable to model the long-term dependencies present in our application. Instead, we adopt the more powerful LSTM transition function, which introduces a memory cell that can preserve state over long periods [8]. Specifically, each LSTM unit contains a cell \({\varvec{c}}_t\) at time t, which can be viewed as a memory unit. Reading and writing the cell are controlled through sigmoid gates: the input gate \({\varvec{i}}_t\), forget gate \({\varvec{f}}_t\), and output gate \({\varvec{o}}_t\). Consequently, the hidden units \({\varvec{h}}_t\) are updated as:

$$\begin{aligned} {\varvec{i}}_t = \sigma ({\mathbf{W}}_{i}{\varvec{x}}_t + {\mathbf{U}}_{i}{\varvec{h}}_{t-1} + {\varvec{b}}_i),&\qquad {\varvec{f}}_t = \sigma ({\mathbf{W}}_{f}{\varvec{x}}_t + {\mathbf{U}}_{f}{\varvec{h}}_{t-1} + {\varvec{b}}_f), \\ {\varvec{o}}_t = \sigma ({\mathbf{W}}_{o}{\varvec{x}}_t + {\mathbf{U}}_{o}{\varvec{h}}_{t-1} + {\varvec{b}}_o),&\qquad \tilde{{\varvec{c}}}_t = \tanh ({\mathbf{W}}_{c}{\varvec{x}}_t + {\mathbf{U}}_{c}{\varvec{h}}_{t-1} + {\varvec{b}}_c), \\ {\varvec{c}}_t = {\varvec{f}}_t \odot {\varvec{c}}_{t-1} + {\varvec{i}}_t \odot \tilde{{\varvec{c}}}_t,&\qquad {\varvec{h}}_t = {\varvec{o}}_t \odot \tanh ({\varvec{c}}_t) \end{aligned}$$

where \(\sigma (\cdot )\) denotes the logistic sigmoid function, and \(\odot \) represents the element-wise (Hadamard) product. \({\mathbf{W}}_{\{i,f,o,c\}}\), \({\mathbf{U}}_{\{i,f,o,c\}}\) and \({\varvec{b}}_{\{i,f,o,c\}}\) are the weights of the LSTM to be learned. Having obtained the hidden unit for the last time step T, we map \({\varvec{h}}_T\) to y using a linear transformation followed by a softmax layer, i.e., \(p(y = k|{\varvec{h}}_T) = \textsf {Softmax}_k({\mathbf{W}}_{y}{\varvec{h}}_T + {\varvec{b}}_y)\), where \(\textsf {Softmax}_k({\varvec{a}}) \triangleq \frac{\exp ({\varvec{a}}_k)}{\sum _i\exp ({\varvec{a}}_i)}\), and \({\mathbf{W}}_{y}\) and \({\varvec{b}}_y\) are the parameters to be learned.
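The update equations translate directly into code. The following NumPy sketch of a single LSTM step mirrors the equations above; the dict-based weight containers W, U, b are our own hypothetical packaging, and this is an illustration rather than our training implementation.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # One LSTM update; W, U, b are dicts of weight matrices / bias vectors
    # keyed by gate name ('i', 'f', 'o', 'c'), mirroring the equations above.
    i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])     # input gate
    f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])     # forget gate
    o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])     # output gate
    c_tilde = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])
    c = f * c_prev + i * c_tilde                             # element-wise products
    h = o * np.tanh(c)
    return h, c

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

# After the last step T: p(y = k | h_T) = softmax(W_y @ h_T + b_y)[k]
```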

3 Labeling-Free Brain-Tumor Classification

We now describe our model, built from the above building blocks. Unlike existing methods for tumor classification that use a standalone CNN, we propose two models that predict on image sequences directly, completely eliminating the time-consuming procedure of labeling each frame independently; the approach is thus labeling-free.

3.1 DenseNet-LSTM Model

There are two main challenges in our task: (i) directly using a CNN on image sequences is inappropriate, as CNNs were originally designed for static data; fortunately, the LSTM provides a natural way to deal with sequential data, so we adopt it for image-sequence classification. (ii) Directly feeding the original image sequences to an RNN works poorly, because the original images are noisy and high-dimensional.

To alleviate this problem, we propose an auto-encoder structure based on the deep DenseNet to extract features from the original images. The features from the auto-encoder are then fed to an RNN for classification. Specifically, in an auto-encoder, one trains an encoder and a decoder together so that the output reconstructs the input. To train the auto-encoder given brain-tumor images \(({\varvec{x}}_t^{(i)})_{i, t}\), the objective is to minimize the reconstruction error: \(\mathcal {F} = \sum _i\sum _t \left\| {\varvec{x}}_t^{(i)} - \text {DEC}\left( \text {ENC}({\varvec{x}}_t^{(i)})\right) \right\| ^2\), where \(\Vert \cdot \Vert \) is the standard Frobenius norm, and ENC\((\cdot )\) and DEC\((\cdot )\) denote the encoder and decoder, implemented by two deep DenseNets, respectively. After training the auto-encoder, the extracted features for all the images are used as input data to train an RNN classifier for holistic brain-tumor classification. We adopt the standard cross-entropy loss function to train the RNN. The whole framework is illustrated in Fig. 4. We denote this model as DenseNet-LSTM.

Fig. 4. The proposed DenseNet-LSTM model for labeling-free brain tumor classification.
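The reconstruction objective can be sketched concisely in tf.keras. In the snippet below, the plain convolutional encoder/decoder and the input size are hypothetical stand-ins for the two deep DenseNets used in the actual model; the point is the training target, which is the input slice itself.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Stand-in encoder/decoder (plain conv layers); the actual model uses
# deep DenseNets for ENC(.) and DEC(.).
inp = layers.Input(shape=(256, 256, 1))  # hypothetical slice size
z = layers.Conv2D(32, 3, strides=2, padding='same', activation='relu')(inp)
z = layers.Conv2D(64, 3, strides=2, padding='same', activation='relu')(z)
rec = layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(z)
rec = layers.Conv2DTranspose(1, 3, strides=2, padding='same')(rec)

autoencoder = Model(inp, rec)
autoencoder.compile(optimizer='adam', loss='mse')  # squared reconstruction error
# autoencoder.fit(slices, slices, epochs=...)      # targets are the inputs themselves

encoder = Model(inp, z)  # ENC(.): per-slice feature extractor used downstream
```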

3.2 DenseNet-DenseNet Model

A recently proposed alternative to the RNN for sequence classification is to replace it with a CNN [12]. We stack the features of a tumor sequence returned by the auto-encoder into a 2-D tensor and treat it as input to a second deep DenseNet for classification. In this way, the inter-frame correlations are translated into column-wise correlations in a single 2-D tensor, which can be effectively modeled by the convolutional operators in a DenseNet. We denote this model as DenseNet-DenseNet.
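As a minimal sketch of the stacking step (with hypothetical sequence length T and feature dimension d, and random placeholders standing in for the encoded slices):

```python
import numpy as np

# Hypothetical: T slices per scan, each encoded into a d-dimensional vector.
T, d = 20, 128
slice_features = [np.random.randn(d).astype(np.float32) for _ in range(T)]

# Stack the per-slice features into one d x T tensor, frames along the columns,
# so inter-frame correlations appear as column-wise structure a 2-D CNN can model.
sequence_image = np.stack(slice_features, axis=1)         # shape (d, T)
cnn_input = sequence_image[np.newaxis, :, :, np.newaxis]  # (1, d, T, 1) for a Keras CNN
```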

4 Experiments

We test the proposed framework on two datasets, one public and one proprietary (collected by our collaborators at their hospital). Two experiments evaluate the proposed models: tumor screening and tumor type classification. Tumor screening tests the accuracy of our approach in deciding (or screening) whether a sequence of 2D images contains a tumor; tumor type classification classifies tumors into multiple types.

Our implementation is based on TensorFlow. To alleviate overfitting, we adopt weight-decay regularization and dropout during training. The auto-encoder needs to be trained only once; this takes around 5 h for 10,000 slices from 500 MRI sequences. The second part takes about half an hour for the LSTM or one hour for the DenseNet. The models were trained on an Nvidia Titan Xp GPU. For all experiments, we randomly partition the dataset into a training set (72%), a test set (14%), and a validation set (14%). We repeat this process six times and report the mean and variance of the accuracies. Figure 7 shows some examples of learning curves.
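A minimal sketch of this evaluation protocol, assuming scikit-learn for the splits (the dataset size and the commented-out training calls are placeholders):

```python
import numpy as np
from sklearn.model_selection import train_test_split

num_sequences = 422  # e.g., the size of our proprietary dataset
accuracies = []
for seed in range(6):
    # 72% train; split the remaining 28% evenly into validation and test (14% each).
    train_idx, rest_idx = train_test_split(np.arange(num_sequences),
                                           train_size=0.72, random_state=seed)
    val_idx, test_idx = train_test_split(rest_idx, test_size=0.5, random_state=seed)
    # ... train on train_idx, tune on val_idx, evaluate on test_idx ...
    # accuracies.append(test_accuracy)

# Report the mean and variance of the six test accuracies.
```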

Fig. 5. Examples of the three types of brain tumors.

Public Dataset. The public dataset [5] includes 3064 (2D) slices of brain MRI from 233 patients, containing 708 meningiomas, 1426 gliomas, and 930 pituitary tumors. The tumors were manually delineated by experienced radiologists; since our approach does not rely on segmentation, we utilize only the holistic label of each slice, indicating the tumor type. Because this dataset does not provide the image sequences required by our model, we convert each 2D slice into a sequence of 20 slices, either by duplicating it 19 times (for DenseNet-DenseNet) or by appending 19 zero matrices (for DenseNet-LSTM). We use this dataset both to validate the robustness of the proposed framework and to compare against the state of the art, even though our model is not designed for such 2D datasets.
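A small sketch of this conversion (the function name and defaults are our own illustration):

```python
import numpy as np

def slice_to_sequence(img, T=20, mode='duplicate'):
    # 'duplicate' repeats the slice until the sequence has T frames
    # (used with DenseNet-DenseNet); 'pad' keeps the slice as the first
    # frame and appends T-1 zero frames (used with DenseNet-LSTM).
    if mode == 'duplicate':
        return np.repeat(img[np.newaxis], T, axis=0)
    frames = np.zeros((T,) + img.shape, dtype=img.dtype)
    frames[0] = img
    return frames
```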

Proprietary Dataset. We have collected a dataset of 422 MRI scans diagnosed as normal (75), glioma (150), meningioma (67), and metastatic brain tumor (130). For each patient, T1, T2, and FLAIR MR images are available. Examples of the three tumor types are depicted in Fig. 5, which shows the high variation of tumors in location, shape, and size.

Experimental Setup. The encoder of the DenseNet-based auto-encoder is a deep DenseNet with 4 dense blocks; each block contains 5 convolutional layers with kernel sizes of \(3\times 3\) and \(1\times 1\). We adopt the same configuration for the decoder. For the other DenseNet parameters, we use the default settings of [9]. The dimension of the latent space for the RNN is set to 128.

The minibatch size is set to 32. We use the validation set to select the learning rate from \(\{1e\text {-1}, 1e\text {-2}, 1e\text {-3}, 1e\text {-4}, 1e\text {-5}\}\), the dropout rates for the input-hidden layer and each convolutional layer in the DenseNet from \(\{0, 0.05, 0.1, 0.15, 0.2\}\), and the weight-decay rate from \(\{1e\text {-2}, 1e\text {-3}, 1e\text {-4}, 1e\text {-5}\}\).
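This amounts to a grid search over 100 configurations, sketched below (the placeholder accuracy stands in for a full train-and-validate run):

```python
import itertools

# Hyperparameter grid searched on the validation set.
grid = itertools.product(
    [1e-1, 1e-2, 1e-3, 1e-4, 1e-5],  # learning rates
    [0.0, 0.05, 0.1, 0.15, 0.2],     # dropout rates
    [1e-2, 1e-3, 1e-4, 1e-5],        # weight-decay rates
)
best_config, best_acc = None, -1.0
for lr, dropout, weight_decay in grid:
    acc = 0.0  # placeholder: validation accuracy of a model trained with this config
    if acc > best_acc:
        best_config, best_acc = (lr, dropout, weight_decay), acc
```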

Tumor Screening. The public dataset is not suitable for this task, since it contains only images with tumors. We therefore evaluated three models for tumor screening on the proprietary dataset: DenseNet-RNN (with a vanilla RNN as the sequence classifier), DenseNet-LSTM, and DenseNet-DenseNet. Their accuracies are \(87.15\% \pm 3.79\%\), \(91.09\% \pm 3.62\%\), and \(92.66\% \pm 2.73\%\), respectively; DenseNet-DenseNet achieves the best performance on the proprietary dataset.

Tumor Type Classification. On the public dataset, DenseNet-LSTM outperforms all previous work. The baseline method [5] reports an accuracy of 91.28% for its best model, which relies on complicated feature engineering and extra information (from pixel-wise labeling); a recent model based on capsule networks [2] achieves 86.56% accuracy. Furthermore, our models are more robust and practically useful, because they are designed to handle 3D sequence images and are labeling-free.

Our proprietary data is significantly more difficult to learn than the public dataset. DenseNet-LSTM is the best among the different variations. We also test DenseNet-LSTM on one-versus-one tumor type classification, resulting in three groups of experiments. Table 1 summarizes the results. Figures 6 and 7 show the learning curves of our models on the public and proprietary datasets, respectively.

Fig. 6. Learning curves on the public dataset. Left: tumor type classification with DenseNet-DenseNet. Right: tumor type classification with DenseNet-LSTM.

Fig. 7. Learning curves on the proprietary dataset. Left: tumor screening with DenseNet-DenseNet. Right: tumor type classification with DenseNet-LSTM.

Table 1. Summary of experimental results on tumor type classification.
Fig. 8. Patient embeddings with DenseNet output (left) and LSTM output (right). Frame-wise patient embeddings in the feature extraction stage (left; only a small number of patients are shown for visibility) are not well separable, whereas they become almost fully separable after learning with the LSTM (right).

Patient Embeddings with DenseNet and LSTM Features: To illustrate how the proposed framework achieves its high discriminative ability, we embed the features from the DenseNet auto-encoder and from the LSTM classifier into a 2-D space. Note that the auto-encoder features are learned without label information; thus the patients are not expected to be separable from the normal controls. Figure 8 shows the corresponding feature embeddings computed with tSNE [11]. While patients are not separable in the auto-encoder feature space, they are highly separable in the feature space learned by the LSTM.
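A minimal sketch of how such a visualization can be produced with scikit-learn's TSNE; the feature matrix and labels below are random placeholders for the actual DenseNet or LSTM features:

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Hypothetical feature matrices: rows are auto-encoder slice features (left panel)
# or per-patient LSTM states h_T (right panel), with integer class labels.
features = np.random.randn(200, 128).astype(np.float32)
labels = np.random.randint(0, 2, size=200)

embedding = TSNE(n_components=2, random_state=0).fit_transform(features)
plt.scatter(embedding[:, 0], embedding[:, 1], c=labels, s=8)
plt.show()
```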

5 Conclusion

In this paper, we presented an alternative approach for screening and classifying brain tumors using holistic 3D MR images. Our approach works on 3D image sequences and does not require pixel-wise or slice-wise labeling. Experiments on the public and proprietary datasets indicate that it is effective and highly efficient. As future work, we plan to (1) expand our proprietary dataset to more types of brain tumors, and (2) provide model interpretability based on weakly-supervised pathology localization.