An Evaluation of Convolutional Neural Nets for Medical Image Anatomy Classification

Khan, Sameer Ahmad; Yong, Suet-Peng

doi:10.1007/978-3-319-32213-1_26

Conference paper
First Online: 19 June 2016

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 387)

Abstract

Classification of anatomical structures is an important precondition for several computer-aided detection and diagnosis systems. Attaining high accuracy in automatic classification is challenging because of the vast amount of variation in anatomical structures. The current trend in object recognition is driven by “deep learning” methods, which outperform contemporary methods in image classification. Until now, these deep learning methods have mostly been applied to natural images. In this study, we compare the performance of three main deep learning architectures, i.e. LeNet, AlexNet and GoogLeNet, on medical imaging data containing five anatomical structures for anatomy-specific classification.

1 Introduction

Classification of medical images is considered an important component of computer-aided detection and diagnosis systems [1]. Automatic localization or identification is very useful for initializing organ-specific processing such as detecting liver tumors [2]. Achieving high accuracy in automated anatomy classification is challenging because of the variability in anatomical structures caused by varying contrast, shapes deformed by pathologies, and occlusion. In image classification problems, the descriptiveness and discriminative power of the extracted features are important for achieving good classification results. The feature extraction techniques commonly used in medical imaging include filter-based features [3] and the very popular scale-invariant feature transform (SIFT) [4].

Neural networks (NNs) have been studied for many years to solve complex classification problems, including image classification. The distinct advantage of neural networks is that the algorithm can be generalized to solve different kinds of problems using similar designs. The convolutional neural network (CNN) is a successful example of attempts to model the mammalian visual cortex using NNs. The reason for using CNNs for anatomy-specific classification is that they have outperformed contemporary methods in natural image classification [5]. CNNs have also made substantial advances in biomedical applications [6]. In addition, recent work has shown how CNNs can significantly improve the performance of state-of-the-art computer-aided detection (CADe) systems [7–9]. In this study we evaluate the comparative performance of three milestones in the development of convolutional neural networks for anatomy-specific classification, i.e. LeNet [10], AlexNet [5] and GoogLeNet [11].

2 Related Work

2.1 Convolutional Neural Nets

Convolutional neural networks (CNNs) are a special kind of deep model responsible for numerous exciting recent results in computer vision. Initially proposed in the 1980s by K. Fukushima and later developed by Y. LeCun and colleagues as LeNet [10], CNNs gained acclaim through the success of LeNet on the challenging task of handwritten digit recognition in 1989. It took a few decades for CNNs to make another leap forward in computer vision, beginning with AlexNet [5] in 2012, which won the overall ImageNet challenge.

In a CNN, the key computation is the convolution of a feature detector with an input signal. Convolution with a bank of filters, like the learned filters in Fig. 1, enriches the representation: at the first layer of a CNN, the features go from individual pixels to simple primitives such as horizontal and vertical lines, circles, and patches of color. Unlike conventional single-channel image processing filters, these CNN filters are computed over all of the input channels. Convolutional filters are translation-invariant, so they yield a high response wherever a feature is detected.
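The two properties described above (a filter spanning all input channels, and a response that follows the feature wherever it appears) can be illustrated with a minimal NumPy sketch. The image, the all-ones "patch detector" and the function name `conv2d_multichannel` are hypothetical and chosen only for illustration; they are not taken from the networks evaluated in this paper.

```python
import numpy as np

def conv2d_multichannel(image, kernel):
    """Valid cross-correlation of a (C, H, W) image with one (C, kH, kW) filter.

    The filter covers all C input channels at once, so each output value
    is a sum over channels as well as over the spatial window.
    """
    c, h, w = image.shape
    kc, kh, kw = kernel.shape
    assert c == kc, "filter must have one slice per input channel"
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[:, i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical 3-channel 8x8 image containing a single bright 3x3 patch.
image = np.zeros((3, 8, 8))
image[:, 1:4, 2:5] = 1.0

# Hypothetical detector that responds to bright 3x3 patches in all channels.
kernel = np.ones((3, 3, 3))

response = conv2d_multichannel(image, kernel)
peak = np.unravel_index(np.argmax(response), response.shape)

# Move the patch 2 rows down and 1 column right: the peak moves with it.
shifted = np.roll(image, shift=(2, 1), axis=(1, 2))
response_shifted = conv2d_multichannel(shifted, kernel)
peak_shifted = np.unravel_index(np.argmax(response_shifted), response_shifted.shape)

print(peak, peak_shifted)
```

The peak response has the same value in both cases and only its location changes, which is the behaviour the paragraph refers to.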

2.1.1 LeNet

LeNet [10] comprises five layers that contain trainable parameters, as shown in Fig. 2. The input is a 28 × 28 pixel image.

Layer 1 is a convolutional layer, conv1, that contains 20 feature maps with a kernel size of 5, meaning that each unit in each feature map is connected to a 5 × 5 neighborhood in the input. Conv1 contains 1520 learned parameters. Layer 2, pool1, is a pooling layer that aggregates the learned features to make them invariant to transformations; it has 20 feature maps of size 12 × 12. Layer 3 is again a convolutional layer, conv2, which convolves the pooled feature maps and contains 25,050 learned parameters. Layer 4, pool2, aggregates the convolved features from conv2. After convolution and pooling, layer 5, ip1, applies an inner product operation followed by a rectified linear unit (ReLU) activation function, which results in 400,500 learned parameters. Layer 6, ip2, applies another inner product operation, which results in a smaller set of 2505 learned parameters. In total, 429,575 parameters are learned, and the network output is passed to a softmax classifier to determine the loss with respect to the actual output.
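The parameter counts quoted above can be reproduced with a short calculation. The sketch below assumes a 3-channel input, 50 feature maps in conv2, 500 units in ip1, 4 × 4 feature maps after pool2 and 5 output classes; these values are not all stated explicitly in the text but are the ones consistent with the reported counts. The helper names are hypothetical.

```python
def conv_params(in_channels, out_channels, kernel):
    """Weights plus one bias per output feature map of a square convolution."""
    return out_channels * in_channels * kernel * kernel + out_channels

def fc_params(in_features, out_features):
    """Weights plus one bias per output unit of an inner product layer."""
    return in_features * out_features + out_features

# Assumed layer sizes (see lead-in): RGB input, 20 and 50 feature maps,
# 4x4 maps after pool2, 500 hidden units, 5 anatomy classes.
counts = {
    "conv1": conv_params(in_channels=3, out_channels=20, kernel=5),
    "conv2": conv_params(in_channels=20, out_channels=50, kernel=5),
    "ip1": fc_params(in_features=50 * 4 * 4, out_features=500),
    "ip2": fc_params(in_features=500, out_features=5),
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name}: {n}")
print(f"total: {total}")
```

Under these assumptions the per-layer values are 1520, 25,050, 400,500 and 2505, which sum to the 429,575 parameters reported above. The pooling layers contribute no trainable parameters.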

2.1.2 AlexNet

AlexNet [5], proposed by Alex Krizhevsky and shown in Fig. 3, is a convolutional neural net that revolutionized image classification by beating the state-of-the-art image classification methods in 2012.

AlexNet comprises 11 layers. The first layer, conv1, is combined with relu1 and norm1 and uses a kernel size of 11 and a stride of 4, which means that the convolution is performed at every fourth pixel. Conv1 is followed by a pooling layer, pool1, as explained above for LeNet; the kernel size for the pooling is set to 3 with a stride of 2. Pool1 is followed by the convolution conv2 with kernel size 5 and stride 2. Relu2 is applied to the output of conv2, followed by norm2. The conv2 features are then pooled in pool2 by max pooling with kernel size 3 and stride 2. The pooled feature maps are convolved again in conv3, with a kernel size of 3, a stride of 1 and a padding of 1. These features are convolved again in conv4 with the same parameter settings as conv3, followed by relu4. The features from conv4 are convolved in conv5, again with the same parameter settings, followed by relu5. The features from conv5 are pooled in pool5, which is followed by the fully connected layers fc6, fc7 and fc8. Two operations are applied in fc6, i.e. relu6 and drop6; the dropout operation helps prevent deep nets from overfitting. Fc6 is followed by fc7, which is accompanied by relu7 and drop7. Finally, the features are fully connected through fc8 to the softmax classifier, which determines the loss with respect to the actual output.
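The effect of kernel size, stride and padding on the feature map size can be sketched with the usual output-size formula, out = floor((in + 2·pad − kernel) / stride) + 1. The 227 × 227 input crop used below is an assumption (a common choice for AlexNet, not stated in this paper), as is the example size of 13 for the padded 3 × 3 convolution; `out_size` is a hypothetical helper.

```python
def out_size(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling window (floor rounding)."""
    return (size + 2 * pad - kernel) // stride + 1

# Assumed 227x227 crop fed to conv1 (kernel 11, stride 4, as described above).
conv1 = out_size(227, kernel=11, stride=4)

# pool1: kernel 3, stride 2. The division is exact here, so floor and
# ceil rounding give the same result.
pool1 = out_size(conv1, kernel=3, stride=2)

# A 3x3 convolution with stride 1 and padding 1 (the conv3/conv4/conv5
# setting) preserves the spatial size; 13 is only an example value.
same = out_size(13, kernel=3, stride=1, pad=1)

print(conv1, pool1, same)
```

With a stride of 4 the first layer shrinks the input by roughly a factor of four per side, while the padded 3 × 3 convolutions in the later layers leave the feature map size unchanged.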

2.1.3 GoogLeNet

GoogLeNet [11] is a deep learning architecture whose authors proposed the Inception architecture, which is based on finding out how an optimal local sparse structure in a convolutional vision network can be approximated and covered by readily available dense components [11]. The architecture is motivated by the Hebbian principle, which states that neurons that fire together, wire together. Following this principle, and because correlations in images tend to be local, very local clusters are covered by 1 × 1 convolutions, and more spread-out clusters are then covered by 3 × 3 convolutions, as illustrated in Fig. 4.

Clusters that are even more spread out are covered by 5 × 5 convolutions, which results in a heterogeneous set of convolutions. GoogLeNet comprises 9 inception modules.
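The idea of a heterogeneous set of convolutions can be sketched as three parallel branches (1 × 1, 3 × 3 and 5 × 5) whose outputs are concatenated along the channel axis. This is a simplified illustration with random weights and hypothetical branch widths; it omits the pooling branch and the 1 × 1 dimension-reduction layers of the actual Inception module in [11].

```python
import numpy as np

def conv_same(x, weights):
    """Zero-padded convolution that preserves height and width.

    x:       (C_in, H, W) feature maps
    weights: (C_out, C_in, k, k) filters with odd k
    """
    c_out, c_in, k, _ = weights.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero padding on H and W only
    _, h, w = x.shape
    out = np.zeros((c_out, h, w))
    for i in range(h):
        for j in range(w):
            window = xp[:, i:i + k, j:j + k]  # (C_in, k, k)
            # Contract channel and both spatial axes for every filter at once.
            out[:, i, j] = np.tensordot(weights, window, axes=3)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 6, 6))  # hypothetical input: 8 channels, 6x6

# Hypothetical branch widths: (number of filters, kernel size).
branch_specs = [(4, 1), (6, 3), (2, 5)]
branches = [
    conv_same(x, rng.standard_normal((n, x.shape[0], k, k)))
    for n, k in branch_specs
]

# All branches keep the 6x6 size, so they can be stacked channel-wise.
module_out = np.concatenate(branches, axis=0)
print(module_out.shape)
```

Because every branch preserves the spatial size, the module output simply has as many channels as the branches together, here 4 + 6 + 2 = 12.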

3 Experimental Evaluation of LeNet, AlexNet and GoogLeNet for Anatomy Specific Classification

We began our experiments with a data set acquired from the U.S. National Library of Medicine, National Institutes of Health, Department of Health and Human Services. This is an open-access medical image database that contains thousands of anonymized medical images from various modalities such as CT, MRI, PET and X-ray. The database also contains images with various pathologies. For our experimental evaluation we downloaded 5500 images of various anatomies. The anatomies we considered are lung, liver, heart, kidney and lumbar spine. We downloaded both normal and pathological images so that the networks would generalize to images of the same organ that vary in shape or contrast. We supplied 1000 images per category for training, of which 25 % were used for validation. For testing we used a separate test set, also acquired from the same database. The test set contains 66 images of the anatomies mentioned above. We used 3851 images for training and 1149 for validation.

3.1 Experimental Evaluation of LeNet

We began our experiments with LeNet. Before training the net, we resized the images to 28 × 28 and preprocessed them by subtracting the mean image from each image. We then trained LeNet with a batch size of 50, meaning that 50 images were supplied at a time in each training iteration, and used stochastic gradient descent as the training algorithm with a learning rate of 0.01. The training of LeNet is shown in Fig. 5, which depicts how the accuracy and training loss evolve with each iteration.
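The preprocessing and batching described above can be sketched as follows. The random array is only a stand-in for the resized training images (200 samples instead of the full set, with an assumed 3-channel layout); the batch size of 50 and the count of 3851 training images are taken from the text.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the resized training images: (N, height, width, channels).
train = rng.integers(0, 256, size=(200, 28, 28, 3)).astype(np.float64)

# The mean image is the per-pixel average over the training set only.
mean_image = train.mean(axis=0)

# Subtracting it centers every pixel position around zero; the same
# mean image would also be subtracted from validation and test images.
train_centered = train - mean_image

# With 3851 training images and a batch size of 50, one pass over the
# training data (one epoch) takes this many iterations.
batch_size = 50
iterations_per_epoch = math.ceil(3851 / batch_size)

print(mean_image.shape, iterations_per_epoch)
```

Computing the mean image from the training set alone, and reusing it for the other splits, avoids leaking validation or test statistics into the preprocessing.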

Fig. 5 shows an accuracy of 45 % on the validation data. The training loss decreases with each iteration while the validation loss remains high, indicating that the model is overfitting. We then examined how the network performs on unseen data, i.e. the test data. The test data is evaluated on AlexNet, and the top nine predictions for classifying the data into the respective classes are shown in Figs. 8 and 9. The summarized results of training and validation are shown in Table 1.

Table 1 Comparative results of LeNet, AlexNet and GoogLeNet

3.2 Experimental Evaluation of AlexNet and GoogLeNet

The parameter settings for AlexNet differ from those for LeNet. The image dimensions for AlexNet are set to 256 × 256. The images are also mean-subtracted, and the network is trained with the same training algorithm, i.e. stochastic gradient descent. The batch size for AlexNet is 50, whereas the default batch size is 100; we chose a batch size of 50 because of the limited capability of our machine, and the same setting was adopted for GoogLeNet. The training of AlexNet and GoogLeNet with each iteration is shown in Figs. 6 and 7, respectively. It is evident from the figures that GoogLeNet does not perform well on the medical imaging data, whereas AlexNet has a much higher validation accuracy than LeNet and GoogLeNet. Although the training error of AlexNet increases with each epoch, it still performs better than the other two CNNs in terms of validation accuracy (Figs. 8 and 9).

4 Conclusion

In this study we compared three state-of-the-art convolutional neural networks for anatomy-specific classification. We experimented with five different anatomies. It is evident from the results that the CNN with the AlexNet architecture performs considerably better than the other two architectures. A useful outcome of this study is the insight that increasing the number of layers, as in GoogLeNet, does not always improve performance. To achieve better accuracy, optimization that addresses overfitting is needed in future work so that these nets perform better on medical image data.

References

1. Roth HR, Lee CT, Shin H-C, Seff A, Kim L, Yao J et al (2015) Anatomy-specific classification of medical images using deep convolutional nets. arXiv:1504.04003
2. Criminisi A, Shotton J, Robertson D, Konukoglu E (2011) Regression forests for efficient anatomy detection and localization in CT studies. In: Medical computer vision. Recognition techniques and applications in medical imaging. Springer, pp 106–117
3. Song Y, Cai W, Zhou Y, Feng DD (2013) Feature-based image patch approximation for lung tissue classification. IEEE Trans Med Imaging 797–808
4. Zhang F, Song Y, Cai W, Lee M-Z, Zhou Y, Huang H et al (2014) Lung nodule classification with multilevel patch-based context analysis. IEEE Trans Biomed Eng 1155–1166
5. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105
6. Cireşan DC, Giusti A, Gambardella LM, Schmidhuber J (2013) Mitosis detection in breast cancer histology images with deep neural networks. In: Medical image computing and computer-assisted intervention–MICCAI 2013. Springer, pp 411–418
7. Prasoon A, Petersen K, Igel C, Lauze F, Dam E, Nielsen M (2013) Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network. In: Medical image computing and computer-assisted intervention–MICCAI 2013. Springer, pp 246–253
8. Roth HR, Lu L, Seff A, Cherry KM, Hoffman J, Wang S et al (2014) A new 2.5 D representation for lymph node detection using random sets of deep convolutional neural network observations. In: Medical image computing and computer-assisted intervention–MICCAI 2014. Springer, pp 520–527
9. Li Q, Cai W, Wang X, Zhou Y, Feng DD, Chen M (2014) Medical image classification with convolutional neural network. In: 13th international conference on control automation robotics & vision (ICARCV), pp 844–848
10. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. In: Proceedings of the IEEE, pp 2278–2324
11. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D et al (2014) Going deeper with convolutions. arXiv:1409.4842

Author information

Authors and Affiliations

Department of Computer and Information Sciences, Universiti Teknologi Petronas, Seri Iskandar, Malaysia
Sameer Ahmad Khan & Suet-Peng Yong

Corresponding author

Correspondence to Sameer Ahmad Khan.

Editor information

Editors and Affiliations

University Teknikal Malaysia Melaka, Durian Tunggal, Melaka, Malaysia
Ping Jack Soh
Singapore Campus, #05-01 SIT Building, Newcastle University, Singapore, Singapore
Wai Lok Woo
Universiti Teknikal Malaysia Melaka, Melaka, Malaysia
Hamzah Asyrani Sulaiman
Universiti Teknikal Malaysia Melaka, Melaka, Malaysia
Mohd Azlishah Othman
University Teknikal Malaysia Melaka, Durian Tunggal, Melaka, Malaysia
Mohd Shakir Saat

Cite this paper

Khan, S.A., Yong, SP. (2016). An Evaluation of Convolutional Neural Nets for Medical Image Anatomy Classification. In: Soh, P., Woo, W., Sulaiman, H., Othman, M., Saat, M. (eds) Advances in Machine Learning and Signal Processing. Lecture Notes in Electrical Engineering, vol 387. Springer, Cham. https://doi.org/10.1007/978-3-319-32213-1_26

DOI: https://doi.org/10.1007/978-3-319-32213-1_26
Published: 19 June 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32212-4
Online ISBN: 978-3-319-32213-1
eBook Packages: Engineering, Engineering (R0)
