Convolutional neural networks: an overview and application in radiology
Convolutional neural network (CNN), a class of artificial neural networks that has become dominant in various computer vision tasks, is attracting interest across a variety of domains, including radiology. CNN is designed to automatically and adaptively learn spatial hierarchies of features through backpropagation by using multiple building blocks, such as convolution layers, pooling layers, and fully connected layers. This review article offers a perspective on the basic concepts of CNN and its application to various radiological tasks, and discusses its challenges and future directions in the field of radiology. Two challenges in applying CNN to radiological tasks, small dataset and overfitting, will also be covered in this article, as well as techniques to minimize them. Being familiar with the concepts and advantages, as well as limitations, of CNN is essential to leverage its potential in diagnostic radiology, with the goal of augmenting the performance of radiologists and improving patient care.
• Convolutional neural network is a class of deep learning methods which has become dominant in various computer vision tasks and is attracting interest across a variety of domains, including radiology.
• Convolutional neural network is composed of multiple building blocks, such as convolution layers, pooling layers, and fully connected layers, and is designed to automatically and adaptively learn spatial hierarchies of features through a backpropagation algorithm.
• Familiarity with the concepts and advantages, as well as limitations, of convolutional neural network is essential to leverage its potential to improve radiologist performance and, eventually, patient care.
Keywords: Machine learning, Deep learning, Convolutional neural network, Medical imaging, Radiology
Abbreviations
CAM: Class activation map
CNN: Convolutional neural network
GAN: Generative adversarial network
GPU: Graphical processing unit
IEEE: The Institute of Electrical and Electronics Engineers
ILSVRC: ImageNet Large Scale Visual Recognition Competition
ISBI: IEEE International Symposium on Biomedical Imaging
LIDC-IDRI: Lung Image Database Consortium and Image Database Resource Initiative
MRI: Magnetic resonance imaging
PET: Positron emission tomography
ReLU: Rectified linear unit
RGB: Red, green, and blue
SGD: Stochastic gradient descent
A tremendous interest in deep learning has emerged in recent years. The most established algorithm among various deep learning models is convolutional neural network (CNN), a class of artificial neural networks that has been a dominant method in computer vision tasks since the astonishing results were shared on the object recognition competition known as the ImageNet Large Scale Visual Recognition Competition (ILSVRC) in 2012 [2, 3]. Medical research is no exception, as CNN has achieved expert-level performances in various fields. Gulshan et al., Esteva et al., and Ehteshami Bejnordi et al. demonstrated the potential of deep learning for diabetic retinopathy screening, skin lesion classification, and lymph node metastasis detection, respectively. Needless to say, there has been a surge of interest in the potential of CNN among radiology researchers, and several studies have already been published in areas such as lesion detection, classification, segmentation, image reconstruction [10, 11], and natural language processing. Familiarity with this state-of-the-art methodology would help not only researchers who apply CNN to their tasks in radiology and medical imaging, but also clinical radiologists, as deep learning may influence their practice in the near future. This article focuses on the basic concepts of CNN and their application to various radiology tasks, and discusses its challenges and future directions. Other deep learning models, such as recurrent neural networks for sequence models, are beyond the scope of this article.
The following terms are consistently employed throughout this article so as to avoid confusion. A “parameter” in this article stands for a variable that is automatically learned during the training process. A “hyperparameter” refers to a variable that needs to be set before the training process starts. A “kernel” refers to the sets of learnable parameters applied in convolution operations. A “weight” is generally used interchangeably with “parameter”; however, we use this term when referring to a parameter outside of convolution layers, i.e., a parameter other than a kernel, for example in fully connected layers.
What is CNN: the big picture (Fig. 1)
How is CNN different from other methods employed in radiomics?
Most recent radiomics studies use hand-crafted feature extraction techniques, such as texture analysis, followed by conventional machine learning classifiers, such as random forests and support vector machines [15, 16]. There are several differences to note between such methods and CNN. First, CNN does not require hand-crafted feature extraction. Second, CNN architectures do not necessarily require segmentation of tumors or organs by human experts. Third, CNN is far more data hungry because of its millions of learnable parameters to estimate and is thus more computationally expensive, so that graphical processing units (GPUs) are typically required for model training.
Building blocks of CNN architecture
The CNN architecture includes several building blocks, such as convolution layers, pooling layers, and fully connected layers. A typical architecture consists of repetitions of a stack of several convolution layers and a pooling layer, followed by one or more fully connected layers. The step where input data are transformed into output through these layers is called forward propagation (Fig. 1). Although convolution and pooling operations described in this section are for 2D-CNN, similar operations can also be performed for three-dimensional (3D)-CNN.
A convolution layer is a fundamental component of the CNN architecture that performs feature extraction, which typically consists of a combination of linear and nonlinear operations, i.e., convolution operation and activation function.
The distance between two successive kernel positions is called a stride, which also defines the convolution operation. The common choice of a stride is 1; however, a stride larger than 1 is sometimes used in order to achieve downsampling of the feature maps. An alternative technique to perform downsampling is a pooling operation, as described below.
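The effect of the stride on the size of the output feature map can be sketched in plain Python. The function name `conv2d` and the toy input are illustrative assumptions rather than code from this article; no padding is applied, and the kernel is not flipped, as is the usual convention in deep learning libraries.

```python
def conv2d(image, kernel, stride=1):
    """Valid (no padding) 2D convolution of a list-of-lists image with a kernel."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    # Number of kernel positions that fit along each axis.
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride
            # Element-wise product between the kernel and the image patch, summed.
            total = sum(
                image[top + a][left + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            row.append(total)
        out.append(row)
    return out


# Toy 5 x 5 input and a 3 x 3 kernel (values chosen only for illustration).
image = [[r * 5 + c for c in range(5)] for r in range(5)]
kernel = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # picks the centre pixel of each patch

fm_stride1 = conv2d(image, kernel, stride=1)  # 3 x 3 feature map
fm_stride2 = conv2d(image, kernel, stride=2)  # 2 x 2 feature map (downsampled)
```

With a stride of 2, the same kernel visits fewer positions, so the feature map shrinks from 3 × 3 to 2 × 2.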
The key feature of a convolution operation is weight sharing: kernels are shared across all the image positions. Weight sharing creates the following characteristics of convolution operations: (1) making the local feature patterns extracted by kernels translation invariant, as kernels travel across all the image positions and detect learned local patterns, (2) learning spatial hierarchies of feature patterns by downsampling in conjunction with a pooling operation, resulting in capturing an increasingly larger field of view, and (3) increasing model efficiency by reducing the number of parameters to learn in comparison with fully connected neural networks.
A list of parameters and hyperparameters in a convolutional neural network (CNN)
Convolution layer. Parameters: kernels. Hyperparameters: kernel size, number of kernels, stride, padding, activation function
Pooling layer. Parameters: none. Hyperparameters: pooling method, filter size, stride, padding
Fully connected layer. Parameters: weights. Hyperparameters: number of weights, activation function
Others. Hyperparameters: model architecture, optimizer, learning rate, loss function, mini-batch size, epochs, regularization, weight initialization, dataset splitting
Nonlinear activation function
A pooling layer provides a typical downsampling operation which reduces the in-plane dimensionality of the feature maps in order to introduce a translation invariance to small shifts and distortions, and decrease the number of subsequent learnable parameters. It is of note that there is no learnable parameter in any of the pooling layers, whereas filter size, stride, and padding are hyperparameters in pooling operations, similar to convolution operations.
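As an illustration of a pooling operation, the following minimal sketch applies max pooling, one possible choice of pooling method, with a 2 × 2 filter and a stride of 2. The function name `max_pool2d` and the toy feature map are assumptions made for this example; note that the function has no learnable parameters, only hyperparameters.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Max pooling over a list-of-lists feature map, without padding."""
    h, w = len(feature_map), len(feature_map[0])
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    return [
        [
            # Keep only the largest value inside each size x size window.
            max(
                feature_map[i * stride + a][j * stride + b]
                for a in range(size)
                for b in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy 4 x 4 feature map (values chosen only for illustration).
feature_map = [
    [1, 3, 2, 0],
    [5, 4, 1, 1],
    [0, 2, 9, 6],
    [1, 1, 3, 7],
]
pooled = max_pool2d(feature_map)  # 2 x 2 output
```

Each in-plane dimension is halved, while small shifts of a strong response within a window leave the output unchanged.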
Global average pooling
Another pooling operation worth noting is a global average pooling. A global average pooling performs an extreme type of downsampling, where a feature map with size of height × width is downsampled into a 1 × 1 array by simply taking the average of all the elements in each feature map, whereas the depth of feature maps is retained. This operation is typically applied only once before the fully connected layers. The advantages of applying global average pooling are as follows: (1) it reduces the number of learnable parameters and (2) it enables the CNN to accept inputs of variable size.
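A minimal sketch of global average pooling is given below, assuming feature maps stored as a list of channels, each a height × width list of lists. The function name and the toy values are illustrative assumptions. The output length depends only on the number of channels, which is why inputs of different height and width can be handled.

```python
def global_average_pool(feature_maps):
    """Reduce each height x width feature map to a single average value."""
    pooled = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Two channels of size 2 x 2 (toy values).
small = [[[1, 2], [3, 4]], [[0, 0], [0, 8]]]
# Two channels of size 3 x 3 (toy values): a different in-plane size, same depth.
large = [[[1] * 3 for _ in range(3)], [[2] * 3 for _ in range(3)]]

pooled_small = global_average_pool(small)
pooled_large = global_average_pool(large)
```

Both outputs contain exactly two numbers, one per channel, regardless of the in-plane size of the input.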
Fully connected layer
The output feature maps of the final convolution or pooling layer are typically flattened, i.e., transformed into a one-dimensional (1D) array of numbers (or vector), and connected to one or more fully connected layers, also known as dense layers, in which every input is connected to every output by a learnable weight. Once the features extracted by the convolution layers and downsampled by the pooling layers are created, they are mapped by a subset of fully connected layers to the final outputs of the network, such as the probabilities for each class in classification tasks. The final fully connected layer typically has the same number of output nodes as the number of classes. Each fully connected layer is followed by a nonlinear function, such as ReLU, as described above.
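The flattening step and a single fully connected layer followed by ReLU can be sketched as follows. The helper names `flatten`, `dense`, and `relu`, as well as the weights and biases, are hypothetical values chosen for illustration; in a trained network the weights would be learned.

```python
def flatten(feature_maps):
    """Turn a list of height x width feature maps into one 1D list."""
    return [v for channel in feature_maps for row in channel for v in row]


def dense(inputs, weights, biases):
    """Fully connected layer: weights[k][i] connects input i to output k."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


feature_maps = [[[1, 2], [3, 4]]]          # one 2 x 2 feature map (toy values)
vector = flatten(feature_maps)             # 1D array with four elements
weights = [[1, 0, 0, -1], [0.5, 0.5, 0.5, 0.5]]  # two output nodes
biases = [0, -1]
hidden = relu(dense(vector, weights, biases))
```

Every input element contributes to every output node, in contrast to the local connectivity of a convolution layer.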
Last layer activation function
A list of commonly applied last layer activation functions for various tasks
Task: Last layer activation function
Multiclass single-class classification: Softmax
Multiclass multiclass classification: Sigmoid
Regression to continuous values: Identity
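Two widely used last layer activation functions, softmax and sigmoid, can be sketched as below. This is a minimal illustration with hypothetical output values; subtracting the maximum in `softmax` is a standard numerical-stability step and does not change the result.

```python
import math


def sigmoid(x):
    """Map a real value to the range (0, 1); applied independently per output."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    """Normalize real values into class probabilities that sum to 1."""
    m = max(logits)  # subtracted for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical outputs of the last fully connected layer
class_probs = softmax(logits)                 # exactly one class is correct
label_probs = [sigmoid(z) for z in logits]    # each label judged independently
```

Softmax outputs compete with each other and sum to 1, whereas sigmoid outputs are independent and need not sum to 1.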
Training a network
Training a network is a process of finding kernels in convolution layers and weights in fully connected layers which minimize differences between output predictions and given ground truth labels on a training dataset. Backpropagation algorithm is the method commonly used for training neural networks where loss function and gradient descent optimization algorithm play essential roles. A model performance under particular kernels and weights is calculated by a loss function through forward propagation on a training dataset, and learnable parameters, namely kernels and weights, are updated according to the loss value through an optimization algorithm called backpropagation and gradient descent, among others (Fig. 1).
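The update loop can be sketched with a deliberately tiny model: a single learnable weight `w` in the prediction `w * x`, trained with mean squared error. The data, learning rate, and number of iterations are assumptions chosen for illustration. In a CNN the gradient for each kernel and weight would be obtained by backpropagation instead of the closed-form expression used here.

```python
def mse_loss(w, xs, ys):
    """Forward propagation followed by the loss calculation."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_grad(w, xs, ys):
    """Gradient of the loss with respect to the learnable parameter w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]      # toy ground truth generated with w = 2
w = 0.0                   # initial value of the learnable parameter
learning_rate = 0.05      # hyperparameter set before training starts
losses = []
for _ in range(200):
    losses.append(mse_loss(w, xs, ys))
    # Move the parameter against the gradient to reduce the loss.
    w -= learning_rate * mse_grad(w, xs, ys)
```

The recorded loss decreases over the iterations and `w` approaches the value that generated the toy data.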
A loss function, also referred to as a cost function, measures the compatibility between output predictions of the network through forward propagation and given ground truth labels. A commonly used loss function for multiclass classification is cross entropy, whereas mean squared error is typically applied to regression to continuous values. The type of loss function is one of the hyperparameters and needs to be determined according to the given task.
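The two loss functions named above can be written compactly for a single example. The function names and inputs are illustrative assumptions; `probs` is assumed to be a list of predicted class probabilities, and `true_class` the index of the ground truth class.

```python
import math


def cross_entropy(probs, true_class):
    """Cross entropy for one sample: negative log probability of the true class."""
    return -math.log(probs[true_class])


def mean_squared_error(preds, targets):
    """Average squared difference between predictions and continuous targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


probs = [0.7, 0.2, 0.1]                     # hypothetical predicted probabilities
loss_if_class0 = cross_entropy(probs, 0)    # confident and correct: small loss
loss_if_class2 = cross_entropy(probs, 2)    # confident and wrong: large loss
regression_loss = mean_squared_error([1.0, 2.0], [1.0, 4.0])
```

Both losses are zero for a perfect prediction and grow as predictions move away from the ground truth.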
Data and ground truth labels
Data and ground truth labels are the most important components in research applying deep learning or other machine learning methods. As a famous proverb originating in computer science notes: “Garbage in, garbage out.” Careful collection of data and ground truth labels with which to train and test a model is mandatory for a successful deep learning project, but obtaining high-quality labeled data can be costly and time-consuming. While there may be multiple medical image datasets open to the public [24, 25], special attention should be paid in these cases to the quality of the ground truth labels.
Separate validation and test sets are needed because training a model always involves fine-tuning its hyperparameters and performing model selection. As this process is performed based on the performance on the validation set, some information about this validation set leaks into the model itself, i.e., overfitting to the validation set, even though the model is never directly trained on it for the learnable parameters. For that reason, it is guaranteed that the model with fine-tuned hyperparameters on the validation set will perform well on this same validation set. Therefore, a completely unseen dataset, i.e., a separate test set, is necessary for the appropriate evaluation of the model performance, as what we care about is the model performance on never-before-seen data, i.e., generalizability.
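A three-way split can be sketched as follows. The function name `split_dataset`, the 80/10/10 proportions, and the integer case identifiers are assumptions made for this example, not values recommended by this article.

```python
import random


def split_dataset(items, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle once, then cut into non-overlapping training, validation, and test sets."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n_test = int(len(items) * test_fraction)
    n_val = int(len(items) * val_fraction)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test


case_ids = list(range(100))  # hypothetical identifiers of 100 cases
train_set, val_set, test_set = split_dataset(case_ids)
```

The test set is set aside at the start and should be used only once, after hyperparameter tuning and model selection on the validation set are finished.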
It is worthy of mention that the term “validation” is used differently in the medical field and the machine learning field. As described above, in machine learning, the term “validation” usually refers to a step to fine-tune and select models during the training process. On the other hand, in medicine, “validation” usually stands for the process of verifying the performance of a prediction model, which is analogous to the term “test” in machine learning. In order to avoid this confusion, the word “development set” is sometimes used as a substitute for “validation set”.
A list of common methods to mitigate overfitting
How to mitigate overfitting
More training data
Regularization (weight decay, dropout)
Reduce architecture complexity
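Of the methods listed above, dropout is simple enough to sketch directly. The version below is "inverted" dropout, a common variant in which the kept activations are rescaled during training so that nothing needs to change at inference time; the function name, the rate of 0.5, and the toy activations are assumptions for illustration.

```python
import random


def dropout(activations, rate=0.5, training=True, rng=None):
    """Randomly zero a fraction of activations during training (inverted dropout)."""
    if not training or rate == 0.0:
        return list(activations)  # dropout is disabled at inference time
    if rng is None:
        rng = random.Random()
    keep = 1.0 - rate
    # Kept activations are scaled by 1 / keep so the expected value is unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


acts = [1.0, 2.0, 3.0, 4.0]  # toy activations
dropped = dropout(acts, rate=0.5, rng=random.Random(0))
unchanged = dropout(acts, training=False)
```

Because a different random subset is dropped at every update, the network cannot rely on any single activation, which acts as a regularizer.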
Training on a small dataset
An abundance of well-labeled data in medical imaging is desirable but rarely available due to the cost and necessary workload of radiology experts. There are a couple of techniques available to train a model efficiently on a smaller dataset: data augmentation and transfer learning. As data augmentation was briefly covered in the previous section, this section focuses on transfer learning.
A fixed feature extraction method is a process to remove fully connected layers from a network pretrained on ImageNet while maintaining the remaining network, which consists of a series of convolution and pooling layers, referred to as the convolutional base, as a fixed feature extractor. In this scenario, any machine learning classifier, such as random forests and support vector machines, as well as the usual fully connected layers in CNNs, can be added on top of the fixed feature extractor, resulting in training limited to the added classifier on a given dataset of interest. This approach is not common in deep learning research on medical images because of the dissimilarity between ImageNet and given medical images.
A fine-tuning method, which is more often applied to radiology research, is to not only replace fully connected layers of the pretrained model with a new set of fully connected layers to retrain on a given dataset, but to fine-tune all or part of the kernels in the pretrained convolutional base by means of backpropagation. All the layers in the convolutional base can be fine-tuned or, alternatively, some earlier layers can be fixed while fine-tuning the rest of the deeper layers. This is motivated by the observation that the early-layer features appear more generic, including features such as edges applicable to a variety of datasets and tasks, whereas later features progressively become more specific to a particular dataset or task [34, 35].
One drawback of transfer learning is its constraints on input dimensions. The input image has to be 2D with three channels corresponding to RGB because the ImageNet dataset consists of 2D color images that have three channels (RGB: red, green, and blue), whereas medical grayscale images have only one channel (levels of gray). On the other hand, the height and width of an input image can be arbitrary, but not too small, by adding a global pooling layer between the convolutional base and added fully connected layers.
There has also been increasing interest in taking advantage of unlabeled data, i.e., semi-supervised learning, to overcome a small-data problem. Examples of this attempt include pseudo-label and incorporating generative models, such as generative adversarial networks (GANs). However, whether these techniques can really help improve the performance of deep learning in radiology is not clear and remains an area of active investigation.
Applications in radiology
This section introduces recent applications within radiology, which are divided into the following categories: classification, segmentation, detection, and others.
Because 2D images are frequently utilized in computer vision, deep learning networks developed for the 2D images (2D-CNN) are not directly applied to 3D images obtained in radiology [thin-slice CT or 3D-magnetic resonance imaging (MRI) images]. To apply deep learning to 3D radiological images, different approaches such as custom architectures are used. For example, Setio et al. used a multistream CNN to classify nodule candidates of chest CT images as nodules or non-nodules in the databases of the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI), ANODE09, and the Danish Lung Cancer Screening Trial. They extracted differently oriented 2D image patches based on multiplanar reconstruction from one nodule candidate (one or nine patches per candidate), and these patches were used in separate streams and merged in the fully connected layers to obtain the final classification output. One previous study used 3D-CNN for fully capturing the spatial 3D context information of lung nodules. Their 3D-CNN performed binary classification (benign or malignant nodules) and ternary classification (benign lung nodule, and malignant primary and secondary lung cancers) using the LIDC-IDRI database. They used a multiview strategy in 3D-CNN, whose inputs were obtained by cropping three 3D patches of a lung nodule in different sizes and then resizing them into the same size. They also used the 3D Inception model in their 3D-CNN, where the network path was divided into multiple branches with different convolution and pooling operators.
Time series data are frequently obtained in radiological examinations such as dynamic contrast-enhanced CT/MRI or dynamic radio isotope (RI)/positron emission tomography (PET). One previous study used CT image sets of liver masses over three phases (non-enhanced CT, and enhanced CT in arterial and delayed phases) for the classification of liver masses with 2D-CNN. To utilize time series data, the study used triphasic CT images as 2D images with three channels, which correspond to the RGB color channels in computer vision, for 2D-CNN. The study showed that 2D-CNN using triphasic CT images was superior to that using biphasic or monophasic CT images.
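The idea of treating three CT phases as three image channels can be sketched as follows. The function name `stack_phases`, the channel order, and the toy pixel values are assumptions for illustration; the sketch also assumes that the three slices are spatially registered and of identical size, which is not detailed here.

```python
def stack_phases(non_enhanced, arterial, delayed):
    """Combine three same-sized 2D slices into one height x width x 3 image."""
    h, w = len(non_enhanced), len(non_enhanced[0])
    for phase in (arterial, delayed):
        if len(phase) != h or any(len(row) != w for row in phase):
            raise ValueError("all phases must have the same height and width")
    # Each pixel becomes a triplet, analogous to the (R, G, B) values of a color image.
    return [
        [[non_enhanced[i][j], arterial[i][j], delayed[i][j]] for j in range(w)]
        for i in range(h)
    ]


# Toy 2 x 2 slices standing in for the three phases (hypothetical values).
plain = [[10, 20], [30, 40]]
art = [[11, 21], [31, 41]]
delay = [[12, 22], [32, 42]]
triphasic = stack_phases(plain, art, delay)
```

The resulting array has the same layout as a color image, so it can be fed to a 2D-CNN that expects three input channels.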
One way to perform segmentation is to use a CNN classifier for calculating the probability of an organ or anatomical structure. In this approach, the segmentation process is divided into two steps: the first step is construction of the probability map of the organ or anatomical structure using CNN and image patches, and the second is a refinement step where the global context of images and the probability map are utilized. One previous study used a 3D-CNN classifier for segmentation of the liver on 3D CT images. The inputs of the 3D-CNN were 3D image patches collected from entire 3D CT images, and the 3D-CNN calculated probabilities for the liver from the image patches. By calculating the probabilities of the liver being present for each image patch, a 3D probability map of the liver was obtained. Then, an algorithm called graph cut was used for refinement of liver segmentation, based on the probability map of the liver. In this method, the local context of CT images was evaluated by 3D-CNN and the global context was evaluated by the graph cut algorithm.
Although segmentation based on image patch was successfully performed in deep learning, U-net of Ronneberger et al. outperformed the image patch-based method on the ISBI [IEEE (The Institute of Electrical and Electronics Engineers) International Symposium on Biomedical Imaging] challenge for segmentation of neuronal structures in electron microscopic images. The architecture of U-net consists of a contracting path to capture anatomical context and a symmetric expanding path that enables precise localization. Although it was difficult to capture global context and local context at the same time by using the image patch-based method, U-net enabled the segmentation process to incorporate a multiscale spatial context. As a result, U-net could be trained end-to-end from a limited number of training data.
One potential approach of using U-net in radiology is to extend U-net for 3D radiological images, as shown in classification. For example, V-net was suggested as an extension of U-net for segmentation of the prostate on volumetric MRI images. In the study, V-net utilized a loss function based on the Dice coefficient between segmentation results and ground truth, which directly reflected the quality of prostate segmentation. Another study utilized two types of 3D U-net for segmenting liver and liver mass on 3D CT images, which was named cascaded fully convolutional neural networks; one type of U-net was used for segmentation of the liver and the other type for the segmentation of liver mass using the liver segmentation results. Because the second type of 3D U-net focused on the segmentation of liver mass, the segmentation of liver mass was more efficiently performed than single 3D U-net.
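A Dice-based loss can be sketched as below. This is the commonly used form of the Dice coefficient, 2|A ∩ B| / (|A| + |B|), and is not necessarily the exact formulation used in the V-net study; the function names, the small constant `eps` that avoids division by zero, and the toy masks are assumptions for illustration.

```python
def dice_coefficient(pred, truth, eps=1e-7):
    """Dice overlap between a predicted mask (or probability map) and the ground truth."""
    p = [v for row in pred for v in row]
    t = [v for row in truth for v in row]
    intersection = sum(a * b for a, b in zip(p, t))
    return (2.0 * intersection + eps) / (sum(p) + sum(t) + eps)


def dice_loss(pred, truth):
    """Loss that decreases as the overlap with the ground truth increases."""
    return 1.0 - dice_coefficient(pred, truth)


pred_mask = [[1, 1], [0, 0]]    # toy predicted segmentation
truth_mask = [[1, 0], [0, 0]]   # toy ground truth segmentation
overlap = dice_coefficient(pred_mask, truth_mask)
loss = dice_loss(pred_mask, truth_mask)
```

Because the coefficient is normalized by the sizes of both regions, it is less dominated by the large background than a per-pixel loss when the target structure is small.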
A common task for radiologists is to detect abnormalities within medical images. Abnormalities can be rare and they must be detected among many normal cases. One previous study investigated the usefulness of 2D-CNN for detecting tuberculosis on chest radiographs. The study utilized two different types of 2D-CNN, AlexNet and GoogLeNet, to detect pulmonary tuberculosis on chest radiographs. To develop the detection system and evaluate its performance, a dataset of 1007 chest radiographs was used. According to the results, the best area under the curve of receiver operating characteristic curves for detecting pulmonary tuberculosis from healthy cases was 0.99, which was obtained by an ensemble of the AlexNet and GoogLeNet 2D-CNNs.
Nearly 40 million mammography examinations are performed in the USA every year. These examinations are mainly performed for screening programs aimed at detecting breast cancer at an early stage. A comparison between a CNN-based CADe system and a reference CADe system relying on hand-crafted imaging features was performed previously. Both systems were trained on a large dataset of around 45,000 images. The two systems shared the candidate detection system. The CNN-based CADe system classified the candidate based on its region of interest, and the reference CADe system classified it based on the hand-crafted imaging features obtained from the results of a traditional segmentation algorithm. The results show that the CNN-based CADe system outperformed the reference CADe system at low sensitivity and achieved comparable performance at high sensitivity.
One previous study used U-net to solve the inverse problem in imaging for obtaining a noiseless CT image reconstructed from a subsampled sinogram (projection data). To train U-net for reconstructing a noiseless CT image from the subsampled sinogram, the training data of U-net consist of (i) noisy CT images obtained from subsampled sinogram by filtered backprojection (FBP) and (ii) noiseless CT images obtained from the original sinogram. The study suggested that, while it would be possible to train U-net for reconstructing CT images directly from the sinogram, performing the FBP first greatly simplified the training. As a refinement of the original U-net, the study added a skip connection between the input and output for residual learning. Their study showed that U-net could effectively produce noiseless CT images from the subsampled sinogram.
Although deep learning requires a large amount of training data, building such large-scale training datasets of radiological images is a challenging problem. One main challenge is the cost of annotation (labeling); the annotation cost for a radiological image is much larger than that for a general image because radiologist expertise is required for annotation. To tackle this problem, one previous study utilized radiologists’ annotations which are routinely added to radiologists’ reports (such as circle, arrow, and square). The study obtained 33,688 bounding boxes of lesions from the annotation of radiologists’ reports. Then, unsupervised lesion categorization was performed to infer labels of the lesions in the bounding boxes. To perform unsupervised categorization, the following three steps were iteratively performed: (i) feature extraction using a pretrained VGG16 model from the lesions in the bounding boxes, (ii) clustering of the features, and (iii) fine-tuning of VGG16 based on the results of the clustering. The study named the labels obtained from the results of clustering as pseudo-category labels. The study also reported a detection system built using the Faster R-CNN method, the lesions in the bounding boxes, and their corresponding pseudo-category labels. The results demonstrate that detection accuracy could be significantly improved by incorporating pseudo-category labels.
Radiologists routinely produce their reports as results of interpreting medical images. Because they summarize the medical images as text data in the reports, it might be possible to collect useful information about disease diagnosis effectively by analyzing the radiologists’ reports. One previous study evaluated the performance of a CNN model, compared with a traditional natural language processing model, in extracting pulmonary embolism findings from chest CT reports. By using word embedding, words in the radiological reports can be converted to meaningful vectors. For example, the following relation approximately holds by using vector representation with word embedding: king – man + woman ≈ queen. In the previous study, word embedding enabled the radiological reports to be converted to a matrix (or image) of size 300 × 300. By using this representation, 2D-CNN could be used to classify the reports as pulmonary embolism or not. Their results showed that the performance of the CNN model was equivalent to or beyond that of the traditional model.
Challenges and future directions
Although the recent advancements of deep learning have been astonishing, there still exist challenges to its application to medical imaging.
Although there are several methods that facilitate learning on smaller datasets as described above, well-annotated large medical datasets are still needed since most of the notable accomplishments of deep learning are typically based on very large amounts of data. Unfortunately, building such datasets in medicine is costly and demands an enormous workload by experts, and may also pose ethical and privacy issues. The goal of building large medical datasets is to enhance generalizability and minimize overfitting, as discussed previously. In addition, dedicated medical pretrained networks can probably be proposed once such datasets become available, which may foster deep learning research on medical imaging, though whether transfer learning with such networks improves the performance in the medical field compared to that with ImageNet pretrained models is not clear and remains an area of further investigation.
Convolutional neural networks (CNNs) have accomplished astonishing achievements across a variety of domains, including medical research, and an increasing interest has emerged in radiology. Although deep learning has become a dominant method in a variety of complex tasks such as image classification and object detection, it is not a panacea. Being familiar with key concepts and advantages of CNN as well as limitations of deep learning is essential in order to leverage it in radiology research with the goal of improving radiologist performance and, eventually, patient care.
We would like to acknowledge Yasuhisa Kurata, MD, PhD, Department of Diagnostic Imaging and Nuclear Medicine, Kyoto University Graduate School of Medicine. This study was partly supported by JSPS KAKENHI (grant number JP16K19883).
- 3.Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25. Available online at: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf. Accessed 22 Jan 2018
- 9.Christ PF, Elshaer MEA, Ettlinger F et al (2016) Automatic liver and lesion segmentation in CT using cascaded fully convolutional neural networks and 3D conditional random fields. In: Ourselin S, Joskowicz L, Sabuncu M, Unal G, Wells W (eds) Proceedings of Medical image computing and computer-assisted intervention – MICCAI 2016. https://doi.org/10.1007/978-3-319-46723-8_48
- 17.Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning. Available online at: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.165.6419&rep=rep1&type=pdf. Accessed 23 Jan 2018
- 18.Ramachandran P, Zoph B, Le QV (2017) Searching for activation functions. arXiv. Available online at: https://arxiv.org/pdf/1710.05941.pdf. Accessed 23 Jan 2018
- 19.Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier neural networks. In: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, vol 15, pp 315–323
- 20.Lin M, Chen Q, Yan S (2013) Network in network. arXiv. Available online at: https://arxiv.org/pdf/1312.4400.pdf. Accessed 22 Jan 2018
- 22.Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv. Available online at: https://arxiv.org/pdf/1412.6980.pdf. Accessed 23 Jan 2018
- 23.Ruder S (2016) An overview of gradient descent optimization algorithms. arXiv. Available online at: https://arxiv.org/pdf/1609.04747.pdf. Accessed 23 Jan 2018
- 25.Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM (2017) ChestX-ray8: hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 3462–3471. https://doi.org/10.1109/CVPR.2017.369
- 27.Hinton GE, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov RR (2012) Improving neural networks by preventing co-adaptation of feature detectors. arXiv. Available online at: https://arxiv.org/pdf/1207.0580.pdf. Accessed 22 Jan 2018
- 28.Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv. Available online at: https://arxiv.org/pdf/1502.03167.pdf. Accessed 22 Jan 2018
- 29.Zhong Z, Zheng L, Kang G, Li S, Yang Y (2017) Random erasing data augmentation. arXiv. Available online at: https://arxiv.org/pdf/1708.04896.pdf. Accessed 27 Jan 2018
- 30.Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. arXiv. Available online at: https://arxiv.org/pdf/1409.1556.pdf. Accessed 22 Jan 2018
- 31.He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/CVPR.2016.90
- 32.Szegedy C, Liu W, Jia Y et al (2015) Going deeper with convolutions. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/CVPR.2015.7298594
- 33.Huang G, Liu Z, van der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/CVPR.2017.243
- 34.Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: Proceedings of Computer Vision – ECCV 2014, vol 8689, pp 818–833
- 35.Yosinski J, Clune J, Bengio Y, Lipson H (2014) How transferable are features in deep neural networks? arXiv. Available online at: https://arxiv.org/pdf/1411.1792.pdf. Accessed 25 Jan 2018
- 36.Lee DH (2013) Pseudo-label: the simple and efficient semi-supervised learning method for deep neural networks. In: Proceedings of the ICML 2013 Workshop: Challenges in Representation Learning. Available online at: http://deeplearning.net/wp-content/uploads/2013/03/pseudo_label_final.pdf. Accessed 23 Jan 2018
- 37.Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X (2016) Improved techniques for training GANs. arXiv. Available online at: https://arxiv.org/pdf/1606.03498.pdf. Accessed 23 Jan 2018
- 44.Lucchesi FR, Aredes ND (2016) Radiology data from The Cancer Genome Atlas Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma (TCGA-CESC) collection. The Cancer Imaging Archive. https://doi.org/10.7937/K9/TCIA.2016.SQ4M8YP4
- 45.Kurata Y, Nishio M, Fujimoto K et al (2018) Automatic segmentation of uterus with malignant tumor on MRI using U-net. In: Proceedings of the Computer Assisted Radiology and Surgery (CARS) 2018 congress (accepted)
- 48.Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells W, Frangi A (eds) Proceedings of Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. https://doi.org/10.1007/978-3-319-24574-4_28
- 49.Milletari F, Navab N, Ahmadi S-A (2016) V-net: fully convolutional neural networks for volumetric medical image segmentation. In: Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV). https://doi.org/10.1109/3DV.2016.79
- 55.Yan K, Wang X, Lu L, Summers RM (2017) DeepLesion: automated deep mining, categorization and detection of significant radiology image findings using large-scale clinical lesion annotations. arXiv. Available online at: https://arxiv.org/pdf/1710.01766.pdf. Accessed 29 Jan 2018
- 57.Pennington J, Socher R, Manning CD (2014) GloVe: Global Vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp 1532–1543. https://doi.org/10.3115/v1/D14-1162
- 58.Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A (2016) Learning deep features for discriminative localization. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/CVPR.2016.319
- 59.Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D (2017) Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV). https://doi.org/10.1109/ICCV.2017.74
- 60.Szegedy C, Zaremba W, Sutskever I et al (2014) Intriguing properties of neural networks. arXiv. Available online at: https://arxiv.org/pdf/1312.6199.pdf. Accessed 24 Jan 2018
- 61.Goodfellow IJ, Shlens J, Szegedy C (2014) Explaining and harnessing adversarial examples. arXiv. Available online at: https://arxiv.org/pdf/1412.6572.pdf. Accessed 24 Jan 2018
- 62.Su J, Vargas DV, Sakurai K (2018) One pixel attack for fooling deep neural networks. arXiv. Available online at: https://arxiv.org/pdf/1710.08864.pdf. Accessed 24 Jan 2018
- 63.Brown TB, Mané D, Roy A, Abadi M, Gilmer J (2018) Adversarial patch. arXiv. Available online at: https://arxiv.org/pdf/1712.09665.pdf. Accessed 24 Jan 2018
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.