Survey on deep learning for pulmonary medical imaging

As a promising method in artificial intelligence, deep learning has proven successful in several domains ranging from acoustics and images to natural language processing. With medical imaging becoming an important part of disease screening and diagnosis, deep learning-based approaches have emerged as powerful techniques for medical image analysis. In these approaches, feature representations are learned directly and automatically from data, which has led to remarkable breakthroughs in the medical field. Deep learning has been widely applied in medical imaging to improve image analysis. This paper reviews the major deep learning techniques in this time of rapid evolution and summarizes some of their key contributions and state-of-the-art outcomes. The topics include classification, detection, and segmentation tasks in medical image analysis with respect to pulmonary medical images, datasets, and benchmarks. A comprehensive overview of these methods as applied to various lung diseases, including pulmonary nodule diseases, pulmonary embolism, pneumonia, and interstitial lung disease, is also provided. Lastly, the application of deep learning techniques to medical images and an analysis of future challenges and potential directions are discussed.


Introduction
Deep learning covers a set of artificial intelligence methods that use many interconnected units to fulfill complex tasks. Deep learning algorithms automatically learn representations from large amounts of data rather than relying on a set of pre-programmed instructions [1][2][3]. Radiology is a natural application field for deep learning because it relies mainly on extracting useful information from images, and research in this field has developed rapidly [4]. With the aggravation of air pollution and the increasing number of smokers, respiratory diseases have become a serious threat to people's life and health [5]. However, many of the early clinical manifestations of respiratory diseases are not evident, and some patients do not feel any discomfort during the early stage. Hence, many patients miss the critical period of early treatment because they notice clinical symptoms only at a later time. Therefore, public awareness of early detection and treatment is necessary for the prevention and treatment of lung diseases. In medical imaging, the accurate diagnosis and evaluation of diseases depend on image collection and interpretation. Computer-aided diagnosis (CAD) has been developed for the early discovery and analysis of patients' symptoms based on medical images. However, this method has long been rooted in a limited set of features identified from physicians' past experience. Most researchers have also studied medical image features that must be extracted manually and thus require special clinical experience and a deep understanding of the data [6,7]. With the rapid development of computer vision and medical imaging, computer-aided analysis can assist medical workers in diagnosis, for example by enhancing diagnostic capabilities, identifying the necessary treatments, supporting their workflow, and reducing the workload of capturing specific features from medical images.
Deep learning algorithms were first applied to medical image processing in 1993, when a neural network was used for the detection of pulmonary nodules [8]. In 1995, deep learning was applied in breast tissue detection [9]. The regions of interest (ROIs), which contain tumors and normal tissues confirmed via biopsy, were extracted from mammograms through a model. The model comprised one input layer, two hidden layers, and one output layer and was trained with back propagation. At that time, the typical convolutional neural network (CNN) image-processing architecture had not been widely accepted by scholars because it demanded high computing power and sufficient data and did not provide good interpretations of its results. Most doctors prefer clear interpretations, which are important for supporting clinical decisions in the medical field but could not be provided by deep learning.
Most studies focused on manual feature extraction from images, such as features that represent straight-line edges (e.g., for organ detection) or contour information (e.g., a circular object such as a colon polyp) [10]. Alternatively, key points (feature points) at different scales are identified, and their orientations are calculated [11,12]. Radiomics [13] is also used to study higher-order features, such as local and global shape and texture.
With the explosive growth of data and the emergence of graphics processing units with increasing computing power, deep neural networks have made considerable progress. In general, medical image processing can be divided into image classification, detection, segmentation, registration, and other related tasks. In image classification, each image is assigned to specific categories. The algorithm determines the relevant findings in the image, such as the type of disease, pneumothorax, or bullae, and generates a list of possible categories in descending order of confidence [14][15][16][17]. Object detection builds on image classification. In this task, the algorithm gives the category of the detected object (e.g., a pulmonary nodule appearing in a CT slice) and marks the corresponding boundary of the object on the image [18][19][20]. Semantic segmentation is a basic task in computer vision in which the visual input is divided into different semantically interpretable categories. For example, all the pixels belonging to a lesion in the image might need to be distinguished [21,22].
Medical image processing mainly aims to detect possible lesions and tumors because of their remarkable effects on the follow-up diagnosis and treatment. Although tumor detection and analysis have been widely studied, many obstacles must be overcome for future application. For pulmonary nodules [23,24], challenges exist due to within-class variability in lesion appearance. First, the shape, size, and density of the same type of lesion may vary, and so its appearance may differ (Fig. 1A, solid pulmonary nodules; Fig. 1B, ground glass pulmonary nodules, in green box). Second, different structures (normal tissues and pulmonary nodules) can show the same texture features (Fig. 1C, pleural nodules, in green box, and many blood vessels in the center of the image). Third, image acquisition quality, such as changes of posture, blur, and occlusion, must be considered (Fig. 1D). Fourth, different surrounding environments make different types of lung nodules appear diverse. Lastly, unbalanced data often pose a challenge in designing effective models from limited data.
Several dedicated surveys have been published on the application of deep learning in medical image analysis [11,25]. These surveys include a large amount of related work and cover references in almost all fields of medical image analysis that use deep learning. In the present study, we focus on a comprehensive review of pulmonary medical image analysis. A previous review [26] centered on the detection and classification of pulmonary nodules by using CNNs. However, additional attention should be paid to segmentation tasks across diverse pulmonary diseases. The present review aims to present the history of deep learning and its recent applications in medical imaging, especially in pulmonary diseases, and to summarize the specific tasks in these applications.
This paper is organized as follows. In section "Overview of deep learning," we present a brief introduction to deep learning techniques and the origins of deep learning in neuroscience. The specific application areas of deep learning are presented in section "Deep learning in pulmonary medical image." Detailed reviews of the datasets and the performance evaluation are presented in section "Datasets and performance." Finally, our discussion is provided in sections "Challenges and future trends" and "Conclusions."

Overview of deep learning
In this section, we introduce the origin of modern deep learning algorithms and several deep learning network structures commonly used in medical image processing. Then, we briefly introduce some concepts used in these networks (Table 1).

Historical perspective on networks
The history of neural networks can be traced back to the 1940s, when the field was known as cybernetics [27][28][29]. Simple bionic models such as the perceptron, which trains a single neuron, emerged with the development of theories of biological learning [30]. In 1962, Hubel and Wiesel proposed the concept of the receptive field by studying cat visual cortical cells and found that the animal's visual nervous system recognizes objects in layers [21,32]. A simple structure of visual processing was then established. When an object passes through a visual area, the brain constructs complex visual information from its edges. Although this model helps in understanding the functions of the brain, it was not designed to be a practical computational model. In 1971, building on the receptive field study, Blakemore [33] proposed that the receptive field of visual cells is acquired rather than innate.
In 1980, the Japanese scholar Fukushima proposed the neocognitron based on the concept of the receptive field. It can be regarded as the first implementation of a CNN and the first application of the receptive field concept in the field of artificial neural networks [34]. The model is inspired by the animal visual system, namely, the hierarchical structure of vision proposed by Hubel. In 1984, Fukushima proposed an improved neocognitron with double C-layers [35]. In the 1980s, connectionism was introduced [36]. Rumelhart proposed the famous back-propagation (BP) algorithm, which can train a neural network with two hidden layers and propagates errors backward to update the weights of the network [37]. Some of the ideas of this algorithm have been applied in the structure of deep networks. The emergence of the BP algorithm made multi-layer perceptrons trainable and solved the XOR problem [38], which cannot be represented by a single perceptron [39,40]. In 1989, Robert Hecht-Nielsen reported that a continuous function on any closed interval can be approximated using a three-layer network containing a hidden layer, indicating that the multilayer perceptron has the so-called "universal approximation" capability [41].
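To illustrate why a hidden layer matters for the XOR problem, the following minimal Python sketch builds a two-layer perceptron with hand-picked weights. The weights and function names are assumptions chosen for illustration; they are not learned by back propagation and are not taken from the cited works.

```python
def step(x):
    """Heaviside threshold unit, as used in the classical perceptron."""
    return 1 if x > 0 else 0


def perceptron(inputs, weights, bias):
    """Single threshold unit: weighted sum of the inputs plus a bias."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor_mlp(a, b):
    # Hidden unit 1 computes OR, hidden unit 2 computes AND (hand-picked weights).
    h_or = perceptron((a, b), (1.0, 1.0), -0.5)
    h_and = perceptron((a, b), (1.0, 1.0), -1.5)
    # The output unit fires when OR is true and AND is false, which is XOR.
    return perceptron((h_or, h_and), (1.0, -1.0), -0.5)


truth_table = {(a, b): xor_mlp(a, b) for a in (0, 1) for b in (0, 1)}
print(truth_table)
```

A single threshold unit draws one straight decision line and therefore cannot separate the XOR classes; the two hidden units above supply the two lines that are needed.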

Modern architectures
The simple learning algorithms mentioned in the previous section greatly accelerated the development of neural networks. In 1989, the emergence of CNNs drew attention to the generalization error of networks [43,44,49]. In 1990, a convolutional network proposed for recognizing handwritten digits marked the great success of CNNs in the field of computer vision and image processing [44,50]. LeNet [44] was proposed by LeCun in 1998 to solve the problem of handwritten digit recognition. Since then, the basic architecture of the CNN has been defined; it comprises convolution layers, pooling layers, and fully connected layers.
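The three building blocks named above can be sketched as a forward pass in plain Python. This is a toy illustration under stated assumptions: the 6x6 image, the vertical-edge kernel, and the fully connected weights are hypothetical values, and the code is not the LeNet implementation.

```python
def conv2d(image, kernel):
    """Valid cross-correlation of a 2D image with a 2D kernel (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap):
    """Element-wise rectified linear activation."""
    return [[max(0.0, x) for x in row] for row in fmap]


def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling; an odd trailing row or column is dropped."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]


def fully_connected(vector, weights, biases):
    """One dense layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, biases)]


# Toy 6x6 image with a vertical edge between columns 2 and 3.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
kernel = [[-1, 0, 1]] * 3  # hypothetical vertical-edge filter

features = max_pool2x2(relu(conv2d(image, kernel)))  # 4x4 map pooled to 2x2
flat = [x for row in features for x in row]
scores = fully_connected(flat, [[0.25] * 4, [-0.25] * 4], [0.0, 0.0])
print(features, scores)
```

In a trained CNN the kernel and dense weights are learned from data; here they are fixed only to make the data flow through convolution, pooling, and the fully connected layer visible.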
With the rapid development of computer technology, the computational bottleneck of neural networks has been continuously overcome, which has promoted their rapid development in the last two decades. Neural networks continued to achieve impressive performance on certain tasks, but these network frameworks were still difficult to train [51,52]. It was not until 2006, with the proposal of the deep belief network, that the difficulty of training neural networks was gradually alleviated [45]. In this network, Hinton used a greedy algorithm for layer-by-layer training, which effectively addressed the above problems. After that, the same strategy was used in training many other types of deep networks [53].
Since 1980, the recognition and prediction capabilities of deep neural networks have improved, and their application fields have become more extensive. With the increasing amount of data, the scale of neural network models has also expanded tremendously. Among these models, AlexNet can be considered a landmark architecture [46]; it was proposed by Krizhevsky for the ImageNet challenge in 2012 [54]. Prior to AlexNet, only a single object in a very small image could be identified due to computational limitations. After 2012, the size of the images that neural networks can process has gradually increased, and an image of any size may be used as input. In the same year, in the annual ImageNet image classification competition, AlexNet reduced the top-five error rate from 26.1% to only 15.3%, roughly 10 percentage points lower than that of the previous year's champion, and far outperformed the second-placed team. Subsequently, many scholars re-examined deep neural networks, and deep CNNs have often won this competition. VGG-Net [47] and GoogLeNet [16] were proposed in 2014, and ResNet [17], the 2015 champion, is considerably deeper than AlexNet. These results suggest that deeper networks tend to perform better. Dense neural networks [48] have also achieved good results in other fields.
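The top-five error rate quoted above counts a prediction as correct when the true class is among the five highest-scoring classes. A minimal sketch of the metric follows; the function name and the toy scores are assumptions for illustration, not ImageNet data.

```python
def top_k_error(score_rows, true_labels, k=5):
    """Fraction of samples whose true label is not among the k highest-scoring classes."""
    misses = 0
    for scores, label in zip(score_rows, true_labels):
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(true_labels)


# Toy example: 4 samples, 6 classes (made-up confidence scores).
score_rows = [
    [0.10, 0.50, 0.20, 0.05, 0.10, 0.05],
    [0.30, 0.10, 0.25, 0.20, 0.10, 0.05],
    [0.40, 0.30, 0.10, 0.10, 0.05, 0.05],
    [0.05, 0.10, 0.15, 0.20, 0.35, 0.15],
]
true_labels = [1, 5, 1, 4]

top1 = top_k_error(score_rows, true_labels, k=1)
top5 = top_k_error(score_rows, true_labels, k=5)
print(top1, top5)
```

With the figures in the text, the reported improvement is 26.1% - 15.3% = 10.8 percentage points.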

Deep learning in medical image analysis
The aforementioned networks are mainly used for common image classification tasks. An overview of the network structures used for classification, detection, and segmentation tasks from the perspective of medical image analysis is given below.

Classification
The classification task determines which classes an image belongs to. Depending on the task, the classification can be binary (e.g., normal and abnormal) or multi-class (e.g., nodule classification in chest CT). Many medical studies use CNNs to analyze medical images and stage disease severity in different organs [55]. Some authors investigated the combination of local information and global contextual information and designed architectures for image analysis at different scales [56]. Other work focused on the use of 3D CNNs for enhanced classification performance [57].

Detection
Object detection in medical image analysis refers to the localization of various ROIs such as lung nodules and is often an important pre-processing step for segmentation tasks. Most detection tasks in medical image analysis require the processing of 3D images. To apply deep learning algorithms to 3D data, Yang et al. processed 3D MRI images as 2D MRI sequences with a typical CNN [58]. de Vos et al. localized 3D bounding boxes of organs by analyzing 2D sequences from 3D CT volumes [59]. To reduce the complexity of processing 3D images, Zheng et al. decomposed the 3D convolution into three 1D convolutions for the detection of the carotid artery bifurcation [60].
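The idea of replacing one 3D convolution with three 1D convolutions can be checked numerically for the special case of a separable kernel, that is, a kernel that is the outer product of three 1D filters. The sketch below is an illustration under that assumption; the filter values and the toy volume are made up, and it is not necessarily how the cited work implements the decomposition.

```python
def conv3d_valid(vol, ker):
    """Direct valid 3D cross-correlation of a volume [z][y][x] with a 3D kernel."""
    D, H, W = len(vol), len(vol[0]), len(vol[0][0])
    d, h, w = len(ker), len(ker[0]), len(ker[0][0])
    return [[[sum(vol[z + i][y + j][x + k] * ker[i][j][k]
                  for i in range(d) for j in range(h) for k in range(w))
              for x in range(W - w + 1)]
             for y in range(H - h + 1)]
            for z in range(D - d + 1)]


def conv1d_along(vol, taps, axis):
    """Valid 1D cross-correlation of a 3D volume along one axis (0=z, 1=y, 2=x)."""
    size = [len(vol), len(vol[0]), len(vol[0][0])]
    size[axis] -= len(taps) - 1

    def sample(z, y, x, t):
        idx = [z, y, x]
        idx[axis] += t
        return vol[idx[0]][idx[1]][idx[2]]

    return [[[sum(sample(z, y, x, t) * taps[t] for t in range(len(taps)))
              for x in range(size[2])]
             for y in range(size[1])]
            for z in range(size[0])]


# Hypothetical 1D filters and the separable 3x3x3 kernel they generate.
a, b, c = [1, 2, 1], [1, 0, -1], [1, 1, 1]
kernel = [[[a[i] * b[j] * c[k] for k in range(3)] for j in range(3)] for i in range(3)]

# Toy 5x5x5 integer volume.
vol = [[[(7 * z + 3 * y + x * x) % 5 for x in range(5)] for y in range(5)] for z in range(5)]

direct = conv3d_valid(vol, kernel)
separable = conv1d_along(conv1d_along(conv1d_along(vol, a, 0), b, 1), c, 2)
print(direct == separable)
```

For a kernel of side k, the direct form needs k^3 multiplications per output voxel, whereas the three 1D passes need on the order of 3k. The equality is exact only for separable kernels; a general 3D kernel can only be approximated this way.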

Segmentation
Segmentation of meaningful parts, such as organs, substructures, and lesions, provides a reliable basis for subsequent medical image analysis (Fig. 2). U-net [61], published by Ronneberger et al., has become the most well-known CNN architecture for medical image segmentation from very few images. U-net is based on fully convolutional networks (FCN) [62]. An FCN performs classification at the pixel level: it uses deconvolution layers to upsample the feature map of the last convolution layer and restore it to the size of the input image, thus allowing a prediction to be generated for each pixel while preserving the spatial information in the original input image. Pixel-by-pixel classification is then performed on the upsampled feature map to complete the final image segmentation. In U-net, in addition, skip connections connect the downsampling layers to the upsampling layers, so that features extracted by the downsampling layers are passed directly to the upsampling layers. This design allows U-net to analyze the full context of the image and produce the segmentation map in an end-to-end way. Since U-net was proposed, many researchers have used the U-net structure for medical image segmentation and made improvements based on it.
Cicek et al. designed 3D U-net [63] targeting 3D image sequences, and Milletari et al. proposed a 3D variant of the U-net architecture, namely V-net [64], which uses the Dice coefficient as the loss function.
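A Dice-based loss can be sketched as follows. This uses one common soft formulation, 2|P∩G| / (|P| + |G|), with a small smoothing constant; the function names and the `eps` parameter are assumptions, and the exact form used in [64] may differ.

```python
def dice_coefficient(pred, target, eps=1e-6):
    """Soft Dice overlap between two flat sequences of values in [0, 1]."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    # eps keeps the ratio defined when both masks are empty.
    return (2.0 * intersection + eps) / (denom + eps)


def dice_loss(pred, target):
    """Loss that decreases as the predicted mask overlaps the target more."""
    return 1.0 - dice_coefficient(pred, target)


pred = [1, 1, 0, 0]     # predicted foreground for four voxels
target = [1, 0, 0, 0]   # ground-truth foreground
partial = dice_coefficient(pred, target)
perfect = dice_loss(target, target)
print(partial, perfect)
```

Because the score is a ratio of overlap to total foreground, it is less dominated by the large background region than a per-voxel accuracy, which is one reason overlap-based losses are attractive for small lesions.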

Deep learning in pulmonary medical image
In the analysis of thoracic pulmonary nodules, the automatic extraction of nodule texture has always been a key issue in traditional algorithms. In the past decades, the manual extraction of the texture and morphology of pulmonary nodules has been the conventional way of designing algorithms.
This section presents an overview of the contribution of deep learning to various application areas in pulmonary medical imaging.

Pulmonary nodule
Lung cancer is one of the most severe cancer types [65]. This disease can be prevented if the pulmonary nodules are detected early and diagnosed correctly. With the help of modern CAD systems, radiologists can detect more pulmonary nodules in much less time [66][67][68][69]. Detection, segmentation, and classification of pulmonary nodules are the main functions of a modern CAD system; they are computer vision tasks, a field that has advanced greatly with CNNs (Table 2).

Pulmonary nodule classification
For the classification of pulmonary nodules, most studies focus on how a computer-aided detection system can provide radiologists with image manifestations, such as the type of the nodule (benign or malignant), for the early diagnosis of lung cancer and can offer advice for the diagnosis. Fig. 3 provides an illustration of the commonly used classification network.
Owing to the self-learning and generalization ability of deep CNNs, they have been applied to classifying the type of pulmonary nodules. A dedicated network structure for nodule images has been proposed to recognize three types of nodules, namely, solid, semi-solid, and ground glass opacity (GGO) [70]. Netto et al. [71] studied the separation of pulmonary nodule-like structures from other structures, such as blood vessels and bronchi; the structures are finally divided into nodules and non-nodules based on shape and texture measurements with a support vector machine. Pei et al. [72] used a 2D    [73,74] developed methods to differentiate benign and malignant nodules in low-density CT scans. Causey et al. [75] proposed a method called NoduleX for the prediction of lung nodule malignancy from CT scans. Zhao et al. [76] proposed an agile CNN model to overcome the challenges of small-scale medical datasets and small nodules. Considering the limited chest CT data, Xie et al. [77] used a transfer learning algorithm to separate benign and malignant pulmonary nodules. Shen et al. [78] presented a multi-crop CNN (MC-CNN) to automatically extract salient nodule information for assessing the malignancy suspiciousness of lung nodules. Liu et al. [79,80] proposed a multi-task model to explore the relatedness between lung nodule classification and the attribute score. Many researchers have used 3D CNNs to predict the malignancy of pulmonary nodules and achieved high AUC scores [81,82]. Some researchers attempted to make the prediction interpretable by using multitask joint learning [83,84].

Pulmonary nodule detection
The detection of pulmonary nodules is a special detection task. Considering that one pulmonary nodule can span multiple CT slices, most of the existing pulmonary nodule detection methods are based on 3D or 2.5D CNNs. The general detection process, including training and testing phases, is illustrated in Fig. 4. A high-performance pulmonary nodule detection system must have high sensitivity and precision. Hence, many researchers have focused on two-stage networks, in which one network performs nodule candidate detection and the other performs false positive reduction (Fig. 5). Ding et al. [85] proposed a faster region-based CNN (faster R-CNN) with a deconvolutional structure for candidate detection and a 3D DCNN for false positive reduction. 3D roto-translation group convolutions (G-Convs) were introduced into the false positive reduction network for improved efficiency and performance [86]. A 3D faster R-CNN with a U-net-like encoder-decoder structure for candidate detection and a gradient boosting machine with a 3D dual path network (DPN) for false positive reduction have been designed [87]. Tang et al. [88] used online hard negative mining in the first stage and ensembled both stages via consensus to obtain the final predictions. Tang et al. [89] then proposed an end-to-end method for training the candidate detection and false-positive reduction networks together, resulting in improved performance. In pulmonary nodule detection, sample imbalance is a severe problem. Two-stage networks use the first stage for choosing positive and hard negative samples, thus providing the second stage with a balanced ratio between positive and negative samples. A single-stage model combining ResNet [17] and the feature pyramid network has also been proposed [90]; this model alleviates the sample imbalance via a patch-based sampling strategy. Another one-stage network based on SSD has been introduced [91]. It uses group convolution and an attention network to extract abstract features and balances the samples with hard negative sample mining. Liu et al. [94] evaluated the influence of radiation dose, patient age, and CT manufacturer on the performance of deep learning applied in nodule detection.
Fig. 4 Illustration of the pipeline for lesion detection. In both training and testing phases, the medical images in DICOM format are converted and preprocessed to obtain the input images. In the training phase, region proposal methods are used to extract the ROIs of the input images, and the ROIs are then classified to output prediction scores. In the testing phase, input images are fed into the trained model generated in the training phase to obtain inference results.
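The hard negative mining step mentioned for nodule detection can be sketched as a simple selection rule: keep every positive candidate and only the negatives that the current detector scores highest. The function name, the `neg_pos_ratio` parameter, and the toy candidates are assumptions for illustration and do not reproduce any of the cited systems.

```python
def mine_hard_negatives(candidates, neg_pos_ratio=3):
    """Keep all positives plus the highest-scoring (hardest) negatives.

    candidates: list of (score, is_nodule) pairs, where score is the detector's
    confidence that the candidate is a nodule.
    """
    positives = [c for c in candidates if c[1]]
    negatives = sorted((c for c in candidates if not c[1]),
                       key=lambda c: c[0], reverse=True)
    keep = neg_pos_ratio * len(positives)
    return positives + negatives[:keep]


# Toy candidate list: two true nodules and five false candidates.
candidates = [(0.9, True), (0.8, False), (0.1, False), (0.7, False),
              (0.05, False), (0.6, False), (0.3, True)]
selected = mine_hard_negatives(candidates, neg_pos_ratio=2)
print(selected)
```

Negatives with high scores are the ones the detector currently confuses with nodules, so training on them targets false positives while the fixed ratio limits the class imbalance seen by the second stage.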

Pulmonary nodule segmentation
After pulmonary nodules are detected, their segmentation is also important for measuring the size of the nodule, with malignancy prediction as the final target. The U-net architecture and unsupervised learning are widely adopted in the segmentation task. Considering that segmentation labels are difficult to obtain, a weakly supervised method that generates accurate voxel-level nodule segmentation has been proposed [102]; this method needs only image-level classification labels. Messay et al. [103] trained a nodule segmentation model by using weakly labeled data without dense voxel-level annotations.
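Once a binary segmentation mask is available, the nodule size can be estimated from the voxel count and the voxel spacing. The sketch below is a minimal illustration; the function names, the toy mask, and the spacing values are assumptions, and the equivalent-sphere diameter is only one of several possible size measures.

```python
import math


def nodule_volume_mm3(mask, spacing_mm):
    """Volume of a binary mask [z][y][x]; spacing_mm is the voxel size (dz, dy, dx) in mm."""
    voxels = sum(v for plane in mask for row in plane for v in row)
    dz, dy, dx = spacing_mm
    return voxels * dz * dy * dx


def equivalent_diameter_mm(volume_mm3):
    """Diameter of the sphere that has the same volume as the segmented nodule."""
    return (6.0 * volume_mm3 / math.pi) ** (1.0 / 3.0)


# Toy 2x2x2 mask with every voxel labeled as nodule, hypothetical spacing in mm.
mask = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]
volume = nodule_volume_mm3(mask, (1.0, 0.5, 0.5))
diameter = equivalent_diameter_mm(volume)
print(volume, diameter)
```

In practice the spacing would be read from the image header rather than hard-coded.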

Pulmonary embolism (PE)
PE is a highly lethal condition that occurs when an artery in the lung becomes partially or completely blocked. It occurs when a thrombus formed in the legs, or sometimes in other parts of the body, moves to the lungs and obstructs the central, lobar, segmental, or sub-segmental pulmonary arteries depending on the size of the embolus. However, the mortality rate can be decreased to 2%-11% if measures are taken promptly and correctly. Although PE is not always fatal, it is the third most threatening disease with at least 650 000 cases occurring annually [104].
CT pulmonary angiography (CTPA) is the primary means of PE diagnosis, wherein a radiologist carefully traces each branch of the pulmonary artery for any suspected PEs. However, a CTPA scan generally consists of hundreds of images, each representing one slice of the lung, and identifying PE with high clinical accuracy is time-consuming and difficult. The diagnosis of PE is a complicated task because many factors may lead to a wrong diagnosis, such as false-positive results. For instance, respiratory motion, flow-related, streak, partial volume, and stair-step artifacts, lymph nodes, and vascular bifurcations could affect the diagnosis. Thus, computer-aided detection (CAD) is an important tool for helping radiologists detect and diagnose PE accurately and for decreasing the reading time of CTPA (Table 3).
Matteo Rucco introduced an integrative approach based on Q-analysis with machine learning [95]. The new approach, called Neural Hypernetwork, has been applied in a case study of PE diagnosis, involving data from 28 diagnostic features of 1427 people considered to be at risk   [105][106][107] or vessel segmentation [108,109], (2) generation of a set of PE candidates within the VOI using algorithms, such as tobogganing [110], and extraction of hand-crafted features from each PE candidate [111,112], and (3) computation of a confidence score for each candidate by using a rule-based classifier, neural networks, a nearest neighbor classifier [106,108,113], or a multi-instance classifier [110]. Jinbo Bi [96] proposed a new classification method for the automatic detection of PE. Unlike almost all existing methods, which require vascular segmentation to define the PE search space, this method is based on a tobogganing candidate generator, which can quickly and effectively retrieve the suspicious areas of the whole lung. The network provides an effective solution for the learning problem of multiple positive examples to indicate that the action is in progress. The detection sensitivity on 177 clinical cases was 81%. Nowadays, neural network methods have attracted much attention in PE recognition [114][115][116]. Scott et al. showed that radiologists can improve their interpretations for PE diagnosis by incorporating computer output in formulating diagnostic predictions [117]. Agharezaei et al. used an artificial neural network (ANN) for the prediction of the risk level of PE [97]. Serpen et al. confirmed that knowledge-based hybrid learning algorithms can be configured to provide better performance than purely empirical machine learning algorithms on automatic classification tasks associated with medical diagnosis, such as PE.
Considerable expertise in the PE domain is taken into account, and the knowledge-based hybrid classifier can readily exploit it by learning from both explanation and experience [98]. Tsai et al. proposed multiple active contour models, which are combined with a tree hierarchy to obtain the regional lung and vascular distribution. In the last step of the system, a Gabor neural network (GNN) was used to determine the location of the thrombosis. This novel method used the GNN for recognizing PE, but the accuracy and precision of the results were not good [99]. Tajbakhsh et al. investigated the possibility of a unique PE representation, coupled with CNNs, for increasing the accuracy of a CAD system for PE classification in CT [100]. To eliminate false-positive detections in PE recognition, the possibility of using a neural network as an effective tool for validating CTPA datasets has been investigated [118]. In addition, this approach improved the accuracy of PE recognition to 83%. Meanwhile, the vessel-aligned multi-planar image representation has three advantages that can improve PE detection accuracy. First, the image representation is efficient, because it briefly summarizes the 3D context information near the blockage in two image channels. Second, the image representation is consistent, because the candidates are aligned with the orientation of the vessel. Third, the image representation is expandable, because it naturally supports data augmentation for training the CNN. Besides, Chen et al. evaluated the performance of a deep learning CNN model in comparison with a traditional natural language processing (NLP) model in extracting PE information from thoracic CT reports from two institutions and showed that the CNN model can classify radiology free-text reports with an accuracy equivalent to or better than that of the existing traditional NLP model [101].

Pneumonia
Pneumonia is one of the main causes of death among children. Unfortunately, rural areas in developing countries lack the infrastructure and medical expertise for its timely diagnosis. The early diagnosis of interstitial lung disease is essential for treatment. Therefore, chest X-ray examination is one of the most commonly used radiological examinations for the screening and diagnosis of many lung diseases. However, the diagnosis of pneumonia in children by using X-ray is a very difficult task, because distinguishing the type of pneumonia from images currently relies mainly on the experience of doctors. Specialized departments and personnel in hospitals are required for making judgments. This set-up is laborious, and because the images of some types of pneumonia are very similar, doctors can easily make mistakes, causing misdiagnoses (Table 4).
Pneumonia usually manifests as one or more opaque areas on the chest radiograph (CXR) [126]. However, the diagnosis of pneumonia on CXR is complicated by many other conditions in the lungs, such as fluid overload (pulmonary edema), bleeding, volume loss (atelectasis or collapse), lung cancer, or post-radiation or surgical changes. Generally, medical images are viewed, and a rough estimation of the observed tissue is made to determine whether the tissue is normal. In recent decades, the identification of pneumonia through computer-assisted technology has developed rapidly. Recent techniques focus on deep learning; however, many methods based on traditional image pattern recognition are also available, such as template matching and learning methods based on statistical models. Siemens used a template matching algorithm [127] to identify the type of pneumonia. In this work, images were converted from the spatial domain to the frequency domain via the Fourier transform, and the target features in the frequency domain were used to classify the pneumonia type. However, the algorithm is computationally intensive, and its accuracy is low. A conventional image processing method based on a statistical model can also be used. It generally extracts features manually and then uses a classifier for identification. In general, the modeling is based on the color, texture, shape, and spatial relationships of the image. The algorithms commonly used for texture features are local binary patterns (LBP) and histograms of oriented gradients (HOG) [128,129].
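As an illustration of the hand-crafted texture features mentioned above, the basic 8-neighbour LBP operator can be sketched in a few lines. The neighbour ordering, the function names, and the toy patch are assumptions for illustration; library implementations offer further variants (e.g., circular or rotation-invariant patterns).

```python
def lbp_code(patch):
    """8-neighbour LBP code of the centre pixel of a 3x3 patch."""
    center = patch[1][1]
    # Neighbours taken clockwise from the top-left corner; the starting point
    # and direction are a convention, not a requirement.
    neighbours = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                  patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    return sum(1 << bit for bit, value in enumerate(neighbours) if value >= center)


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels of a 2D image."""
    hist = [0] * 256
    for i in range(1, len(image) - 1):
        for j in range(1, len(image[0]) - 1):
            patch = [row[j - 1:j + 2] for row in image[i - 1:i + 2]]
            hist[lbp_code(patch)] += 1
    return hist


patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 3, 8]]
code = lbp_code(patch)  # neighbours 9, 7, and 8 are >= 6, setting bits 1, 3, and 4
flat_hist = lbp_histogram([[3] * 4 for _ in range(4)])
print(code, flat_hist[255])
```

The histogram, rather than the raw codes, is what a conventional pipeline would pass to a classifier as the texture descriptor.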
However, hand-crafted features cannot accurately distinguish the types of pneumonia. As a recently developed method for automatic feature extraction, deep learning has been applied to medical image analysis. The network avoids complex preprocessing of the images in the early stage, and the images can be input directly into the network to obtain the recognition results. CNNs can learn the essential characteristics of different pneumonia types autonomously through their convolution kernels. Abdullah et al. [120] proposed a CNN-based detection method for pneumonia symptoms that relies on the difference in gray level and the segmentation between normal and suspicious lung regions. Correa [121,130] introduced a method for the automatic diagnosis of pneumonia by pulmonary ultrasound imaging. Different from Refs. [131] and [130], Cisnerosvelarde et al. [122] applied pneumonia detection to a new setting based on ultrasound videos rather than ultrasound images. To develop an automated diagnostic method for medical images, 40 simulated CXRs related to normal and pneumonia patients were studied [123]. For the detection of pneumonia clouds, the healthy part of the lungs was isolated from the area of pneumonia infection. An algorithm for cropping and extracting lung regions from images was also developed and implemented with CUDA [131][132][133] for improved computational performance [124].
The scarcity of data and the dependence on labeled data in the application of deep learning to medical imaging have been analyzed. Wang et al. [125] aimed to build a large-scale and high-accuracy CAD system and to increase academic interest in building large-scale databases of medical images. The authors extracted the report contents and labels from the picture archiving and communication system (PACS) of the hospital by using NLP and constructed a hospital-scale chest X-ray database.

Tuberculosis
Pulmonary tuberculosis is a chronic infectious disease mainly transmitted through the respiratory tract [35]. Susceptibility is influenced by individual factors such as age and genetics, by personal behaviors such as smoking, and by environmental exposures such as air pollution. Its pathogen is Mycobacterium tuberculosis, which can invade the body and cause hematogenous dissemination. At present, the diagnosis of tuberculosis mainly depends on the medical history, symptoms and signs, imaging, and sputum examination for Mycobacterium tuberculosis. The chest X-ray examination is an important method for the diagnosis of tuberculosis. It can detect early mild tuberculosis lesions and help judge the nature of the lesions.
However, the success of this method depends on the radiologist's expertise. A CAD system can overcome this problem and accelerate active case detection (Table 5). In recent years, great progress has been made in deep learning, which has enabled the classification of heterogeneous images [134,135]. CNNs are popular for their ability to learn intermediate- and high-level image features. Various CNN models have been used to classify CXRs for tuberculosis [136]. Lakhani & Sundaram [137] used deep learning with CNNs and accurately classified tuberculosis from CXRs, with an area under the curve (AUC) of 0.99. Melendez et al. [138] evaluated the deep learning framework on a database containing 392 patient records of suspected TB subjects. Melendez [139][140][141] proposed a weakly labeled approach for TB detection. They studied an alternative pattern classification method, namely multi-instance learning, which does not require detailed information for training a CAD system, and applied it to a CAD system designed to detect texture lesions associated with tuberculosis. Then, to address the need for additional clinical information in tuberculosis screening, a combination framework based on machine learning was proposed [141,145,146]. Zheng et al. [142] studied the performance of known deep convolutional network (DCN) structures under different abnormal conditions. Compared with the deep features, the shallow features from the early layers always provided higher detection accuracy. These techniques have been applied to tuberculosis detection on different datasets and achieved high accuracy. To classify abnormalities in CXRs, a cascade of a CNN and a recurrent neural network (RNN) has been employed on the Indiana chest X-ray dataset [140]. However, the accuracy was not compared with previous results. A binary classification scheme of normal versus abnormal has also been attempted [143].

Reference; Year; Task; Modality; Contribution
Wang et al. [125]; 2017; Classification; X-ray; Built a large-scale and high-accuracy CAD system
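Because several of the results in this section are reported as an area under the ROC curve, a small worked example may help to interpret that figure. The sketch below computes the AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, counting ties as one half. The function name and the toy scores are assumptions for illustration and are unrelated to the cited experiments.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = tuberculosis, 0 = normal (hypothetical classifier scores).
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly -> 0.75
```

An AUC of 0.99 therefore means that almost every positive case is scored above almost every negative case, independent of the decision threshold.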

Interstitial lung disease (ILD)
ILD is a group of heterogeneous non-neoplastic and non-infectious lung diseases with alveolar inflammation and interstitial fibrosis as the basic pathological changes. This disease is also called diffuse parenchymal lung disease (DPLD). ILD involves several abnormal imaging patterns observed in CT images. The accurate classification of these patterns plays an important role in the accurate clinical judgment of the extent and nature of the disease (Table 6). Therefore, the development of an automatic computer-aided detection system for the lung is important.
Anthimopoulos et al. [55] proposed and evaluated a CNN for the classification of ILD patterns. This method used the texture classification scheme of the ROI to generate an ILD quantification map of the whole lung by sliding a fixed-proportion classifier over the presegmented lung field. The quantified results were then used in the final diagnosis of the CAD system. Simonyan and Zisserman [147] developed a CNN framework to classify lung tissue patterns into classes such as normal, reticulation, ground-glass opacity (GGO), and honeycombing. Li et al. [148] used an unsupervised algorithm to capture image features at different scales with feature extractors of different sizes and achieved a classification accuracy of 84%. Then, Li et al. [149] designed a customized CNN with a shallow convolution layer to classify ILD images. Gao et al. [150] proposed two variations of multi-label deep CNNs to accurately recognize the potential co-occurrence of multiple ILD patterns on an input lung CT slice. Christodoulidis et al. [151] applied algorithms similar to knowledge maps to classify ILD.
This study demonstrated the potential of transfer learning in medical image analysis and the structural nature of the problem. The training method of a network is as important as the design of its architecture. By rescaling the original CT images in Hounsfield units to three different attenuation ranges (one focusing on low-attenuation patterns, one on high-attenuation patterns, and one on normal patterns), three 2D images were obtained and used as input to the network. Gao et al. [152] found that the three attenuation ranges provided better visibility or visual separation across all six ILD disease categories.

Reference; Year; Task; Modality; Contribution
Melendez et al. [139]; 2014; Detection; X-ray; Proposed a method that uses a weakly labeled approach to detect TB
Shin et al. [140]; 2016; Detection; X-ray; Presented a deep learning model to detect a disease from an image and annotate its contexts
Murphy et al. [141]; 2019; Detection; X-ray; Automated analysis of chest X-ray (CXR) as a sensitive and inexpensive means of screening susceptible populations for pulmonary tuberculosis
Zheng et al. [142]; 2017; Detection; X-ray; Found that shallow features or early layers always provide higher detection accuracy
Bar et al. [143]; 2015; Detection; X-ray; Explored the ability of a CNN learned from a nonmedical dataset to identify different types of pathologies in chest X-rays
Gao et al. [152]; 2018; Classification; CT; Proved that the use of data from three attenuation ranges can enhance the classification effect
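The three-range rescaling of Hounsfield units attributed to Gao et al. [152] can be sketched as follows. This is a minimal illustration and not the authors' code: the window bounds in `WINDOWS` are hypothetical values chosen for the example, and the function names are assumptions. Each window clips the CT slice to its range and maps it linearly to [0, 1], so that one slice yields three 2D channels.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical window bounds in Hounsfield units (illustrative assumption,
# not values taken from the text).
WINDOWS: Dict[str, Tuple[float, float]] = {
    "low": (-1400.0, -950.0),    # emphasizes low-attenuation patterns
    "normal": (-1400.0, 200.0),  # broad lung range
    "high": (-160.0, 240.0),     # emphasizes high-attenuation patterns
}


def rescale(hu: float, lower: float, upper: float) -> float:
    """Clip one HU value to [lower, upper] and map it linearly to [0, 1]."""
    clipped = min(max(hu, lower), upper)
    return (clipped - lower) / (upper - lower)


def to_three_channels(
    slice_hu: Sequence[Sequence[float]],
    windows: Dict[str, Tuple[float, float]] = WINDOWS,
) -> List[List[List[float]]]:
    """Return one rescaled 2D image per window, in the order the windows are listed."""
    return [
        [[rescale(v, lower, upper) for v in row] for row in slice_hu]
        for (lower, upper) in windows.values()
    ]


# A 2 x 2 toy slice in HU.
toy_slice = [[-1400.0, -950.0], [200.0, 240.0]]
channels = to_three_channels(toy_slice)
```

Stacking the three outputs gives an input shaped like a three-channel image, which is one reason this preprocessing combines naturally with networks designed for color images.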

Others
For other pulmonary diseases, including common ones such as pneumothorax, bullae, and emphysema, deep learning models have many applications that greatly improve the etiological diagnosis rate. Cengil et al. [153] used deep learning for the classification of cancer types. A semi-supervised deep learning algorithm was proposed to automatically classify patients' lung sounds [154,155] (for the two most common lung sounds, wheezes and crackles). The algorithm made some progress in automatic lung sound recognition and classification. Aykanat et al. [156] proposed and implemented a U-net convolutional network structure for biomedical image segmentation. It mainly separates lung regions from the other tissues in CT images. To facilitate the detection and classification of lung nodules, Tan et al. [157] used a CAD system based on transfer learning (TL) and improved the accuracy of lung disease diagnosis in bronchoscopy. For chronic obstructive pulmonary disease (COPD) [158], long short-term memory (LSTM) units were used to represent the progression of COPD, and a specially configured RNN was used to capture irregular time lapses between observations. This approach improved the interpretability of the model and the accuracy of estimating COPD progression. Campo et al. [159] used X-rays instead of CT scans to quantify emphysema.

Datasets and performance
Pulmonary nodule datasets

LIDC-IDRI
The Lung Image Database Consortium image collection (LIDC-IDRI) [160] consists of chest medical image files (such as CT and X-ray) and the corresponding pathological annotations of the diagnostic results (Table 7). The data were collected by the National Cancer Institute to study early cancer detection in high-risk populations. The dataset contains 1018 research cases, and the nodule diameters range from 3 mm to 30 mm. For each scan, four experienced thoracic radiologists carried out a two-stage annotation process. In the first stage, each radiologist independently examined each CT scan and marked lesions belonging to one of three categories ("nodule ≥ 3 mm," "nodule < 3 mm," and "non-nodule ≥ 3 mm"). In the second stage, each radiologist independently reviewed his or her own marks together with the anonymized marks of the three other radiologists to provide a final opinion. This procedure aims to identify all pulmonary nodules in each CT scan as completely as possible without forcing consensus. A brief comparison of methods evaluated on the LIDC-IDRI dataset follows. Armato et al. [161] believed that better results can be obtained by combining geometric texture features with reduced HOG-PCA features to create a hybrid feature vector for each candidate nodule. Huidrom et al. [162] used a nonlinear algorithm to classify the 3D nodule candidate boxes. The proposed algorithm combines a genetic algorithm (GA) and particle swarm optimization (PSO) to demonstrate the learning ability of a multi-layer perceptron. This method was compared with existing linear discriminant analysis (LDA) and convolutional neural network methods. Shaukat et al. [163] presented a marker-controlled watershed technique that used intensity, shape, and texture features for the detection of lung nodules. Zhang et al. [164] used 3D skeletonization features based on prior anatomical knowledge to identify lung nodules. Naqi et al. [165] combined traditional handcrafted HOG features with CNN features to construct hybrid feature vectors for finding candidate nodules. Refs. [166][167][168] reported algorithms that achieved better results at the time of their publication. The deep learning methods [71][169][170][171][172] for lung nodule detection did not show promising results.

LUNA16
The LUNA16 dataset [160,176] is a subset of LIDC-IDRI. LIDC-IDRI includes 1018 lung CT scans, whereas LUNA16 excludes CT scans with slices thicker than 3 mm and pulmonary nodules smaller than 3 mm. The database is very heterogeneous: it was clinically collected from seven different academic institutions, covers clinical-dose and low-dose CT scans, and spans a wide range of scanner models and acquisition parameters. The final list contains 888 scans. Dou et al. [175] employed 3D CNNs for false positive reduction in automated pulmonary nodule detection from volumetric CT scans. Setio et al. [57] used multiview convolutional networks (ConvNets) to extract features and then applied a dedicated fusion method to obtain the final classification. Other teams [159,161,177] also achieved relatively good results.
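The selection rule described above (drop thick-slice scans, ignore small nodules) can be expressed as a short filter. The sketch below is only an illustration under stated assumptions: the `Scan` record, its field names, and the toy data are hypothetical, and the two 3 mm thresholds are taken from the description above and exposed as parameters.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Scan:
    """Hypothetical per-scan record; field names are assumptions for this sketch."""
    scan_id: str
    slice_thickness_mm: float
    nodule_diameters_mm: List[float]


def select_scans(
    scans: List[Scan],
    max_slice_thickness_mm: float = 3.0,
    min_nodule_diameter_mm: float = 3.0,
) -> List[Scan]:
    """Keep scans that are thin enough and drop nodules below the size threshold."""
    selected = []
    for scan in scans:
        if scan.slice_thickness_mm > max_slice_thickness_mm:
            continue  # thick-slice scan: excluded entirely
        kept = [d for d in scan.nodule_diameters_mm if d >= min_nodule_diameter_mm]
        selected.append(Scan(scan.scan_id, scan.slice_thickness_mm, kept))
    return selected


toy_scans = [
    Scan("a", 1.0, [2.0, 5.5]),   # kept, the 2.0 mm nodule is dropped
    Scan("b", 5.0, [10.0]),       # excluded: slices too thick
    Scan("c", 3.0, []),           # kept: thickness equals the limit
]
toy_selected = select_scans(toy_scans)
```

Note that a scan without any remaining nodule is still kept here, since scans without nodules are useful negatives when evaluating false positive reduction.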

Pneumonia datasets
Chest X-ray images
The dataset released by the National Institutes of Health includes 112 120 frontal-view X-ray images of 30 805 unique patients [178]. Fourteen different chest pathology labels were extracted from the associated radiology reports using NLP. For the pneumonia detection task, images labeled with pneumonia were treated as positive examples and all other images as negative examples. The database (about 42 GB) covers 14 thoracic diseases (atelectasis, consolidation, infiltration, pneumothorax, edema, emphysema, fibrosis, effusion, pneumonia, pleural thickening, cardiomegaly, nodules, masses, and hernia). Labels 1-14 correspond to the 14 diseases, and label 15 indicates that none of the 14 diseases was found. The accuracy of the labels in this database exceeds 90%. Wang et al. [125] proposed a weakly supervised multi-label image classification and disease localization framework and achieved an AUC of 0.633 for pneumonia. Yao et al. [179] used LSTMs to leverage interdependencies among target labels in predicting the 14 pathologic patterns and obtained an AUC of 0.713. Rajpurkar et al. [178] improved the result to 0.768 by using a 121-layer DCN (DenseNet).
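The labeling scheme described above, with 14 pathology labels per image and pneumonia versus all other images for the detection task, can be sketched as a small encoding step. The label spellings, the "No Finding" token, and the "|" separator below are assumptions made for illustration and should be checked against the actual metadata file of the dataset.

```python
from typing import List

# Assumed label spellings, listed in the order used in the text above.
PATHOLOGIES = [
    "Atelectasis", "Consolidation", "Infiltration", "Pneumothorax", "Edema",
    "Emphysema", "Fibrosis", "Effusion", "Pneumonia", "Pleural_Thickening",
    "Cardiomegaly", "Nodule", "Mass", "Hernia",
]
NO_FINDING = "No Finding"  # assumed token for images without any of the 14 labels


def encode_labels(label_string: str, separator: str = "|") -> List[int]:
    """Turn a separator-delimited label string into a 14-dimensional multi-hot vector."""
    found = {name.strip() for name in label_string.split(separator) if name.strip()}
    unknown = found - set(PATHOLOGIES) - {NO_FINDING}
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in found else 0 for name in PATHOLOGIES]


def pneumonia_target(label_string: str) -> int:
    """Binary target for pneumonia detection: 1 if pneumonia is present, else 0."""
    return encode_labels(label_string)[PATHOLOGIES.index("Pneumonia")]
```

For example, `encode_labels("Pneumonia|Effusion")` sets two entries of the vector, whereas `encode_labels("No Finding")` returns all zeros.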

Tuberculosis datasets
Shenzhen Hospital X-ray
Shenzhen Hospital X-ray [180] is a dataset collected by Shenzhen Third People's Hospital, Guangdong Medical University, Shenzhen, China. Chest X-rays from the clinic were captured as part of the daily hospital routine within one month, mostly in September 2012. This dataset contains 662 frontal chest X-rays, of which 326 are normal and 336 show manifestations of TB. All images are provided in PNG format; they vary in size but are all approximately 3k × 3k pixels.

Montgomery County X-ray
The Montgomery County X-ray dataset [180] consists of 138 frontal chest X-rays from the TB screening program of the Department of Health and Human Services, Montgomery County, Maryland, USA. Of these, 80 patients were normal and 58 had imaging manifestations of tuberculosis. All images were captured with a conventional X-ray machine (CR) and stored as 12-bit gray-level images in portable network graphics (PNG) format. They are also available in DICOM format on request. The size of the X-rays is 4020 × 4892 or 4892 × 4020 pixels. The work in [136] tested deep learning methods for tuberculosis detection on this dataset and the Shenzhen dataset and achieved an accuracy of more than 80%, which is comparable to the performance of radiologists.

Geneva database
The Geneva database was collected by the University Hospitals of Geneva, Geneva, Switzerland. The dataset consists of chest CT scans of 1266 patients acquired between 2003 and 2008. Based on the EHR information, only cases with HRCT (without contrast agent, 1 mm slice thickness) were included. To date, more than 700 cases have been reviewed, and 128 cases affected by one of 13 histological diagnoses of ILDs have been stored in the database. The database is available for research on request after the signature of a license agreement. Anthimopoulos et al. [173] and Gangeh et al. [174] improved the quantitative measurement of ILD based on the Geneva database.

Challenges and future trends
From the medical and clinical perspectives, despite the successes of deep learning technology, many limitations and challenges remain. Deep learning generally requires a large amount of annotated data for training, which is a major challenge for medical images. Labeling medical images requires expert knowledge, such as the domain knowledge of radiologists. Hence, annotating a sufficient number of medical images is labor- and time-consuming. Although annotation is not easy, the amount of unlabeled medical images is vast because they have been stored in PACS for a long time. If the unlabeled images can be utilized by deep learning techniques, considerable time and effort in annotation would be saved. Another challenge is the interpretability of deep learning [181]. Deep learning methods are often treated as black boxes whose successes and failures are hard to interpret. The demand for investigating these techniques is increasing, to pave the way for the clinical application of deep learning in medical image analysis. From a legal perspective, the widespread application of deep learning in the medical field would also require transparency and interpretability.
Our future work will further analyze image semantic segmentation based on deep learning networks and summarize and address the shortcomings of the existing research. Against the background of deep learning research in medical imaging, this paper puts forward several potential or under-studied directions for the future.
(1) Neural networks classify independent and identically distributed test sets well, but inputs that have been modified to induce misclassification, and that are visually almost indistinguishable from the originals, can change the network output greatly. Therefore, the Adversarial Net [147] has been proposed to determine a method that can yield higher-resolution medical images as judged by the human eye.
(2) Common machine learning methods include supervised learning and unsupervised learning. Current research is mostly based on supervised learning algorithms. However, supervised learning requires manually labeled data for network training, which consumes a great deal of the time of medical experts. Senior medical experts often do not have enough time to label training data at the required order of magnitude. Unsupervised learning may therefore be a potential research direction for medical image processing in the future.
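The sensitivity to small input changes mentioned in direction (1) can be illustrated with a toy example. The sketch below applies the fast gradient sign method, a standard technique that is not taken from the works cited here, to a hand-specified logistic regression model; the weights, the input, and the step size `epsilon` are hypothetical values chosen for illustration.

```python
import math
from typing import List, Sequence


def predict_proba(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Probability of class 1 under a logistic regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def _sign(value: float) -> int:
    return (value > 0) - (value < 0)


def fgsm_perturb(
    weights: Sequence[float],
    bias: float,
    x: Sequence[float],
    label: int,
    epsilon: float,
) -> List[float]:
    """Move each input feature by epsilon in the direction that increases the loss."""
    p = predict_proba(weights, bias, x)
    # Gradient of the cross-entropy loss with respect to the input: (p - y) * w.
    grad = [(p - label) * w for w in weights]
    return [xi + epsilon * _sign(g) for xi, g in zip(x, grad)]


# Hypothetical model and input; the true label is 1.
weights, bias = [2.0, -3.0, 1.0], 0.0
x = [0.2, 0.1, 0.1]
x_adv = fgsm_perturb(weights, bias, x, label=1, epsilon=0.05)
p_before = predict_proba(weights, bias, x)      # above 0.5: classified as 1
p_after = predict_proba(weights, bias, x_adv)   # below 0.5: classified as 0
```

Each feature changes by at most 0.05, yet the predicted class flips, which is the kind of behavior that motivates research on robustness for medical imaging models.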

Conclusions
Medical image processing based on deep learning is a hot and challenging subject at the intersection of medicine and computer science. This paper summarizes the research work carried out in the following directions. First, the recently popular DNN frameworks were introduced, and the origins of these neural networks were traced and discussed in detail. In addition, among current deep network frameworks, the classical models that are widely applied to medical images were introduced.
In the third part of this paper, the application of neural networks to various lung diseases was introduced. For the tasks associated with different diseases, this paper describes the current research status of deep neural networks in medical imaging, analyzes and summarizes the development of the frameworks, and examines in detail the models that have achieved good results in these fields, to lay a research foundation for future researchers.
In the fourth part of the article, various algorithms evaluated on datasets such as LIDC-IDRI and LUNA16 were introduced in detail. In addition, some commonly used datasets for other diseases were briefly introduced, so that others can carry out relevant experiments.

Compliance with ethics guidelines
Jiechao Ma, Yang Song, Xi Tian, Yiting Hua, Rongguo Zhang, and Jianlin Wu declare that they have no conflicts of interest. This manuscript is a review article that does not need a research protocol requiring approval by the relevant institutional review board or ethics committee.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.