1 Introduction

Pneumoconiosis is an occupational disease caused mainly by the long-term inhalation of large amounts of industrial dust during occupational activities, which leads to diffuse fibrosis of the lung tissue. According to occupational disease statistics from the Chinese Ministry of Health for 2012 to 2019, shown in Table 1, pneumoconiosis accounts for more than 80% of all occupational disease cases, and the cumulative number of diagnosed cases continues to rise.

Table 1 Statistics of occupational diseases and pneumoconiosis from 2012 to 2019 in China

The latest diagnostic criteria for pneumoconiosis are based on high kilo-voltage (HKV) chest X-rays and digital radiography (DR) [1], combined with a comprehensive analysis of the patient’s dust exposure history, clinical manifestations and auxiliary examinations. At present, the diagnosis of pneumoconiosis depends mainly on the experience of radiologists, which is a scarce resource, and the reading process itself is complex.

With the development of artificial intelligence technology and medical image research, medical image-aided diagnosis and detection technology has gradually been applied in the clinic, and pneumoconiosis can be caught early through pre-screening. In this study, we use multi-scale feature mapping technology based on deep learning to pre-screen radiographs quickly and accurately for the aided diagnosis of pneumoconiosis. The main contributions of this study are as follows:

1. We designed a multi-scale feature mapping model for detecting pulmonary fibrosis. The model not only detects fibrosis but also locates it with prediction boxes. Results on the testing and validation datasets showed that the model outperforms two other published methods in recognizing pulmonary fibrosis.

2. We analysed the diagnostic standard for pneumoconiosis staging and designed a staging discrimination method based on a threshold on the pixel ratio of the area of small opacities to the area of the lung zone. We determined the discrimination parameters through simulation experiments on the DRs of real patients; the results indicated that the method is feasible and consistent with expert readings, that it can be applied in the clinic, and that its diagnostic accuracy exceeds the average level of clinicians.

The rest of this paper is organized as follows: Sect. 2 reviews related work; Sect. 3 describes the proposed method in detail; Sect. 4 presents the experimental details and the performance obtained on the datasets; Sect. 5 draws conclusions.

2 Related Work

Deep learning has been widely used to detect lesions in medical images and in many cases has matched the detection accuracy of clinical experts. Using computer-aided diagnosis (CAD) to locate abnormal or suspicious areas and draw clinicians’ attention to them increases the lesion detection rate while reducing the false negatives that subjective reading may cause. As the main basis of pneumoconiosis diagnosis, DR is a type of digital medical image that is more suitable for computer acquisition and research than HKV chest X-rays. Pathologically, pneumoconiosis is mostly divided into two types: fibrotic and non-fibrotic. Approximately 90–95% of silicosis and coal worker’s pneumoconiosis cases are of the pulmonary fibrosis type. Thus, in this study, we focused on the pathological manifestation most common to silicosis and coal worker’s pneumoconiosis: the CAD of pulmonary fibrosis.

The study of the CAD of pneumoconiosis began in the 1970s. Savol [2] adopted the AdaBoost algorithm on the chest X-rays of coal mine workers to detect the small opacities of pneumoconiosis. Yu et al. [3] proposed a CAD scheme for pneumoconiosis based on chest radiographs and showed that their automatic detection scheme was accurate and convenient and could be of great help in the clinical screening of pneumoconiosis; they later improved the algorithm by extracting the DR feature vector through the grey-level co-occurrence matrix (GLCM). Zhu [4, 5] combined a decision tree (DT) and a support vector machine (SVM) to classify the DRs of 85 healthy individuals and 40 patients, and the pneumoconiosis recognition accuracy reached the level of clinical experts. Cai et al. [6] extracted features from 66 chest radiographs to distinguish healthy individuals from pneumoconiosis patients and used another 29 images to evaluate the diagnostic performance of the classifier, achieving an overall accuracy of 79.3%. Haifeng [7] proposed a method to automatically recognize the stages of pneumoconiosis through the CAD of DR, adopting a back propagation (BP) neural network (NN) classifier for feature vectors based on the density of lung field opacities on chest radiographs and artificial discrimination, achieving a classification rate of 100% and an average correctness rate of 68.3%. Jin [8] focused on the regions between the ribs to achieve the CAD of pneumoconiosis using chest X-rays; in a test of 150 chest X-ray images, the correct classification rate, true positive rate and true negative rate were 91%, 96% and 86%, respectively, demonstrating the method’s clinical value in assisting radiologists.
Okumura [9] developed a CAD system combining a rule-based method with an artificial neural network (ANN) to detect the presence or absence of pneumoconiosis on chest X-rays. The combined method performed significantly better than the ANN or the rule-based method alone, indicating that it could help radiologists classify pneumoconiosis.

Because the diagnostic standard of pneumoconiosis involves factors such as the lung field and the pulmonary opacity density, it is necessary to quantify the stages of fibrosis. Therefore, we studied the object detection strategy to explore the accurate identification of different scales of opacity of pulmonary fibrosis and pneumoconiosis staging methods.

We designed a multi-scale feature mapping model that takes the entire radiograph as input and uses an eight-layer fully convolutional neural network (FCNN) [10, 11] for feature extraction, achieving end-to-end detection. To reduce the time and storage costs of training, we performed uniform dense sampling at different locations of the radiograph with different scales and aspect ratios, and performed classification and regression on the feature mapping layers after the convolution layers. We adopted the non-maximum suppression (NMS) algorithm to select the candidate box of fibrosis opacity with the highest confidence as the final detection result. Finally, we quantitatively compared the total area of the detected pulmonary fibrosis windows with the lung field for automatic pneumoconiosis staging.

3 Methods

3.1 Multi-scale Feature Mapping Model

The model contains two modules: a basic module for feature extraction and a multi-scale feature mapping module for lung fibrosis detection (Fig. 1). The two modules share computing resources and implement end-to-end lesion detection. The feature extraction module is a network of eight convolutional layers. The multi-scale feature mapping module utilizes different convolution layers to perform object detection at different scales; each layer locates objects at a certain scale because the distinct receptive fields of the different convolution layers match different pulmonary fibrosis opacity sizes [12].

Fig. 1
figure 1

The detection model of multi-scale feature mapping

The higher convolution layers of the model detect multi-scale objects by gradually reducing the feature map size. Each feature mapping layer uses a set of convolution kernels to predict fixed-size object bounding boxes and to score them, and the final detection boxes are generated with the NMS algorithm [13,14,15]. For an m × n feature mapping layer, a 1 × 1 convolution kernel is applied at each location to predict the category scores of a potential object and the offsets of the predicted bounding box relative to an initial bounding box; the predicted box coordinates are then recovered from these offsets. Specifically, for any given location, k initial bounding boxes are set according to scale and aspect ratio, and for each of them the scores for c categories and the four offsets relative to the initial bounding box are computed. Therefore, for an m × n feature map, the model produces \((c + 4) \times k \times m \times n\) outputs (Fig. 2).
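As a concrete illustration of the NMS step, the following is a minimal greedy sketch; the function name `nms` and the [x1, y1, x2, y2] box layout are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression over [x1, y1, x2, y2] boxes.

    Keeps the highest-scoring box, discards boxes that overlap it by
    more than iou_threshold, and repeats on the remainder.
    """
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # IoU of the top box with all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_threshold]
    return keep
```

With the default threshold, two heavily overlapping candidate boxes collapse into the higher-scoring one, while distant boxes survive.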

Fig. 2
figure 2

Multi-scale object detection method

As the network deepens, the spatial resolution of the feature maps gradually decreases and some spatial information is lost. The lower layers, with their smaller receptive fields, are better matched to small opacities; the higher layers, although they lose some spatial detail, have larger receptive fields that match large opacities (Fig. 3). When fused, the object detectors of the feature mapping layers across multiple scales form an effective detector.

Fig. 3
figure 3

Multi-scale feature mapping module

3.2 Loss Function

In the training process, it is usually necessary to determine which initial bounding boxes correspond to real objects. For each real bounding box, candidate initial bounding boxes are selected based on position, aspect ratio and intersection over union (IOU), and the initial bounding box is matched to the ground-truth (GT) bounding box by maximizing the IOU value. In this study, once the matched IOU value exceeded the threshold, the initial box was treated as a positive (real object) sample. Meanwhile, to simplify the training process, the trained network was allowed to predict multiple overlapping initial bounding boxes with high scores instead of a single maximum-overlap bounding box [16]. The parameter W of the multi-scale feature mapping model was trained from a series of samples, i.e. \(S = \{ (X_{i} ,Y_{i} )\}_{i = 1}^{N}\), in which \(X_{i}\) is the labelled sample box; \(Y_{i} = (y_{i} ,b_{i} )\), where \(y_{i} \in \{ 0,1,...,K\}\) is the sample label; and \(b_{i} = (b_{i}^{x} ,b_{i}^{y} ,b_{i}^{w} ,b_{i}^{h} )\) represents the centre point (x, y) and the width and height (w, h) of the box. The loss function is defined as follows:

$$ L(W) = \sum\limits_{n = 1}^{N} {\sum\limits_{{i \in S^{n} }} {\alpha_{n} l^{n} (X_{i} ,Y_{i} |W)} } , $$
(1)

where N is the number of feature mapping layers; \(\alpha_{n}\) is the weight of the loss function \(l^{n}\); and \(S = \{ S^{1} ,S^{2} ,...,S^{N} \}\), where \(S^{n}\) is the set of samples assigned to the nth feature mapping layer, so \(l^{n}\) is the error of the nth detection layer, selected by scale. The object classification and the coordinates of the predicted bounding boxes are trained jointly, and the detection results of each feature mapping layer are then combined. The per-sample loss function contains two terms:

$$ l(X,Y|W) = L_{obj} (p(X),y) + \lambda L_{loc} (b,\hat{b}), $$
(2)

where \(p(X) = (p_{0} (X),...,p_{K} (X))\) is the vector of classification probabilities; \(\lambda\) is the balance parameter, generally set to 1 through cross-validation; \(L_{obj} (p(X),y) = - \log p_{y} (X)\) is the cross-entropy classification loss; and \(\hat{b}_{i} = (\hat{b}_{i}^{x} ,\hat{b}_{i}^{y} ,\hat{b}_{i}^{w} ,\hat{b}_{i}^{h} )\) are the coordinate offsets of the predicted box. The coordinate loss function is as follows:

$$ L_{loc} (b,\hat{b}) = \sum\limits_{i \in Pos}^{N} {\sum\limits_{m \in (x,y,w,h)} {smooth_{L1} (b_{i}^{m} - \hat{b}_{j}^{m} )} }, $$
(3)

where i is the coordinates of the predicted object box and j is the coordinates of the ground-truth box. The coordinate loss function calculates the coordinate errors between the predicted object box and the ground-truth box through \(smooth_{L1} ()\) and \(\hat{b}_{j}^{m}\) is calculated using the following equation:

$$ \hat{b}_{j}^{m} = \left\{ \begin{gathered} \hat{b}_{j}^{x} = \left(b_{j}^{x} - d_{i}^{x} \right)/d_{i}^{w} \hfill \\ \hat{b}_{j}^{y} = \left(b_{j}^{y} - d_{i}^{y} \right)/d_{i}^{h} \hfill \\ \hat{b}_{j}^{w} = \log \left(\frac{{b_{j}^{w} }}{{d_{i}^{w} }}\right) \hfill \\ \hat{b}_{j}^{h} = \log \left(\frac{{b_{j}^{h} }}{{d_{i}^{h} }}\right) \hfill \\ \end{gathered} \right.. $$
(4)
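The matching and localization loss above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `iou`, `smooth_l1`, `encode` and `loc_loss` are hypothetical helper names; `iou` takes [x1, y1, x2, y2] corners, while `encode` takes (cx, cy, w, h) boxes as in Eq. (4).

```python
import math

def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def smooth_l1(x):
    """smooth_L1(x) = 0.5 x^2 if |x| < 1, else |x| - 0.5."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def encode(gt, anchor):
    """Eq. (4): offsets of a GT box against an initial box, both (cx, cy, w, h)."""
    bx, by, bw, bh = gt
    dx, dy, dw, dh = anchor
    return ((bx - dx) / dw, (by - dy) / dh,
            math.log(bw / dw), math.log(bh / dh))

def loc_loss(pred, gt, anchor):
    """Eq. (3) for one positive match: smooth L1 over the four offsets."""
    return sum(smooth_l1(p - t) for p, t in zip(pred, encode(gt, anchor)))
```

A perfectly predicted offset vector yields zero localization loss, and an initial box identical to the GT box encodes to all-zero targets.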

3.3 Subdivision of the Lung Field

Automatic segmentation of the lung field is a very important step, and the quality of segmentation directly affects the accuracy of the automatic diagnosis of pneumoconiosis staging. According to the diagnostic standard [1], each lung field is divided into three zones, namely, an upper, a middle and a lower lung zone; the left and right lungs together therefore contain six lung zones. In Fig. 4, the left panel is the original DR, the middle panel is the lung field segmentation, obtained with a U-Net [17], and the right panel shows the segmentation overlaid on the DR. We first calculated the vertical distance between the lung apex and the diaphragm in the segmented lung field and then drew the horizontal contour lines bounding the lung zones, as shown in Fig. 4b.

Fig. 4
figure 4

a Segmentation of the lung field and b subdivision of the lung fields
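The subdivision described above can be sketched as follows, assuming a binary lung-field mask produced by the U-Net step; the function name `lung_zones` and the equal three-band split are illustrative assumptions:

```python
import numpy as np

def lung_zones(mask):
    """Split a binary lung mask (H, W) into upper/middle/lower zone masks.

    The vertical extent between the lung apex (topmost foreground row)
    and the diaphragm (bottommost foreground row) is divided into three
    equal horizontal bands.
    """
    rows = np.where(mask.any(axis=1))[0]
    top, bottom = rows[0], rows[-1]
    bounds = np.linspace(top, bottom + 1, 4).astype(int)  # 4 edges -> 3 bands
    zones = []
    for z in range(3):
        zone = np.zeros_like(mask)
        zone[bounds[z]:bounds[z + 1]] = mask[bounds[z]:bounds[z + 1]]
        zones.append(zone)
    return zones  # [upper, middle, lower]
```

Applying this separately to the left and right lung masks yields the six lung zones used in the staging rules below.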

3.4 Discriminant of Pneumoconiosis Staging

3.4.1 Small Opacity and Large Opacity

According to shape, small opacities are divided into two types (rounded and irregular) and, by size, into four levels, \(j = (1,2,3,4)\): p or s denote diameters (widths) not exceeding 1.5 mm (j = 1); q or t denote diameters (widths) between 1.5 mm and 3 mm (j = 2); r or u denote diameters (widths) between 3 mm and 10 mm (j = 3); and opacities with a diameter (width) of more than 10 mm are called large opacities (j = 4).
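The size-to-level mapping above is a simple lookup; a minimal sketch (the function name `opacity_level` is a hypothetical helper):

```python
def opacity_level(diameter_mm):
    """Map an opacity diameter/width in mm to level j:
    j=1 (p/s): <= 1.5 mm; j=2 (q/t): 1.5-3 mm;
    j=3 (r/u): 3-10 mm; j=4: large opacity, > 10 mm."""
    if diameter_mm <= 1.5:
        return 1
    if diameter_mm <= 3:
        return 2
    if diameter_mm <= 10:
        return 3
    return 4
```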

3.4.2 Small Opacity Density

The diagnosis of pneumoconiosis requires first determining the opacity density level based on the shape of the opacities, then calculating the overall density, which is the highest density among all lung zones, and finally determining the pneumoconiosis stage comprehensively from the number of zones that contain opacities. Assume there are six lung zones, \(i \in (1,2,...,6)\). We adopted a threshold evaluation method to determine small opacity density: the ratio k of the pixels of small opacities of a given density level to the pixels of the lung zone in which the opacities reside represents the small opacity density level of that lung zone. The following equation was used:

$$ k_{i}^{j} = \frac{{\sum {B_{i}^{j} } }}{{N_{i} }}, $$
(5)

where \(\sum {B_{i}^{j} }\) is the sum of pixels of the ith lung zone belonging to opacity level j; \(N{}_{i}\) is the pixel value of the ith lung zone; and \(k_{i}^{j}\) is the discriminant value indicating that the small opacity density of the ith lung zone belongs to j.
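Equation (5) is a per-zone pixel ratio; a minimal sketch over boolean masks (the function name `density_ratio` is a hypothetical helper, not the authors' code):

```python
import numpy as np

def density_ratio(opacity_mask, zone_mask):
    """Eq. (5): ratio of level-j opacity pixels inside lung zone i
    to the total pixels of that zone (both masks are boolean arrays)."""
    return np.logical_and(opacity_mask, zone_mask).sum() / zone_mask.sum()
```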

3.4.3 Discriminant Method of Pneumoconiosis Staging

Radiologists use two factors to determine pneumoconiosis staging. First, the distribution range of small fibrotic opacities is the number of lung zones containing small opacities of level 1 density or above; zones containing only opacities below level 1 density are not counted in the distribution range of the lesion. Second, the small opacities must cover at least two-thirds of a lung zone for that zone to be counted in the distribution range. Therefore, the \(k_{i}^{j}\) threshold was set to 0.67, and the discriminative parameters of pneumoconiosis staging are the highest opacity density level J with a k value above 0.67 and the number of lung zones with that opacity level (Table 2).
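The two discriminative parameters can be extracted as follows. This is a sketch under stated assumptions: `staging_parameters` is a hypothetical helper, `k` is a list of per-zone dicts mapping level j to the Eq. (5) ratio, and the mapping of (J, number of zones) to a category follows Table 2 and is not reproduced here.

```python
def staging_parameters(k, threshold=0.67):
    """Given k[i][j] (Eq. 5) for lung zones i and opacity levels j = 1..4,
    return (J, n_zones): the highest level whose ratio reaches the
    threshold in at least one zone, and how many zones reach it at that
    level.  (0, 0) means no zone reaches the threshold at any level."""
    for j in (4, 3, 2, 1):
        zones = [i for i, ki in enumerate(k) if ki.get(j, 0.0) >= threshold]
        if zones:
            return j, len(zones)
    return 0, 0
```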

Table 2 Discriminative parameters of pneumoconiosis staging

4 Experiment and Results

4.1 Experimental Datasets

The pneumoconiosis DR datasets include two parts: the public Chest X-ray14 dataset and the dataset collected by the Anhui Prevention and Treatment Center for Occupational Diseases.

4.1.1 Chest X-ray14

Chest X-ray14 [18, 19] is a chest X-ray database built by the National Institutes of Health (NIH) Clinical Center. It contains 112,120 frontal-view chest X-ray images from more than 30,000 unique patients, with text labels for one or more of 14 diseases mined from the corresponding radiology reports through natural language processing (NLP); one image may carry multiple disease labels. The database can be freely accessed and downloaded by researchers worldwide for research on the aided diagnosis of clinical diseases. The statistics of the database are shown in Table 3.

Table 3 Chest X-ray 14 dataset

We chose 1686 images with lung fibrosis and each image was labelled with a rectangular box showing the lung fibrotic zone.

4.1.2 Self-Collected Dataset

The other dataset comprises 250 pneumoconiosis DR samples collected by the Anhui Prevention and Treatment Center for Occupational Diseases: 50 from patients diagnosed with category I pneumoconiosis, 40 with category II, 50 with category III and 50 with category IV, plus another 60 unannotated DRs used as the validation dataset. These DRs were verified by radiologists, and the diagnoses had been clinically confirmed, giving the dataset high accuracy and reliability. The details are shown in Table 4.

Table 4 Self-collected dataset

4.2 Experimental Model Settings

Based on DR imaging standards, a DR is 1024 × 1024 pixels over a 356 mm × 356 mm field of view, so the spatial resolution of a standard DR is 0.348 mm between adjacent pixels. Therefore, a small opacity with a diameter (width) of 1.5 mm (j = 1) spans about 4 × 4 pixels; one of 3 mm (j = 2), about 8 × 8 pixels; and one of 10 mm (j = 3), about 28 × 28 pixels. Once the detected object box exceeded 28 pixels in the diameter (width) direction, we judged that a large opacity (j = 4) was present and category IV pneumoconiosis was diagnosed directly, while DRs of the other categories required further examination. Accordingly, in the experimental model, the four scales of feature mapping layers, 256 × 256, 128 × 128, 64 × 64 and 16 × 16, with the corresponding convolution layers conv4, conv5, conv6 and conv7, respectively, were used to predict positions and classification scores, and feature extraction was performed with 1 × 1 convolution kernels. The model was trained using the stochastic gradient descent (SGD) algorithm with the following parameters: learning rate 0.001, momentum 0.9, weight decay 0.0005, batch size 32 and 10,000 iterations. The parameters of the feature mapping layers were randomly initialized.
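The pixel-size calibration above is a single conversion; a minimal sketch (the function name `mm_to_pixels` is a hypothetical helper, and the truncation to whole pixels reproduces the 4/8/28-pixel figures quoted in the text):

```python
def mm_to_pixels(size_mm, field_mm=356.0, image_px=1024):
    """Number of whole pixels spanned by a physical size on a standard DR
    (1024 x 1024 pixels over a 356 mm x 356 mm field, ~0.348 mm/pixel)."""
    return int(size_mm * image_px / field_mm)
```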

4.3 Aspect Ratio of Detection Boxes

Given the common features of the lesion area, the parameters were shared among all feature layers, and detection was performed on both lower and upper mapping layers, e.g. on 8 × 8 and 4 × 4 feature maps, respectively, to reduce the computational workload. In the model, the feature mapping layers have different receptive fields, and the initial bounding boxes do not need to match the receptive field of each layer. Assuming that m feature mapping layers are used for prediction, the scale of the initial bounding boxes of each layer is calculated from the following equation:

$$ s_{k} = s_{\min } + \frac{{s_{\max } - s_{\min } }}{m - 1}(k - 1),k \in [1,m], $$
(6)

where \(s_{\min }\) is 0.2 and \(s_{\max }\) is 0.9, i.e. the scale factor of the lowest layer is 0.2, that of the highest layer is 0.9, and the scales of the other layers are evenly spaced between these two values. These values were selected empirically: for example, we found that opacities with a diameter below 1.5 mm were approximately 4 × 4 pixels, while opacities with a diameter above 10 mm were approximately 28 × 28 pixels. We used five aspect ratios (\(\{ 1:1,1:2,1:3,2:1,3:1\}\)) for the initial bounding boxes. The width and height were calculated using the following equations:

$$ w_{k} = s_{k} \times \sqrt {a_{r} } ,h_{k} = s_{k} /\sqrt {a_{r} } . $$
(7)

The coordinates of the centre point of the ith row and the jth column of the initial bounding box were set to \(\left( {\frac{i + 0.5}{m},\frac{j + 0.5}{m}} \right)\), where m is the size of the m × m feature map. By fusing all the predicted values of all the initial bounding boxes with different scales and aspect ratios, we obtained a series of prediction scores including various input sizes and shapes.
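Equations (6) and (7) together with the centre-point rule generate the full set of initial boxes; a minimal sketch (the names `layer_scale` and `initial_boxes` are hypothetical helpers, not the authors' code):

```python
import math

def layer_scale(k, m, s_min=0.2, s_max=0.9):
    """Eq. (6): scale of the initial boxes on the kth of m prediction layers."""
    return s_min + (s_max - s_min) * (k - 1) / (m - 1)

def initial_boxes(k, m, feat, ratios=(1.0, 0.5, 1 / 3, 2.0, 3.0)):
    """Initial bounding boxes for a feat x feat feature map as
    (cx, cy, w, h) in image-relative coordinates, with the Eq. (7)
    width/height placed at the centre of every cell."""
    s = layer_scale(k, m)
    boxes = []
    for i in range(feat):
        for j in range(feat):
            cx, cy = (i + 0.5) / feat, (j + 0.5) / feat
            for a in ratios:
                boxes.append((cx, cy, s * math.sqrt(a), s / math.sqrt(a)))
    return boxes
```

For m = 4 layers, the lowest layer gets scale 0.2 and the highest 0.9, and each feature-map cell contributes five boxes, one per aspect ratio.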

4.4 Results and Evaluation of Multi-scale Detection

According to the above calculations, opacities whose object box exceeded 28 pixels in diameter or in length and width were judged to be large opacities, and all other detected boxes were discriminated as small opacities. Therefore, we chose four dimensions: 16 × 16, 64 × 64, 128 × 128 and 256 × 256. Of these, the 16 × 16 feature map was used to predict large opacities with a diameter (width) over 10 mm, and the 64 × 64 feature map was used to predict small opacities with a diameter (width) of 3–10 mm. The number of predicted object boxes for each feature mapping layer and the scale of the predicted opacities are shown in Table 5. The object boxes predicted through conv4 accounted for 75.9% of the total.

Table 5 Number of predicted object boxes

To evaluate the performance of the model in predicting bounding boxes, using the training dataset of 1686 DRs, we experimentally compared our proposed method with the ANN-based aided diagnosis method of Okumura et al. [9] and the energy texture feature-based DT algorithm of Zhu et al. [5] (Fig. 5). The areas under the receiver operating characteristic (ROC) curve (AUC) for our proposed method, the Okumura method and the Zhu method were 0.84, 0.74 and 0.72, respectively, indicating that our proposed method performed best.
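The AUC values above can in principle be computed from per-image scores and labels; a minimal rank-based (Mann-Whitney) sketch with hypothetical data, not the authors' evaluation code:

```python
def auc(labels, scores):
    """AUC via the rank-sum formulation: the probability that a random
    positive scores higher than a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```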

Fig. 5
figure 5

Comparison of the performance of the three models

To further evaluate the performance of the proposed model, we conducted experiments using the training dataset (1686 DRs), the testing dataset (190 DRs) and the validation dataset (60 DRs). We plotted the trends of the accuracy values for the training and validation datasets and of the loss function for the testing dataset separately over 12,000 iterations. As shown in Fig. 6, accuracy converged as the number of iterations increased: the accuracies on the training and validation datasets reached 92% and 78%, respectively, while the loss on the testing dataset fluctuated slightly around 5% ± 1%. The results also indicate that after 4000 iterations, the accuracy on the validation dataset stabilized and no longer improved significantly, which is related to the limited dataset size.

Fig. 6
figure 6

Trend of loss function and accuracy

4.5 Results of the CAD of Pneumoconiosis Staging

To assess the clinical accuracy of the proposed method, we randomly selected 10 diagnosed DRs of real patients (Fig. 7) from the self-collected dataset, with patient information hidden, and applied the discriminant method of pulmonary fibrosis, setting the overlap threshold between the predicted and GT boxes to 0.67. The discriminant parameters of the four feature mapping scales (256 × 256, 128 × 128, 64 × 64, 16 × 16) over the six lung zones and the CAD diagnoses based on the proposed method are shown in Table 6.

Fig. 7
figure 7

10 real patients’ X rays (No.1–10)

Table 6 The discriminant parameters in six different lung zones for four scales and the result of CAD

The CAD results for pneumoconiosis staging were highly consistent with the radiologists’ diagnoses. To evaluate the proposed method across pneumoconiosis stages, we compared the AUC values for the four stages on 190 diagnosed DRs from the self-collected dataset. As shown in Fig. 8, the AUC of our proposed method for category I was 0.864; the AUC for category IV was 0.82, followed by categories II and III with AUC values of 0.81 and 0.65, respectively. Classification performance was thus best for category I (no pneumoconiosis) and category IV.

Fig. 8
figure 8

The discrimination performance for the pneumoconiosis category

In this study, the multi-scale feature mapping module for lung fibrosis detection demonstrated strong performance. For large opacities or no opacities, the proposed method performed better, while categories II and III required not only the detection of pulmonary opacities but also calculation through the discriminant method, whose pixel statistics carry errors, leading to discrepancies between the CAD and the radiologist. The experimental results indicate that the proposed method can achieve diagnosis at the level of radiologists and can be used to pre-screen pneumoconiosis in clinical applications.

5 Conclusion

In this research, we utilized the open Chest X-ray14 dataset and chest radiographs collected by the Anhui Prevention and Treatment Center for Occupational Diseases as research datasets. We proposed a multi-scale feature mapping model based on deep learning feature extraction for detecting pulmonary fibrosis, and a discrimination method for pneumoconiosis staging based on a threshold on the pixel ratio of opacity area to lung zone area. The experiments demonstrated that the CAD results were highly consistent with those of clinical experts on real patients, so the method can be applied to the pre-screening and diagnosis of pneumoconiosis in the clinic. Our approach relies on a large number of labelled images, which directly affects the accuracy of the model; reducing the annotation burden on radiologists is a problem we will address in future studies. This research focused mainly on the staging of pneumoconiosis; other lung diseases visible in DR, such as pneumothorax and pleural effusion, were not involved and should be the focus of future research.