Learning a Discriminative Feature Attention Network for pancreas CT segmentation

Accurate pancreas segmentation is critical for the diagnosis and management of diseases of the pancreas. It is challenging to precisely delineate the pancreas due to the high variations in volume, shape and location. In recent years, coarse-to-fine methods have been widely used to alleviate the class imbalance issue and improve pancreas segmentation accuracy. However, cascaded methods can be computationally intensive, and the refined results depend significantly on the performance of the coarse segmentation. To balance segmentation accuracy and computational efficiency, we propose a Discriminative Feature Attention Network for pancreas segmentation, which effectively highlights pancreas features and improves segmentation accuracy without explicit pancreas localization. The final segmentation is obtained by applying a simple yet effective post-processing step. Two experiments, on the public NIH pancreas CT dataset and the abdominal BTCV multi-organ dataset, are conducted separately to show the effectiveness of our method for 2D pancreas segmentation. We obtained an average Dice Similarity Coefficient (DSC) of 82.82 ± 6.09%, an average Jaccard Index (JI) of 71.13 ± 8.30% and an average Symmetric Average Surface Distance (ASD) of 1.69 ± 0.83 mm on the NIH dataset. Compared with existing deep learning-based pancreas segmentation methods, our method achieves the best average DSC and JI values.


§1 Introduction
Organ segmentation usually refers to the process of extracting specific target organs from medical images. Accurate organ segmentation is a prerequisite for organ measurement, surgical guidance, and the evaluation of radiotherapy effects in computer-aided diagnosis [36]. The pancreas is a soft organ located on the periphery of the abdomen, which lacks a fixed shape and is hidden behind the peritoneum [10]. Pancreas-related diseases are relatively hidden and difficult to detect and cure, especially pancreatic cancers, which are still associated with high mortality and low postoperative survival rates [32]. In clinical practice, the pancreas volume is manually delineated by radiologists for the diagnosis of pancreatic disease and for quantitative assessment. For example, the volume of the pancreas enables physicians to estimate endocrine and exocrine pancreatic function [1]. However, manual annotation is highly time-consuming and operator-dependent. Hence, an accurate and robust automatic pancreas segmentation method is in high demand for the clinical management of pancreatic diseases, as it can reduce the workload of radiologists and improve the consistency of pancreas segmentation.
Accurate segmentation of the pancreas in CT images is challenging for the following reasons. First, the intensity distributions of the pancreas and its surrounding tissues are very similar. As shown in Fig 1, the pancreas boundaries are difficult to distinguish even after contrast adjustment. Second, the pancreas is a small, soft abdominal organ with a highly irregular shape, which leads to severe class imbalance and makes it difficult to design a method that adaptively covers all possible pancreas variabilities [10]. Third, Fig 1 shows that discontinuities exist in some pancreas slices, which makes the results prone to over-segmentation and under-segmentation. To address these challenges, many pancreas segmentation methods have been proposed over the past few years, which can be categorized into two types: top-down and bottom-up methods [11]. In top-down methods, segmentation is performed by multi-atlas registration and label fusion (MALF) [9,16,26,31]. To reduce the mis-selection of similar atlases caused by CT intensity, Karasawa et al. proposed a new atlas selection strategy based on the vessel structure around the pancreatic tissue [9]. Their experimental results show that atlas selection based on vessel structure is much more effective in selecting atlases with similar pancreatic shape and position. However, it is not trivial to select atlases that are general enough to cover all possible pancreas variabilities, owing to the highly irregular shape of the pancreas and its poor contrast with spatially adjacent abdominal tissues.
Recently, dense prediction methods based on deep convolutional neural networks, such as FCN [15] and DeepOrgan [19], have achieved great success in computer vision and medical imaging, which has also advanced pancreas segmentation. Since the pancreas often occupies a small proportion of the whole abdomen, most pancreas segmentation methods rely on multi-stage [2,12] or cascaded CNNs [6,20,21,38] to improve segmentation accuracy. Roth et al. first proposed a bottom-up, coarse-to-fine approach for pancreas CT segmentation, using a multi-level deep ConvNet model to learn robust pancreas feature representations and effectively prune the coarse pancreas over-segmentation [19]. This framework was further improved by holistically-nested segmentation networks [20,21]. Zhou et al. proposed a 2D fixed-point model based on FCN-8s [38], in which the coarse segmentation iteratively provides the pancreas location for subsequent fine-scaled models. Asaturyan et al. presented an approach for automatic pancreas segmentation based on a hierarchical pooling of information, classifying extracted image patches, superpixels and intensity distributions as pancreatic tissue or otherwise [2]. Li et al. proposed a new CAD model for pancreatic cancer on PET/CT images based on a gray interval mapping (GIP) method and dual threshold principal component analysis [12]. Although multi-stage methods have demonstrated significant improvements over traditional methods, they are complex to train and lack generalization due to the presence of multiple learning stages [25].
Attention-based image classification [27] and semantic segmentation architectures [34] have recently attracted increasing interest. Attention mechanisms aim to emphasize important information and filter out irrelevant information. Hu et al. proposed a compact module to explicitly model the relationship between channels. In their squeeze-and-excitation module, global average pooling is performed to obtain a channel-wise feature response vector [7]. Liu et al. proposed adaptively spatial feature fusion (ASFF) [14], which uses spatial attention to optimize the feature fusion process. Wang et al. presented non-local operations [28] to capture long-range dependencies, which perform well in modelling contextual information. Woo et al. proposed two simple and effective attention modules based on channel-wise and spatial-wise attention, named Convolutional Block Attention Module (CBAM) [30] and Bottleneck Attention Module (BAM) [17], which learn to selectively focus on salient features in the channel and spatial dimensions and then effectively recalibrate the intermediate feature representations. Oktay et al. proposed a 3D Attention U-Net architecture for abdominal organ segmentation, which integrates an additive gated attention module into the skip connections of the U-Net decoder and can implicitly learn to focus on more discriminative regions of the image and suppress irrelevant information [24]. While 3D deep networks [22,23,33] can directly leverage the inherent spatial information between slices, they are more prone to overfitting, especially on small datasets. In addition, the large computational burden of 3D convolutional filters limits the depth and receptive field of the networks, which are two key factors for improving network performance.
Recently, the Discriminative Feature Network (DFN) [35] was proposed to tackle the intraclass inconsistency and inter-class indistinction issues in most semantic segmentation methods. Automatic pancreas segmentation is a semantic segmentation task. To address the challenges of fuzzy boundaries and large shape variations in the pancreas segmentation, we design a Modified Discriminative Feature Attention Network (MDFAN) based on DFN to explore the strengths of attention mechanism for the pancreas segmentation.
In summary, this work makes the following contributions:
• We design a Discriminative Feature Attention Network to simultaneously address the intra-class inconsistency and inter-class indistinction issues in pancreas segmentation. Quantitative evaluation on two publicly available datasets validates the effectiveness of the proposed method.
• We apply an attention mechanism in our network, which enhances the discriminative information of pancreas structures by concentrating attention near the pancreas and thereby removes the need for an explicit pancreas localization module or network.
• We propose a lightweight Improved Refinement Residual Block (IRRB), which models the importance of the spatial positions within each feature map and aggregates contextual information over local features.
• We propose a simple but effective post-processing method to refine the segmentation results of the proposed network.
To the best of our knowledge, this is the first attempt to segment the pancreas under the guidance of an attention mechanism in a 2D single-step training network with simple post-processing.

§2 Materials and Methods
In this section, we propose a Discriminative Feature Attention Network for pancreas segmentation. Unlike cascaded methods, which perform pancreas localization and pancreas segmentation separately, the proposed network uses the attention mechanism to adaptively locate the pancreas and improve both the performance and the efficiency of pancreas segmentation. Our method is based on the DFN proposed in [35]. We use a modified DFN as our baseline, replacing the pretrained residual blocks of the ResNet-101 backbone with the pretrained dense blocks of DenseNet-121 in order to enhance feature propagation and encourage feature reuse. We call the modified DFN MDFN. Fig 2 shows that the proposed network has three components: a shared attention-based feature extraction branch, a Smooth sub-network and a Border sub-network. To improve feature extraction, the four dense blocks and three transitions (denseblock1 ∼ denseblock4, transition1 ∼ transition3) from the pre-trained DenseNet-121 network [5], along with BAM [17], are used to enhance feature learning and obtain discriminative hierarchical features by exploiting spatial-wise and channel-wise interdependencies. BAM is designed to explicitly learn spatial (where) and channel-wise (what) attention separately. As shown in Fig 3, BAM consists of a spatial attention branch and a channel attention branch. For a given input feature map $F \in \mathbb{R}^{C \times H \times W}$, BAM infers a spatial-and-channel attention map $M(F) \in \mathbb{R}^{C \times H \times W}$. Following [17], the refined feature map $F'$ is computed as:

$$F' = F + F \otimes M(F)$$

where ⊗ denotes element-wise multiplication.

Network architecture
BAM is placed at each bottleneck of the proposed model to highlight features from different layers and select relevant and useful features. The channel attention block (CAB) aims to enhance the semantic consistency of the pancreas: it infuses high-level semantic information into low-level feature maps by learning the global semantic relationships across channels, and generates discriminative feature representations (as shown in Fig 4). The goal of the Smooth sub-network is to exploit the high-level features with strong consistency to guide the prediction of low-level features, improving intra-class consistency while retaining boundary information. Specifically, the CAB and the proposed Improved Refinement Residual Block (IRRB) are used to recalibrate the feature maps along the channel and spatial dimensions, respectively, according to the response of the feature maps. The combination of CAB and IRRB adaptively assigns large weights to high-activation regions and useful channels to enhance intra-class consistency. However, it is still not trivial to delineate the pancreas because of its fuzzy boundaries. To differentiate the features on either side of the pancreas boundary, we employ a bottom-up Border sub-network [35], which uses the semantic boundary of the pancreas derived from the existing target labels to supervise and recognize the shape of the pancreas. Specifically, the feature maps obtained from lower stages contain detailed spatial information, while those generated by higher stages with larger receptive fields contain more semantic context cues. The proposed IRRB can select more discriminative spatial features to gradually help the Border sub-network restore boundaries and sharpen edge discrimination, thus reducing the impact of inter-class indistinction.
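The way a BAM-style attention map rescales a feature map can be illustrated with a few lines of NumPy. This is a minimal sketch that assumes the formulation of the BAM paper [17], in which the channel and spatial branch outputs are broadcast, summed and passed through a sigmoid to give M(F). The branch internals are omitted here: `channel_logits` and `spatial_logits` are hypothetical placeholder inputs standing in for the outputs of the two branches, and `bam_refine` is an illustrative name, not part of the authors' code.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def bam_refine(feat, channel_logits, spatial_logits):
    """Apply F' = F + F * M(F) with M(F) = sigmoid(Mc + Ms).

    feat:           (C, H, W) feature map F
    channel_logits: (C,) channel-branch output (placeholder input, assumption)
    spatial_logits: (H, W) spatial-branch output (placeholder input, assumption)
    """
    # Broadcast both branch outputs to (C, H, W) before summing them.
    mc = channel_logits[:, None, None]
    ms = spatial_logits[None, :, :]
    attention = sigmoid(mc + ms)      # M(F), every value lies in (0, 1)
    return feat + feat * attention    # residual form keeps the input signal


rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 5, 5))
refined = bam_refine(feat, rng.standard_normal(4), rng.standard_normal((5, 5)))
```

Because the attention values lie in (0, 1), each activation is scaled by a factor between 1 and 2, so attended positions are amplified while the original features are never suppressed to zero.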

Improved Refinement Residual Block
Spatial attention mechanisms are widely used in classification and semantic segmentation [27], [34]. The goal of spatial attention is to assign large weights to target-related locations and aggregate contextual information within each feature map. Yu et al. [35] proposed a Refinement Residual Block (RRB), which enhances the recognition ability of each stage and refines the feature maps. However, we observed that the original Smooth sub-network and Border sub-network in [35] did not consider the spatial correlation within feature maps, which motivated us to introduce spatial attention into the Refinement Residual Block, yielding the Improved Refinement Residual Block (IRRB). Fig 6 illustrates the architecture of the proposed IRRB. The IRRB consists of successive convolution, batch normalization (BN) and ReLU layers. To exploit spatial-wise interdependencies, we first use two 1 × 1 convolution layers to gradually reduce the channels of the input feature maps to 1 before the sigmoid operation. Then, one 3 × 3 convolution followed by BN and ReLU, as well as another 3 × 3 convolution, are used to increase the receptive field and improve the awareness of contextual information within feature maps, which is helpful given the highly varied size and position of the pancreas. To avoid information loss after spatial attention and to speed up convergence, a residual connection is employed. In the formulation of the IRRB output feature map, σ denotes a sigmoid function, $f^{1 \times 1}$ and $g^{1 \times 1}$ are two convolution operations with a filter size of 1 × 1, H is an operation consisting of two 3 × 3 convolutions, BN and ReLU, and F is an intermediate feature map. The IRRB learns a self-attention mask to enhance the targeted regions within feature maps, which helps the network emphasize the regions that are more relevant to the semantic classes.
For the Smooth sub-network, the IRRB attends to relevant spatial locations in the feature maps of low-level layers and gradually recovers the spatial details in a top-down manner. For the bottom-up Border sub-network, we gradually fuse spatially detailed features from low to high levels by explicitly modeling spatial-wise attention at various levels, which strengthens the semantic discrimination of high-level features with details and thus boosts edge classification. Fig 7 qualitatively demonstrates that the proposed IRRB can effectively capture more detailed pancreatic feature information during the decoding stage.

Figure 6. The structure of the Improved Refinement Residual Block.
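The spatial gating described for the IRRB can be sketched in NumPy as follows. This is one plausible reading of the text, not the authors' exact formulation: the mask is taken as σ(g(f(F))), the gated features are passed through H, and the result is added back to F. The order in which the mask, H and the residual are combined is an assumption, and H (two 3 × 3 convolutions with BN and ReLU) is replaced by a hypothetical callable `h` that defaults to the identity so the example stays short. The names `conv1x1` and `irrb_sketch` are illustrative.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv1x1(x, weight):
    """1x1 convolution on a (C_in, H, W) map; weight has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


def irrb_sketch(feat, w_f, w_g, h=lambda x: x):
    """Spatial attention gate with a residual connection (assumed composition).

    feat: (C, H, W) intermediate feature map F
    w_f:  (C_mid, C) weights of the first 1x1 convolution f
    w_g:  (1, C_mid) weights of the second 1x1 convolution g
    h:    stand-in for the two 3x3 conv + BN + ReLU stage (identity here)
    """
    # Two 1x1 convolutions reduce the channels step by step down to one.
    mask = sigmoid(conv1x1(conv1x1(feat, w_f), w_g))  # (1, H, W) spatial mask
    attended = feat * mask                            # broadcast over channels
    return feat + h(attended)                         # residual connection


rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 6, 6))
out = irrb_sketch(feat, rng.standard_normal((4, 8)), rng.standard_normal((1, 4)))
```

The single-channel mask is shared by all channels, so it expresses where the block attends, while the residual path preserves the unmodified features.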

Loss function
We employed a hybrid loss based on the Dice loss and the Focal loss [13] for pancreas segmentation. The Dice loss is used to handle the imbalanced class distribution in the Smooth sub-network. Because pancreas boundaries occupy a very small region of the whole CT scan and pixels on the boundary are easy to misclassify, we adopt the Focal loss for the Border sub-network. The Focal loss is a dynamically scaled cross-entropy loss that adaptively reduces the contribution of easy examples during training and focuses the Border sub-network on hard examples. In all experiments, we use the Dice loss in conjunction with the Focal loss. In these losses, $g_k \in \{0, 1\}$ and $p_k \in [0, 1]$ denote the manual annotations and automatic segmentations, respectively, N denotes the total number of pixels in an image, and ϵ provides numerical stability to prevent division by zero. In our experiments, we trained all models with λ = 0.025 to balance the boundary Focal loss and the regional Dice loss, and set γ = 2.0.
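The hybrid loss can be sketched as below. The exact expressions are not given in the text, so the sketch assumes commonly used forms: a soft Dice loss 1 − (2Σp_k g_k + ϵ)/(Σp_k + Σg_k + ϵ), the Focal loss of [13] without class weighting, −(1 − p_t)^γ log(p_t), and a weighted sum in which λ scales the Focal term. The function names and the placement of λ are assumptions for illustration.

```python
import numpy as np


def dice_loss(p, g, eps=1e-5):
    """Soft Dice loss over all pixels; p in [0, 1], g in {0, 1}."""
    inter = np.sum(p * g)
    return 1.0 - (2.0 * inter + eps) / (np.sum(p) + np.sum(g) + eps)


def focal_loss(p, g, gamma=2.0, eps=1e-7):
    """Mean binary focal loss: -(1 - p_t)^gamma * log(p_t)."""
    p = np.clip(p, eps, 1.0 - eps)          # avoid log(0)
    p_t = np.where(g == 1, p, 1.0 - p)      # probability of the true class
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))


def hybrid_loss(p_region, g_region, p_border, g_border, lam=0.025, gamma=2.0):
    """Regional Dice loss plus lam-weighted boundary focal loss (assumed form)."""
    return dice_loss(p_region, g_region) + lam * focal_loss(p_border, g_border, gamma)


g = np.array([0, 1, 0, 1])
p = np.array([0.1, 0.8, 0.3, 0.6])
total = hybrid_loss(p, g, p, g)
```

With γ = 0 the focal term reduces to ordinary cross-entropy; increasing γ shrinks the loss of well-classified pixels, which is why it suits the sparse, hard boundary pixels.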

Post-processing
Many prior studies [8,18,37] have demonstrated that post-processing is an efficient way to improve segmentation performance by refining the results of CNNs. The conditional random field (CRF) algorithm is widely used as a post-processing step [8,37]. In this work, we present a simple yet effective post-processing method, based on connected component analysis, to refine the predictions of the proposed network. Table 1 shows that MDFAN II can produce relatively good pancreas predictions. However, it is difficult to avoid over-segmentation due to the low contrast between the pancreas and the complex surrounding tissues. Moreover, the pancreas occupies only a small part of the whole abdomen and has an irregular shape, which further increases the possibility of false segmentation. To separate the over-segmented regions, which are weakly connected with the pancreas, a connected component algorithm is used to keep the largest connected component and reduce the false positives in the predictions. Specifically, the pancreas segmentations from MDFAN II were post-processed by eliminating connected components comprising less than 20% of the total label volume. As shown in Table 1, the proposed post-processing significantly improves the average DSC and ASD of the pancreas. We refer to MDFAN II with post-processing as MDFAN III. The average inference time for post-processing is 1.68 seconds per volume.
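The connected-component filtering can be sketched as follows. This is a minimal illustration of the 20% volume rule stated above, not the authors' implementation: it labels components with a plain breadth-first search, assumes 6-connectivity in 3D (the connectivity is not specified in the text), and uses the hypothetical names `label_components` and `remove_small_components`.

```python
from collections import deque

import numpy as np


def label_components(mask):
    """Label 6-connected components of a 3D boolean mask with BFS."""
    labels = np.zeros(mask.shape, dtype=np.int32)
    sizes = []  # sizes[i] is the voxel count of the component labelled i + 1
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue  # voxel already belongs to a labelled component
        current = len(sizes) + 1
        labels[seed] = current
        queue, count = deque([seed]), 0
        while queue:
            z, y, x = queue.popleft()
            count += 1
            for dz, dy, dx in offsets:
                n = (z + dz, y + dy, x + dx)
                inside = all(0 <= n[i] < mask.shape[i] for i in range(3))
                if inside and mask[n] and not labels[n]:
                    labels[n] = current
                    queue.append(n)
        sizes.append(count)
    return labels, sizes


def remove_small_components(mask, min_fraction=0.2):
    """Drop components smaller than min_fraction of the total foreground volume."""
    mask = mask.astype(bool)
    labels, sizes = label_components(mask)
    total = mask.sum()
    keep = [i + 1 for i, s in enumerate(sizes) if s >= min_fraction * total]
    return np.isin(labels, keep)


prediction = np.zeros((4, 6, 6), dtype=bool)
prediction[0:2, 0:3, 0:3] = True   # main blob, 18 voxels
prediction[3, 5, 5] = True         # isolated false positive, 1 voxel
cleaned = remove_small_components(prediction)
```

In practice a library routine for connected-component labelling would be much faster on full CT volumes; the explicit search is used here only to keep the example dependency-free.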

Data pre-processing
To quantitatively evaluate the effectiveness and generalization of the proposed model, two different abdominal CT datasets are used: (1) A public pancreas dataset, which contains 82 contrast-enhanced abdominal CT volumes, is acquired at the National Institutes of Health Clinical Center from pre-nephrectomy healthy kidney donors or patients with neither major abdominal pathologies nor pancreatic cancer lesions [16]. The resolution of each CT volume is 512 × 512 × L, where L ∈ [181, 466] is the number of sampling slices along the long axis of the body. The slice thickness varies from 0.5 mm to 1.0 mm.
(2) The 'Beyond the Cranial Vault' (BTCV) segmentation challenge dataset (https://www.synapse.org/#!Synapse:syn3193805/wiki/89480) consists of 30 training scans, which have annotations of all abdominal organs except the duodenum, and 20 unseen testing scans. The in-plane resolution of the BTCV dataset varies from 0.54 mm to 0.98 mm and the slice thickness ranges from 2.5 mm to 5.0 mm. Seventeen of the 20 unseen testing scans have manual annotations of eight abdominal organs, which are provided by Gibson et al. [4]. To quantitatively assess the generalization of the proposed model, we used the 30 training scans to train our model and then tested the segmentation performance on the 17 annotated testing scans.
The image intensity values in a CT slice of both datasets were clipped to [−100, 240] HU to filter out irrelevant information, and further normalized with zero mean and unit variance. It is important to note only axial slices are used to train our models.
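The intensity pre-processing amounts to a clip followed by standardization, as in the sketch below. The text does not say whether the mean and variance are computed per slice, per volume or over the whole dataset; per-slice statistics are assumed here, and `preprocess_slice` is an illustrative name.

```python
import numpy as np


def preprocess_slice(hu_slice, low=-100.0, high=240.0, eps=1e-8):
    """Clip a CT slice to [low, high] HU, then scale to zero mean, unit variance."""
    clipped = np.clip(hu_slice.astype(np.float32), low, high)
    # Per-slice statistics are an assumption; eps guards against constant slices.
    return (clipped - clipped.mean()) / (clipped.std() + eps)


example = np.linspace(-1000.0, 1000.0, 64 * 64).reshape(64, 64)
normalized = preprocess_slice(example)
```

Clipping first matters: air and bone values far outside the soft-tissue window would otherwise dominate the mean and variance used for normalization.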

Evaluation metrics
Five metrics including the Dice Similarity Coefficient (DSC), Jaccard index (JI), Precision, Recall and Symmetric Average Surface Distance (ASD) are used to quantitatively evaluate the segmentation performance of different methods.
• Dice Similarity Coefficient (DSC) and Jaccard index (JI) measure the degree of volumetric overlap between the manually labeled ground truths and the network predictions. They are defined as [3]:

$$\mathrm{DSC} = \frac{2\,|V_{gt} \cap V_{seg}|}{|V_{gt}| + |V_{seg}|}, \qquad \mathrm{JI} = \frac{|V_{gt} \cap V_{seg}|}{|V_{gt} \cup V_{seg}|}$$

• Precision measures the proportion of truly positive voxels in the predictions. It is defined as:

$$\mathrm{Precision} = \frac{|V_{gt} \cap V_{seg}|}{|V_{seg}|}$$

• Recall measures the proportion of truly positive voxels in the manually labeled ground truths. It is defined as:

$$\mathrm{Recall} = \frac{|V_{gt} \cap V_{seg}|}{|V_{gt}|}$$

• Average Surface Distance (ASD) measures the average distance between the surfaces of the manual and automatic segmentations [29]. It is defined as:

$$\mathrm{ASD} = \frac{1}{|S_{gt}| + |S_{seg}|}\left(\sum_{x \in S_{gt}} d(x, S_{seg}) + \sum_{z \in S_{seg}} d(z, S_{gt})\right)$$

where $V_{gt}$ and $V_{seg}$ represent the voxel sets of the manual annotations and automatic segmentations, respectively, and $S_{gt}$ and $S_{seg}$ are the corresponding surface voxel sets of $V_{gt}$ and $V_{seg}$. $d(z, S_{gt})$ denotes the minimum Euclidean distance from voxel $z \in S_{seg}$ to all voxels in $S_{gt}$, and $d(x, S_{seg})$ is defined analogously. For the DSC, JI and ASD metrics, the experimental results are reported as the mean with standard deviation over all testing samples. For the precision and recall metrics, we report the mean score over all testing samples.
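The metrics above can be computed from binary masks as in the following sketch. It assumes the standard set-based definitions of DSC, JI, Precision and Recall and the symmetric form of ASD; the surface point sets are passed in directly as coordinate arrays in millimetres, since surface extraction is outside the scope of the example. The function names are illustrative, and both masks are assumed to be non-empty.

```python
import numpy as np


def overlap_metrics(seg, gt):
    """DSC, JI, precision and recall for two non-empty binary masks."""
    seg, gt = seg.astype(bool), gt.astype(bool)
    tp = np.logical_and(seg, gt).sum()      # |V_gt intersect V_seg|
    union = np.logical_or(seg, gt).sum()    # |V_gt union V_seg|
    return {
        "dsc": 2.0 * tp / (seg.sum() + gt.sum()),
        "ji": tp / union,
        "precision": tp / seg.sum(),
        "recall": tp / gt.sum(),
    }


def average_surface_distance(surf_seg, surf_gt):
    """Symmetric ASD between two (N, 3) arrays of surface points in mm."""
    # Pairwise Euclidean distances, shape (len(surf_seg), len(surf_gt)).
    d = np.linalg.norm(surf_seg[:, None, :] - surf_gt[None, :, :], axis=-1)
    seg_to_gt = d.min(axis=1)   # d(z, S_gt) for every z in S_seg
    gt_to_seg = d.min(axis=0)   # d(x, S_seg) for every x in S_gt
    return (seg_to_gt.sum() + gt_to_seg.sum()) / (len(surf_seg) + len(surf_gt))


scores = overlap_metrics(np.array([1, 1, 0, 0]), np.array([1, 0, 1, 0]))
asd = average_surface_distance(np.array([[0.0, 0.0, 0.0]]), np.array([[0.0, 0.0, 2.0]]))
```

The brute-force distance matrix is quadratic in the number of surface points, so a spatial index or a distance transform would be preferable for full-resolution volumes.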

Implementation
We implement our method on the PyTorch platform. An Adam optimizer with an initial learning rate of 0.0001 is used to train all models. For the NIH dataset, we trained our proposed method for 16 epochs under standard 4-fold cross-validation. For the 47 patients from the BTCV segmentation dataset, we used the 30 training scans to train the proposed method for 50 epochs and tested the model on the remaining 17 testing scans. For both datasets, the batch size is set to 4 and the learning rate is reduced by a factor of 10 every 10 epochs. All models are trained on an NVIDIA Tesla P40 GPU with 24 GB of memory. During training, each input image is randomly rotated (r ∈ [−45°, 45°]) and scaled (s ∈ [0.9, 1.1]) with probability 0.5 in order to improve the generalization performance on the validation set. We use 50 epochs for the BTCV subset because the number of training scans is smaller and the image resolution is lower, so more epochs are required to converge.


§3 Experimental results
To evaluate the proposed method, we conducted two experiments on the NIH dataset [20] and the 'Beyond the Cranial Vault' (BTCV) segmentation challenge dataset. Experimental results demonstrate that the proposed method shows consistent performance on the two datasets.

Segmentation results on the NIH dataset
To assess the effectiveness of the Bottleneck Attention Module (BAM) and the proposed Improved Refinement Residual Block (IRRB) in our method, we compared three models: MDFN, MDFAN I, and MDFAN II. For fair comparison, we kept the model structure and settings unchanged, with only blocks being replaced or added. Fig 7 qualitatively shows the improvements brought by BAM and the proposed IRRB. MDFAN II clearly does a better job in pancreas localization and classification. Specifically, the comparison between the third and fourth columns in Fig 7 demonstrates that BAM can force the network to pay more attention to the pancreas regions and extract more pancreas information. Similarly, the comparison between the fourth and fifth columns shows that the proposed IRRB can encode a wider range of contextual information into local features, which enhances the ability to recognize pancreas features.
The quantitative comparisons of the Precision, Recall, DSC, JI and ASD of the different models are reported in Table 1. MDFAN II outperforms MDFN and MDFAN I, with improvements in average DSC of up to 2.03% and 1.5%, respectively. It is worth noting that MDFAN II reports the highest average Recall, 83.54%, which demonstrates that the proposed IRRB can effectively filter the features spatially to obtain accurate saliency maps and aggregate spatial information within feature maps. Although MDFAN II can recognize the pancreas well and extract more detailed pancreas information, over-segmentation inevitably exists. To handle the over-segmentation problem, we used a simple connected component detection algorithm as post-processing to refine the pancreas segmentations from MDFAN II. The experimental results in Table 1 show that the post-processing greatly improves the mean Precision, DSC, JI and ASD of MDFAN II by 3.37%, 1.6%, 2.3% and 1.9 mm, respectively. This is a result of more balanced precision and recall scores, which indicates a good-quality segmentation. Compared with the baseline MDFN, our final model improves the average DSC by 3.63%. Additionally, our final model takes about 1.864 seconds for each 3D scan, consisting of 0.184 seconds for the end-to-end prediction by MDFAN II and 1.68 seconds for post-processing. Fig 8 visualizes the 3D overlap of segmentations from different models with respect to the manually labelled ground truths. Visual inspection shows that MDFAN II can capture more pancreas details and enhance the pancreas feature response, and that MDFAN II with post-processing can effectively prune the over-segmented regions to improve the average DSC and ASD measurements.

Segmentation results on the BTCV dataset
Since the NIH dataset is widely used in previous pancreas segmentation work, the same 4-fold cross-validation was employed to evaluate the proposed method, enabling fair comparison with existing pancreas segmentation methods. However, 4-fold cross-validation may generate relatively optimistic results. To further verify the effectiveness and generalization of the proposed model, we conducted another experiment on the 47 patients from the 'Beyond the Cranial Vault' (BTCV) segmentation challenge. As shown in Table 2, compared with the baseline MDFN, MDFAN II improves the segmentation accuracy by 1.59%, 1.96% and 2.31 mm in terms of average DSC, JI and ASD, which demonstrates that the attention mechanism can enhance the feature representations and improve segmentation accuracy. Furthermore, we applied the proposed post-processing to refine the segmentation results from MDFAN II. Relative to the baseline MDFN, the results improved significantly to 79.34% in average DSC and 1.15 mm in average ASD, improvements of 5.87% and 5.22 mm, respectively, which demonstrates that the proposed post-processing can effectively prune the false positive regions and achieve more robust performance. Above all, although the BTCV challenge dataset is smaller and has much lower image resolution than the NIH dataset, we still achieve comparable performance. Specifically, the experimental results on the BTCV dataset outperform the multi-stage models [2,12] and cascaded models [19,20,21], which use explicit localization modules or networks. In addition, our pancreas segmentation results on the BTCV challenge dataset rank third on the Abdomen Leaderboard (https://www.synapse.org/#!Synapse:syn3193805/wiki/217785), which further demonstrates the effectiveness of the proposed method.

Figure 8. Examples of 3D fusion maps between predictions from different models and the ground truths, showing that MDFAN II can capture more detailed information of the pancreas and that post-processing helps to prune the over-segmented regions. Red denotes the ground truths and blue denotes the network predictions.

Comparison with other state-of-the-art methods
We compared the final MDFAN III (i.e. MDFAN II with post-processing) with seven state-of-the-art pancreas segmentation methods [2,12,19,20,21,24,38]. To ensure fair comparison, all methods were evaluated on the NIH dataset. Note that the experimental results of the other seven methods were taken directly from the corresponding publications. As shown in Table 3, our method achieves an average DSC of 82.82% and an average JI of 71.13%, outperforming all compared methods. Beyond segmentation performance, the proposed method is also more efficient than most compared methods. Specifically, [2,12,19,20,21,38] are multi-stage or cascaded methods, which perform pancreas localization and pixel-wise classification separately, leading to low computational efficiency and generalizability. In addition, unlike our simple post-processing, [20,21] both rely on post-processing with random forests to further refine the CNN outputs. Overall, the experimental results show that our method has advantages over the coarse-to-fine methods [19,20,21,38] and the multi-level methods [2,12]. In particular, compared with the 3D method [24], our method achieved slightly better segmentation performance in average DSC, which supports the effectiveness of the proposed MDFAN III.


§4 Discussion
The pancreas is an important digestive organ in the abdomen, which plays a significant role in the decomposition and absorption of blood sugar and nutrients. Accurate pancreas segmentation can provide useful information for clinicians. To address the inefficiency of coarse-to-fine methods and the unclear boundaries in pancreas segmentation, we introduce an attention mechanism to achieve implicit localization of the pancreas, and propose a composite loss to force the network to pay more attention to boundary pixels.
To the best of our knowledge, the proposed algorithm outperformed all 2D pancreas segmentation approaches on the NIH dataset under 4-fold cross-validation without the help of explicit pancreas localization, which demonstrates that channel-wise and spatial attention can implicitly localize and highlight the pancreas regions and thus enhance the representation of pancreas features. Moreover, Table 3 shows that the proposed algorithm outperformed the 3D attention model [24] in terms of average DSC, which indicates that the attention mechanism can automatically aggregate contextual information over local features and use spatial context to capture pancreas features, thereby improving network performance. Overall, the proposed algorithm not only maintains high segmentation accuracy on the pancreas, but also improves the efficiency of pancreas segmentation.
In order to gain a better understanding of the Bottleneck Attention Module (BAM) and the proposed Improved Refinement Residual Block (IRRB), we applied the same post-processing to the baseline MDFN. The experimental results are reported in Table 4. As shown in Table 4, under the same post-processing, MDFAN II improves the average DSC, JI and ASD by 2.46%, 3.26% and 0.28 mm over the baseline MDFN, which demonstrates that the combination of BAM and the proposed IRRB can effectively improve segmentation accuracy. In addition, to validate the effectiveness of the focal loss used to guide the training of the Border sub-network, we conducted another comparison experiment between the focal loss and the cross-entropy loss under the same MDFAN II architecture and the same regional Dice loss. As shown in Table 5, MDFAN II with focal loss improves the overall performance in terms of the average DSC, JI and ASD, especially ASD, which demonstrates that the modulation factor in the focal loss [13] can force the network to focus on hard samples, such as boundary pixels, to better delineate the pancreas boundary.

Table 4. Quantitative comparison of the post-processing on the baseline MDFN and the proposed MDFAN II based on DSC, JI (%), average ASD (mm) and run time (s) (the best results are marked in bold).

There are several limitations in this study. First, over-segmentation exists in the predictions from MDFAN II, mainly because the attention mechanism may suffer from semantic confusion due to the high similarity in intensity between the target pancreas and the surrounding organs and tissues. In future work, we will consider how to design more discriminative attention modules to effectively locate the pancreas and reduce the interference of the background. Second, as shown in Table 2, there is still room to improve the generalization of the proposed algorithm on different datasets, such as the BTCV dataset. Since the number of training scans in the BTCV dataset is small and the image resolution is low, a model with a large number of parameters is prone to overfitting, which degrades network performance. This motivates us to consider an adaptive regularization technique in our future work.


§5 Conclusion
Accurate delineation of the pancreas can assist doctors in the diagnosis of pancreatic diseases. In this paper, we propose a single-stage Discriminative Feature Attention Network for pancreas segmentation. Our method has two advantages: 1) we integrate channel-wise and spatial-wise attention into the baseline MDFN to enhance feature extraction and eliminate the need for explicit pancreas localization modules; 2) we adopt a simple yet effective post-processing step to refine the segmentation results. The experimental results show that our network can effectively handle the issues of intra-class inconsistency and inter-class indistinction in pancreas segmentation. Because the proposed method is a single-step, end-to-end training framework with simple post-processing, it is simple to implement. Above all, the proposed method achieves consistent experimental results on the two pancreas datasets, which demonstrates its effectiveness and generalization.