1 Introduction

Recent research shows that more than half of cancer patients survive the first 15 to 20 years [1]. Early detection and diagnosis are important starting points for cancer treatment and provide real help for subsequent cancer grading and typing [2], cancer metastasis detection [3], treatment planning, and personalized medicine. Among all diagnostic methods, pathological diagnosis is the gold standard. A high-quality digital pathological image contains thousands of nuclei of different types, and their distribution and appearance provide a great deal of helpful information for clinical diagnosis. For example, in breast cancer histologic imaging, the spatial distribution of nuclei or an increase in nuclear area strongly correlates with immune response, particular genes, and aggressiveness grade, which informs therapy selection and survival prediction [4]. Traditionally, pathologists with different levels of experience examine pathological slides by eye. However, the large number of cells belonging to different tissues on a slide makes reading the images repetitive and subjective. Automated recognition on digital pathological images can reduce this repetitive work for clinicians and limit diagnostic differences between observers.

With the recent rise of deep learning based on convolutional neural networks, the accuracy of nuclear segmentation and classification has improved dramatically compared to traditional digital image processing methods. However, the segmentation results obtained by neural networks still have shortcomings. First, owing to considerable staining differences in pathological images, some nuclei are similar in color to the cytoplasmic background, resulting in blurred boundaries. Second, because tissues contain many touching or overlapping nuclei, the neural network mistakenly merges multiple nuclei into a single nucleus, resulting in under-segmentation. Finally, the density of malignant cells in tumor tissues is usually much higher than in normal tissues, so analytical methods need good generalization ability to deal with closely packed and overlapping nuclei at such high cell numbers and with such nuclear pleomorphism.

In this paper, we design a dual-path strategy to accomplish the task of nuclear instance segmentation. First, enhanced nuclear contour information is added to the neural network. Then, an attention mechanism and fusion modules are added so that nuclei features carrying different information are extracted effectively. Our contributions can be summarized as follows:

  • We extracted the contour information of nuclei with morphological techniques and performed contour enhancement. The prior knowledge of the contours is learned as independent feature maps.

  • We propose a dual-path nuclear segmentation network that trains nuclei and contours separately and performs feature fusion in decoding. The spatial location and contour information of the nuclei are fully considered.

  • We add an attention module after each skip connection, which fuses the feature maps of the nuclei from the encoding and decoding stages.

  • We designed a fusion module to fuse attention weights, nuclei features, and contour information simply and efficiently. The saliency information of the nuclei and the accurate spatial location are fused with the features.

2 Related work

2.1 Baseline

One line of work combines dual-channel spatial attention with spatial, scale, and non-local attention and with mixed input from brain slices of different dimensions, including 2D and 3D; the symmetric U-Net structure is broken to construct a new decoding stage [5]. In another work, the scale-aware feature module is redesigned using an idea similar to the attention mechanism, which locates the nearby boundary region to enhance the information of the nuclear region [6]. Feature fusion has also been applied in a chain-like way: the spatial and channel attention mechanisms are fused for the two feature maps, and the weight value is set to one so that the two feature maps are mutually exclusive [7]. Another approach captures the features of multiple channels with LSTM units and uses feature enhancement blocks to connect the downsampled original image, which prevents the loss of positional information [8]. DIST introduces polygons into nuclear instance segmentation: the distance between the centroid and the contour is predicted, and a neural network learns this distance to complete nuclei instance segmentation [9]. Nucleus contours have also been used as auxiliary information for nucleus segmentation, where the final result is obtained by subtracting the contour from the nucleus and then applying a dilation operation [10]. Finally, in [11] the dataset is reconstructed and annotations are made at the edges of the nuclei. Within the traditional symmetric structure, a transform and conv module extracts information through three branch tasks (mask, normal edge, and clustered edge), which share weights to accomplish efficient instance segmentation.

Fig. 1

Overview of the network. The nuclei contour information is extracted, the attention, contour and convolutional feature maps learned by the network are fused, and the nuclei segmentation result is finally obtained

2.2 Multi-task

A multi-path structure can not only accomplish multiple tasks but also integrate semantic information from different paths to improve the performance of the main task. In [12], the core idea is to improve the main task by constructing an auxiliary nuclear boundary task and adding an attention module that enhances the context information for nuclear segmentation. HoVer-Net is a classical work on nuclear instance segmentation. It is a multi-branch network that performs nuclei segmentation and classification simultaneously, and it innovatively transforms the segmentation problem into the prediction of distance maps. However, its decoding stage could be designed to be more sophisticated and effective [13]. In [14], the center of each nucleus is determined by predicting nucleus key points, and instance segmentation is then used to obtain the category corresponding to each nucleus centroid. The designed network can perform detection, segmentation, and classification simultaneously. In [15], a multi-task branching structure is also used for instance segmentation on different channels of the pathological image, including RGB and hematoxylin, and a new feature aggregation strategy is proposed to fuse features from different branches. In the multi-modal semantic segmentation task, multiple single-modality networks are used to learn the characteristics of different sub-components, and a newly designed dual recalibration module then exchanges information through channel and spatial fusion to improve performance [16]. In [17], a dual task of classification and segmentation was designed to separate COVID-19 infection from conventional pneumonia, using key cases for the experiments; the infection rate was defined before 3D segmentation. In the decoding stage, the classification branch adopts the ViT structure, and the segmentation branch adopts a module similar to the attention mechanism, which provides the idea for our dual-path structure. Finally, the De-overlapping Network adopts a decomposition-recombination strategy: the dual-path region segmentation module explicitly breaks down the cell clusters into intersecting and complementary regions, which are then integrated using the semantic consistency guided recombination module [18].

3 Proposed method

As shown in Fig. 1, we design a dual-path network to learn nucleus and contour information independently. First, we perform edge extraction and edge dilation for the original pathological images and use the newly generated contour images as a new dataset. Then, we separately extract the features learned by the contour path, the attention module, and the convolutional neural network; heat maps can represent the features learned by the network. The three maps are fused to obtain the best nuclei features. The pixels belonging to nuclei are then identified by applying an activation function and a threshold. Finally, the watershed algorithm separates the individual nuclei, which are displayed in different colors.

Fig. 2

Network architecture. Green is the contour path, orange is the encoding stage of the nuclear path, and blue is the nuclear decoding stage. The fusion module incorporates attention, contour and nuclei features, and the decoding phase uses four upsampling and fusion modules

We now introduce the proposed network for multi-organ pathology segmentation in detail. An overview of the proposed network is shown in Fig. 2. The network has two encoding and decoding stages and adopts the classic U-shaped structure. In the encoding process, nuclei information and contour information are taken as independent data for feature extraction. Unlike classification tasks, segmentation requires combining higher-resolution local information with lower-resolution global features to distinguish foreground from background. We use the attention module in the decoding phase and redesign the feature fusion module. The feature fusion module efficiently fuses the attention weight map, the upsampled feature map, and the contour feature map to generate new feature maps that better represent the entire nuclei region. This method fully exploits the nuclear contour information, which constrains the spatial position of the nucleus, and feeds the contour information into the network for sufficient training.

3.1 Nuclei contour

We preprocess the nuclei morphologically. We take the labels corresponding to the pathological images and use the Canny algorithm to extract the nuclei contours. The image gradient is calculated using the more accurate L2 norm, and the pixels in the image are then traversed to remove all non-edge points. Two thresholds are set to classify edges based on the gradient of the edge pixels. The high threshold is set to 255 to ensure that the contours of all isolated nuclei are marked as strong edges. The low threshold is set to 80 to ensure that the contours in dense regions connected to strong edges are also marked. This gives the initial image of the nuclei contours. Since the extracted contours occupy only a small proportion of the whole image, we use morphological dilation to enhance the contour information. By sliding a window over the image, we increase the number of pixels representing the contours in each window. We set the window size to 5. The dilated contour image is more effective and provides more contour features in dense areas.
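The contour-enhancement step can be illustrated with a small, dependency-free sketch of binary dilation with a 5\(\times\)5 window. The function name `dilate`, the square window shape, and the toy input are assumptions made for illustration; the Canny step (thresholds 80 and 255, L2 gradient) is assumed to have been applied beforehand and is not reproduced here.

```python
def dilate(mask, window=5):
    """Binary dilation: a pixel becomes 1 if any pixel inside the
    window x window neighbourhood centred on it is 1.
    A square window is an assumption; the text only gives its size."""
    h, w = len(mask), len(mask[0])
    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clip the window at the image border.
            rows = range(max(0, y - r), min(h, y + r + 1))
            cols = range(max(0, x - r), min(w, x + r + 1))
            if any(mask[j][i] for j in rows for i in cols):
                out[y][x] = 1
    return out


# Toy contour map: a single edge pixel in the centre of a 7x7 image.
contour = [[0] * 7 for _ in range(7)]
contour[3][3] = 1
thick = dilate(contour, window=5)
n_contour_pixels = sum(map(sum, thick))
print(n_contour_pixels)  # the single pixel grows into a 5x5 block
```

In this toy case one contour pixel becomes 25 pixels, which shows how dilation raises the share of contour pixels in the image.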

3.2 Dual-path

We were inspired by multi-task and multi-modal learning. Multi-task branches can influence each other, and multi-modal tasks can use the characteristics of different sub-components for information interaction and fusion. Therefore, we follow the idea of a multi-modal task in nuclear instance segmentation: we use the local information of the nuclear contour and the global information of the overall nuclear region to integrate different semantic details and complete a high-quality nuclear instance segmentation task. We extract nuclear and contour features independently and use two 3\(\times\)3 convolutions, each followed by BN and ReLU activation, as the basic module for feature extraction. During downsampling, max pooling is applied four times, the image size is halved each time, and the numbers of channels in the network are 64, 128, 256, 512, and 1024. The two independent networks use the same encoding phase to extract the semantic information of the sub-components independently. In the decoding stage, feature maps are upsampled four times. Bilinear interpolation is adopted to save resources and obtain better results than deconvolution. The result of the first up-sampling is the input of the attention module, and the result of the second up-sampling is the input of the fusion module. The attention weight map, the contour feature map, and the bilinearly up-sampled feature map are input into the fusion module, and the resulting new feature map serves as the input of the next stage. The decoding process consists of four up-sampling steps, and the contour feature maps of different scales are fused into the nuclear instance segmentation network to obtain the final result. The watershed algorithm is then applied to obtain the instance labels and visualize the result.
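To make the encoder geometry concrete, the following sketch lists the feature-map shape after each encoder stage. It assumes a 256\(\times\)256 input (the crop size given in Section 4.2) and 2\(\times\)2 max pooling between stages; the helper name `encoder_shapes` is hypothetical.

```python
def encoder_shapes(size=256, channels=(64, 128, 256, 512, 1024)):
    """Return (channels, height, width) after each encoder stage.
    Max pooling halves the spatial size between consecutive stages,
    so five stages imply four pooling operations."""
    shapes = []
    for stage, ch in enumerate(channels):
        if stage > 0:
            size //= 2  # assumed 2x2 max pooling with stride 2
        shapes.append((ch, size, size))
    return shapes


for shape in encoder_shapes():
    print(shape)
```

Under these assumptions the deepest feature map has 1024 channels at 16\(\times\)16 resolution, and the four up-sampling steps of the decoder restore the 256\(\times\)256 size.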

3.3 Attention

The attention module in Fig. 3 weights feature maps of the same scale from different stages to highlight the foreground region. We use a single attention mechanism within the network that aims to reduce noise in the non-foreground regions of the image and to assign weight values to the foreground parts.

Fig. 3

The structure of attention module

The specific steps are as follows: the two feature maps are combined (written as an element-wise sum in Eq. 1), and ReLU activation is applied to the result. The weight matrix is then obtained using a 1\(\times\)1 convolution, which adjusts the number of channels, followed by Sigmoid activation. The similarity coefficient \(a \in [0, 1]\) can be expressed as:

$$\begin{aligned} {a} = \sigma (\psi (R(x_1+x_2))) \end{aligned}$$
(1)

Finally, the feature map is multiplied element-wise by the weight matrix to obtain the new feature map.

$$\begin{aligned} {x} = a \cdot x_2 \end{aligned}$$
(2)

\(x_1\) and \(x_2\) represent the two feature maps. R represents the ReLU activation function. \(\psi\) represents the 1 \(\times\) 1 convolution. \(\sigma\) represents the Sigmoid activation function. x represents the new feature map.
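Equations (1) and (2) can be traced at a single spatial location with the plain-Python sketch below. It assumes that the 1\(\times\)1 convolution \(\psi\) maps the C input channels to one attention value per pixel; the weights `psi_w`, the bias `psi_b`, and the input vectors are hypothetical values chosen only for illustration.

```python
import math


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def attention_pixel(x1, x2, psi_w, psi_b=0.0):
    """Apply Eqs. (1)-(2) at one pixel.
    x1, x2: channel vectors of the two feature maps at that pixel.
    psi_w, psi_b: hypothetical 1x1-convolution weights (C -> 1 channel)."""
    activated = relu([p + q for p, q in zip(x1, x2)])             # R(x1 + x2)
    score = sum(w * v for w, v in zip(psi_w, activated)) + psi_b  # psi(.)
    a = sigmoid(score)                                            # Eq. (1)
    return a, [a * q for q in x2]                                 # Eq. (2)


a, x = attention_pixel(x1=[1.0, -2.0], x2=[0.5, 1.0], psi_w=[1.0, 1.0])
print(round(a, 4), [round(v, 4) for v in x])
```

Because \(\sigma\) is bounded, the coefficient always lies strictly between 0 and 1, so the module can only attenuate \(x_2\), never amplify it.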

3.4 Fusion module

The advantage of fusing with complementary weights is that important information is enhanced while unimportant information is suppressed. It allows important features to be fused in a simple and efficient way. As shown in Fig. 4, our step is to fuse the feature maps that carry different semantic information. The attention feature map represents the semantic information in both the encoding and decoding stages. The contour feature map represents the semantic information learned by the contour network. The semantic mask represents the nuclei features learned by the convolutional neural network. The latter two are the important feature maps that we want to fuse.

$$\begin{aligned} {F} = F_S + a \cdot F_E + (1-a) \cdot F_A \end{aligned}$$
(3)
Fig. 4

The structure of fusion module

\(F_S\) represents the output of the convolutional neural network. \(F_E\) represents the output of the contour network. \(F_A\) represents the output of the attention module. The weights of \(F_E\) and \(F_A\) are set to sum to 1, and no dynamic assignment of weights is used. The global feature weights of the nuclei are reduced when the experimental results indicate that the nuclear contours should occupy a significant part of the feature map. The first advantage of this operation is that the contour information of the nucleus can be used as an important feature map for fusion. This is more direct and accurate than modules that focus on context information in other models or modules with different receptive fields. The second advantage is that it suppresses over-segmented and under-segmented parts of the nucleus, preventing wrong segmentation of details outside the nucleus, which is crucial for pathological diagnosis. The design is simple and efficient and avoids the instability of the evaluation metrics caused by dynamic weights. It also makes ablation experiments easy to perform in order to determine how contour information affects the overall metrics. We fuse the features after each upsampling step, which aims to fuse the semantic information at different scales well and improve segmentation accuracy.
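The fixed-weight fusion of Eq. (3) amounts to an element-wise weighted sum. The sketch below applies it to small 2\(\times\)2 maps stored as nested lists; the function name `fuse` and the toy values are assumptions for illustration, and the default \(a = 0.7\) is the value reported as working best in Section 4.5.

```python
def fuse(f_s, f_e, f_a, a=0.7):
    """Element-wise fusion of Eq. (3): F = F_S + a * F_E + (1 - a) * F_A."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1]")
    return [
        [s + a * e + (1.0 - a) * t for s, e, t in zip(row_s, row_e, row_a)]
        for row_s, row_e, row_a in zip(f_s, f_e, f_a)
    ]


# Toy feature maps (hypothetical values).
f_s = [[1.0, 0.0], [0.5, 0.5]]  # convolutional (semantic) features
f_e = [[0.0, 1.0], [1.0, 0.0]]  # contour features
f_a = [[1.0, 1.0], [0.0, 0.0]]  # attention features
fused = fuse(f_s, f_e, f_a, a=0.7)
print(fused)
```

Setting `a=0` removes the contour term and `a=1` removes the attention term, which mirrors the two limiting cases discussed in the weight-selection experiment.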

4 Experiments

4.1 Datasets and evaluation metrics

Kumar is a common nuclei segmentation dataset [19]. It has 16 training images and 14 test images. The test set has eight images of the same organs as the training set and six images of different organs. Seven organs are covered (breast, liver, kidney, bladder, colon, stomach, prostate). It contains 21,000 nuclei of various shapes and is annotated by specialized physicians. Each image is 1000\(\times\)1000 pixels, scanned at 40X magnification.

CPM comprises two datasets, CPM-15 and CPM-17, both from TCGA [20]. CPM-15 has 15 images containing two types of cancer. CPM-17 has 32 training images and 32 test images and contains four types of cancer: non-small cell lung cancer (NSCLC), head and neck squamous cell carcinoma (HNSCC), glioblastoma multiforme (GBM), and lower grade glioma (LGG) tumors. CPM-15 contains 2905 nuclei. Its images are 400\(\times\)400 or 1000\(\times\)600 pixels, scanned at 40X and 20X magnification. CPM-17 contains 7570 nuclei. Its images are 500\(\times\)500 or 600\(\times\)600 pixels, scanned at 40X and 20X magnification. To ensure the universality of the test dataset, we combined the two datasets to better demonstrate the generalization of the model.

TNBC has a total of 50 images from the Curie Institute [21]. All slides are taken from a Triple Negative Breast Cancer (TNBC) patient cohort and were selected from heterogeneous breast cancer tissue. This dataset represents both intra- and inter-patient variability for the same cancer type. TNBC contains 4022 nuclei. Each image is 512\(\times\)512 pixels, scanned at 40X magnification. The CPM and Kumar datasets contain pathological sections of different organ sites in the human body, so their combination covers different forms of nuclei.

In the experiments, we mainly use three metrics to evaluate the nuclei instance segmentation results of each model, including Dice score, Aggregated Jaccard Index (AJI), and Panoptic Quality (PQ).

DICE score is defined as follows:

$$\begin{aligned} {DICE} = \frac{2 \times |G \bigcap P|}{|G|+|P|} \end{aligned}$$
(4)

G and P denote the ground truth and the prediction results, respectively.
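As a small worked example of Eq. (4), the sketch below computes the Dice score for two binary masks represented as sets of foreground pixel coordinates. The set representation and the convention for two empty masks are assumptions made for illustration.

```python
def dice(gt, pred):
    """Dice score of Eq. (4) for masks given as sets of (row, col) pixels."""
    if not gt and not pred:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return 2 * len(gt & pred) / (len(gt) + len(pred))


gt = {(0, 0), (0, 1), (1, 0), (1, 1)}  # 4 ground-truth pixels
pred = {(0, 1), (1, 1), (1, 2)}        # 3 predicted pixels, 2 of them correct
score = dice(gt, pred)
print(score)  # 2 * 2 / (4 + 3)
```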

AJI is defined as follows:

$$\begin{aligned} {AJI} = \frac{\sum _{i=1}^{n} |G_i \bigcap P_j| }{\sum _{i=1}^{n} |G_i \bigcup P_j| + \sum _{k\in N} |P_k|} \end{aligned}$$
(5)

where \(j = argmax_k\frac{|G_i \bigcap P_k|}{|G_i \bigcup P_k|}\), and \(P = \left\{ P_1,P_2,..., P_m\right\}\) and \(G = \left\{ G_1,G_2,..., G_n\right\}\) denote the prediction results and the ground truth, respectively. N is the set of indices of prediction results without any corresponding ground truth.
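The matching procedure behind Eq. (5) can be sketched as follows, with every instance stored as a non-empty set of pixel coordinates. How a ground-truth nucleus with no overlapping prediction is handled is not stated in the text; adding its area to the denominator is an assumption, as are the function name `aji` and the toy instances.

```python
def aji(gt_instances, pred_instances):
    """Aggregated Jaccard Index following Eq. (5).
    Each instance is a non-empty set of (row, col) pixels."""
    inter_sum, union_sum = 0, 0
    used = set()
    for g in gt_instances:
        # j = argmax_k IoU(G_i, P_k)
        best_j, best_iou = None, 0.0
        for k, p in enumerate(pred_instances):
            iou = len(g & p) / len(g | p)
            if iou > best_iou:
                best_j, best_iou = k, iou
        if best_j is None:
            union_sum += len(g)  # assumption: unmatched G_i only enlarges the denominator
        else:
            p = pred_instances[best_j]
            inter_sum += len(g & p)
            union_sum += len(g | p)
            used.add(best_j)
    # N: predictions that were never selected as a best match.
    union_sum += sum(len(p) for k, p in enumerate(pred_instances) if k not in used)
    return inter_sum / union_sum if union_sum else 0.0


gt_instances = [{(0, 0), (0, 1)}, {(2, 2), (2, 3)}]
pred_instances = [{(0, 0), (0, 1), (0, 2)}, {(5, 5)}]
aji_score = aji(gt_instances, pred_instances)
print(aji_score)  # 2 / (3 + 2 + 1)
```

The example shows why AJI is strict: the missed nucleus and the spurious prediction both enlarge the denominator without contributing to the numerator.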

The AJI score may over-penalize overlapping regions. To avoid this problem, Panoptic Quality (PQ) is introduced to evaluate the performance of nuclei segmentation at the instance level, which is defined as follows:

$$\begin{aligned} {PQ} = \frac{|TP|}{|TP|+ \frac{1}{2}|FP|+\frac{1}{2}|FN|} \times \frac{\sum _{(x,y)\in TP} IoU(x,y)}{|TP|} \end{aligned}$$
(6)

At the instance level, x and y denote a ground truth segment and a predicted segment, respectively. IoU denotes the intersection over union. A pair (x, y) is matched when its IoU exceeds 0.5, in which case the match is unique. TP, FP, and FN denote matched pairs, unmatched predictions, and unmatched ground truth segments, respectively. PQ comprises Detection Quality and Segmentation Quality, which correspond to instance-level and pixel-level metrics, respectively; PQ is a unified evaluation of the two.
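Equation (6) can be illustrated with the sketch below, which matches instances at IoU > 0.5 and then multiplies detection quality by segmentation quality. Instances are stored as non-empty sets of pixel coordinates; the function name `panoptic_quality` and the toy data are assumptions for illustration.

```python
def panoptic_quality(gt_instances, pred_instances, thr=0.5):
    """PQ of Eq. (6); each instance is a non-empty set of (row, col) pixels."""
    matched_ious = []
    matched_pred = set()
    for g in gt_instances:
        for k, p in enumerate(pred_instances):
            if k in matched_pred:
                continue
            iou = len(g & p) / len(g | p)
            if iou > thr:  # for non-overlapping instances, IoU > 0.5 makes the match unique
                matched_ious.append(iou)
                matched_pred.add(k)
                break
    tp = len(matched_ious)
    fp = len(pred_instances) - tp  # unmatched predictions
    fn = len(gt_instances) - tp    # unmatched ground truth segments
    denom = tp + 0.5 * fp + 0.5 * fn
    if denom == 0:
        return 0.0
    detection_quality = tp / denom
    segmentation_quality = sum(matched_ious) / tp if tp else 0.0
    return detection_quality * segmentation_quality


gt_instances = [{(0, 0), (0, 1)}, {(2, 2), (2, 3)}]
pred_instances = [{(0, 0), (0, 1), (0, 2)}, {(5, 5)}]
pq = panoptic_quality(gt_instances, pred_instances)
print(pq)  # DQ = 1 / 2, SQ = 2 / 3
```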

4.2 Implementation and training details

We implement our network using PyTorch 1.12 on a workstation equipped with an NVIDIA GeForce RTX 3060. We crop the images of different resolutions to a uniform size of 256\(\times\)256. The RGB images, corresponding labels, and contour information are then combined into a NumPy array. We use several preprocessing and data augmentation steps during training, including normalization to the range 0 to 1, flipping, center cropping, and gamma transform. The same data augmentation scheme was applied across all experiments. We set the training batch size to 6 and the number of epochs to 100. We applied BCE loss as the loss function and used adaptive moment estimation (Adam) as the optimizer with an initial learning rate of 0.01, which was reduced to \(10^{-5}\) after 80 epochs.
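The loss and the learning-rate schedule described above can be sketched in plain Python as follows. This is not the training code: the step-shaped schedule, the zero-based epoch index, the clamping constant `eps`, and both function names are assumptions made for illustration.

```python
import math


def bce_loss(targets, probs, eps=1e-7):
    """Mean binary cross-entropy over pixels.
    targets: 0/1 labels; probs: predicted foreground probabilities."""
    total = 0.0
    for y, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(targets)


def learning_rate(epoch, initial=0.01, final=1e-5, switch_epoch=80):
    """Assumed step schedule: `initial` for epochs 0..79, `final` afterwards."""
    return initial if epoch < switch_epoch else final


loss = bce_loss(targets=[1, 0], probs=[0.9, 0.1])
print(round(loss, 4))
print([learning_rate(e) for e in (0, 79, 80, 99)])
```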

4.3 Quantitative comparisons with existing methods

We conducted extensive experiments to evaluate, both quantitatively and qualitatively, the effectiveness of our proposed network in nuclear instance segmentation. The quantitative results of the other models are directly derived from their respective papers. All models were trained using the same strategy and code to ensure fairness of comparison. We compare against several mainstream semantic segmentation and instance segmentation models, including U-Net, Mask-RCNN, DIST, HoVer-Net, and Triple U-net. The main evaluation metrics we chose were DICE, AJI, and PQ. DICE verifies the segmentation accuracy at the pixel level, while AJI penalizes over-, under-, and mis-segmentation by accumulating the unions of matched instances and adding unmatched predictions to the denominator, which makes it more suitable for nuclei instance segmentation. However, since AJI excessively punishes overlapping regions, a metric such as PQ, which combines the instance level and the pixel level, is also needed to evaluate the model more comprehensively.

Table 1 Quantitative comparisons with existing methods

As shown in Table 1, Mask-RCNN is a generic instance segmentation model. It performed better than the base U-Net, improving metric scores on all three pathological nuclei datasets. DIST uses distance maps for nuclei segmentation and achieves fine results, but nuclei distance maps with regions do not guarantee precise nuclei boundaries. It is prone to producing the wrong spatial location of a nucleus when it localizes the wrong nucleus center. Its DICE score improves, but its AJI and PQ are inferior to those of Mask-RCNN. HoVer-Net proposes using the gradients of adjacent nuclei as predictors together with horizontal and vertical distance maps, which allows segmentation and classification to be completed in the same network. The network shows measurable improvements in all three metrics and achieves the best score in the PQ metric. It is more suitable for nuclei with clear contours. Triple U-net utilizes two different types of images, hematoxylin and RGB, and uses three branches to fuse the features from the different task branches. It achieves quite good results: its DICE is the best on both Kumar and CPM-17, and its AJI is slightly higher than that of HoVer-Net. The results of our proposed network on three different datasets demonstrate its effectiveness. On the more general Kumar dataset, our accuracy differs from that of HoVer-Net by only 0.002, while achieving the best scores on AJI and PQ, which are important metrics of instance segmentation. This accuracy is due to the spatial constraint brought by the contour information. At the same time, however, the interior of the nucleus receives slightly less weight, which tends to produce holes inside the nucleus, especially for large isolated nuclei. In the AJI metric, we achieved an excellent result. Compared with other models, our network does not show over-segmentation, under-segmentation, or mis-segmentation, mainly due to the attention mechanism and the constraint of accurate nuclear boundaries, which firmly locate the spatial position of each nucleus. We also achieve good PQ results while considering both semantic and instance segmentation, which makes the network more general.

4.4 Ablation experiment

We also conducted ablation experiments to validate the role of each module and to assess the validity of the model and the impact of different configurations (Tables 2 and 3). We ran a controlled experiment with a single-path network and a dual-path network. The single-path network achieves metric scores similar to those of U-Net alone. In contrast, the metrics improved in all aspects when contour information was added through the dual-path network without any fusion or attention mechanism. AJI improved by 0.048, and PQ similarly improved by 0.092, indicating the importance of contour information for nuclei segmentation in dual-path networks.

Table 2 Statistical comparisons of ablation studies on Kumar dataset

The attention module in the decoder branch enables the model to focus on and enhance the nuclear region during feature map extraction by roughly localizing and identifying all tissue components around the nucleus in the tissue image. Attention modules implemented at different layers focus on different levels of features. The attention module extracts more information about the nucleus and less background noise, so a better classification feature map is provided when the attention module is in place. Feature maps at different scales provide different information, with low-level features focusing more on the inner regions of the nucleus and high-level feature maps focusing more on the boundaries of the nucleus. Therefore, fusing multi-scale feature maps is necessary and effective for this task. The proposed attention module at the deep level can thus be combined with the contour feature map to doubly enhance the nucleus contour. Moreover, applying attention and contour feature maps at different scales of the nucleus feature maps fuses information of different granularity. The statistical evaluation results show that our proposed attention module is effective after the introduction of the multi-scale fusion module.

We designed the fusion module mainly for simple and efficient fusion. When we use the fusion module, instead of simply concatenating the feature maps, we assign different weight values to improve the stability of the whole network. The deep decoding stage is more concerned with the contour, so the contour feature map is given a higher weight there. The shallow decoding stage is more concerned with the nucleus position, so the contour feature map is given a lower weight there. This improves the stability of the network.

Table 3 Statistical comparisons of Weight Value on Kumar dataset
Fig. 5

Examples of visual nuclei segmentation results on Kumar, CPM-17 and TNBC. For each dataset, we displayed the 6 models from left to right. The different colours of the nuclear boundaries denote separate instances

4.5 Fusion weight value selection

Overly global fusion of features does not suit the nuclei instance segmentation task. Constraints and timely corrections should be applied to the important foreground parts, cascading to complete the final segmentation step by step. We use a as the weight in the fusion weight experiment, and we found that the whole network architecture works best when a is set to 0.7. When a continues to rise to 0.9, the behavior of the whole network becomes unstable. This is because the higher proportion of contour weight makes the weight of the nucleus position too small, so the dual-path network can no longer identify the spatial position of the entire nucleus reliably. The evaluation metrics also reflect this when the weight value reaches 1. When a is 0, the contour information does not play a role at all, so the network is equivalent to a single-path attention U-Net. When a is set to 1, only the contour information plays a role, which is equivalent to a U-Net with dual-path contour information. Across the three weight values 0.3, 0.5, and 0.7, we found that the effectiveness of the network improved as the contour weight increased, which also indicates the importance of contour information in the dual-path network.

4.6 Qualitative analysis

To observe the segmentation of pathological nuclei, we next performed a qualitative analysis with visualizations on Kumar, CPM-17, and TNBC. Different colors of nuclear boundaries indicate different instances. The images show nuclei of different sizes and shapes, such as inflammatory and connective tissue nuclei. The overlapping and blurring of contours and boundaries in the different patches pose various challenges to segmenting nuclei instances of different types with extremely unbalanced distributions, which also tests the generalization ability of the model. We mainly observe whether the spatial position of each nucleus is accurate and does not deviate from the original position. Isolated nuclei scattered in space should be split out with proper shape and spatial location. The network can also separate the nuclei in dense regions, which is important for visualizing the results. Segmenting overlapping or adherent nuclei is the most difficult part of the nuclear segmentation task.

We can see the difference between the models in Fig. 5. U-Net can only finish the basic segmentation task and only segments well the round nuclei that are scattered in space. It cannot handle adherent nuclei in complex backgrounds or nuclei regions with large differences in shape. Mask-RCNN improves over U-Net and does better in dense regions of nuclei, but its disadvantage is also obvious: it cannot complete the segmentation task in regions with large shape differences. DIST improves on irregularly contoured tissues due to the use of contour information. However, the spatial position of the nucleus is easily mispredicted due to the error in predicting the center of mass. This is an important reason why its AJI is not as good as that of other models. HoVer-Net has higher reliability compared to other models, and it improves segmentation accuracy in nuclear spatial locations and in dense regions with partially adherent large nuclei. Triple U-net improves the accuracy in continuous dense nuclear areas, as shown in Figs. 6, 7 and 8. For spindle-shaped cells, it segments the general outline more clearly and gives a relatively accurate spatial location.

The experimental results show that the advantage of our method increases significantly when the nuclei are non-circular or ellipsoidal in shape. The main reason is that our contour information plays an important role. For spindle-shaped or irregular nuclei, the segmentation task cannot be done clearly using conventional networks or context-aware modules, because they perceive the pixels around a spindle-shaped nucleus as belonging to the same nucleus. Therefore, stacking too many context-aware modules does not effectively segment out the scattered nuclei. A further advantage concerns small nuclei in a patch that are surrounded by background: other networks do not easily identify and segment such isolated points and tend to ignore them. Feature maps using contour information can constrain such small nuclei and segment them accurately. This is important for clinical pathology identification to prevent missing important cancer cells.

Fig. 6

The box plots of the network evaluated on Kumar with the metrics of DICE, AJI, and PQ

Fig. 7

The box plots of the network evaluated on CPM-17 with the metrics of DICE, AJI, and PQ

Fig. 8

The box plots of the network evaluated on TNBC with the metrics of DICE, AJI, and PQ

4.7 Box plot analysis

Box plots have the advantage of visualizing outliers, which tests the stability of the model. More outliers in a box plot indicate that the model makes more errors in handling scattered nuclei or cells in dense regions. Since all pathology images are sequentially divided into patches of size 256\(\times\)256, a patch may contain only a small number of nuclei or irregular dense tissue. Such patches can distort the overall evaluation metrics. Outlier values of this kind were found in both the TNBC and CPM-17 datasets. In contrast, the Kumar dataset does not have many outliers because its nuclei are uniformly distributed across patches. We can also observe the degree of volatility in the box plots. In the three datasets, it is obvious that DICE is less volatile and PQ is more volatile. In the box plot of Kumar, our model is less volatile and has fewer outliers than the other models, which also indicates that our model has better stability. The outliers in the box plots arise because some patches contain only a small part of a nucleus after cropping, which distorts the segmentation accuracy and produces values that deviate from the box.
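The notion of an outlier in a box plot can be made concrete with the sketch below. It uses the common 1.5 \(\times\) IQR whisker convention, which is an assumption because the text does not state which rule the plots use; the per-patch scores and the function name `box_plot_outliers` are likewise hypothetical.

```python
import statistics


def box_plot_outliers(scores, k=1.5):
    """Return the scores lying outside [Q1 - k * IQR, Q3 + k * IQR]."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [s for s in scores if s < low or s > high]


# Hypothetical per-patch scores; the last patch holds only a sliver of a nucleus.
patch_scores = [0.80, 0.82, 0.81, 0.79, 0.83, 0.80, 0.20]
outliers = box_plot_outliers(patch_scores)
print(outliers)
```

A single poorly segmented patch is flagged while the tightly clustered scores stay inside the whiskers, which matches the interpretation of outliers given above.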

4.8 Different test datasets

Table 4 Test results on different datasets

The CPM-15 dataset contains only 15 pathology images, so we added it to the training set of CPM-17. The two datasets contain the same organ (brain), and most of their nuclei are likewise non-overlapping or elliptical in shape. This gives 46 images in the training set, which expands the training data and gives the model adequate training. Next, we conducted experiments using the split of the Kumar dataset from the challenge competition, using the same-organ and different-organ subsets of Kumar as separate test sets (Table 4). The table shows that the same-organ test set scores 0.019 higher in DICE, 0.044 higher in AJI, and 0.08 higher in PQ than the different-organ test set, which indicates that the model performs better on the same-organ data. Finally, we combined the test set of CPM-17 with the test set of Kumar, bringing together different organs of the human body, to verify the model’s generalization. These experiments show that the network can adapt to cell nuclei of different types and shapes.

4.9 Heat map analysis

We visualized heat maps for a pathology image from Kumar. Three feature maps were extracted: convolution, contour, and fusion. As shown in Fig. 9, the convolutional network tends to learn features in non-nuclear regions as well, whereas the contour network we designed learns features only for contours and does not respond in non-nuclear areas. Finally, we fuse the feature maps and assign appropriate weights. The fused result is better than either of the other two, mainly because it addresses two shortcomings: the spatial inaccuracy of the convolutional network and the missing nuclear interior in the contour network.
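The effect of weighted fusion on two activation maps can be illustrated with a toy example. The sketch below is only an element-wise weighted sum under assumed weights; it is not the fusion mechanism of the proposed network, and the function name `fuse_feature_maps`, the weights, and the 3×3 activation values are all hypothetical.

```python
def fuse_feature_maps(conv_map, contour_map, w_conv=0.5, w_contour=0.5):
    """Element-wise weighted sum of two equally sized 2-D activation maps.

    Assumption: fusion is a normalized weighted sum. The weights here are
    illustrative placeholders, not values learned by the network.
    """
    same_rows = len(conv_map) == len(contour_map)
    same_cols = all(len(a) == len(b) for a, b in zip(conv_map, contour_map))
    if not (same_rows and same_cols):
        raise ValueError("feature maps must have the same shape")
    total = w_conv + w_contour
    return [
        [(w_conv * a + w_contour * b) / total for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(conv_map, contour_map)
    ]


# Toy convolution map: strong inside the nucleus (center), but it also
# responds weakly in the background (corners).
conv_map = [
    [0.2, 0.6, 0.2],
    [0.6, 0.9, 0.6],
    [0.2, 0.6, 0.2],
]

# Toy contour map: responds only on the boundary, with no response in the
# nuclear interior or in the background.
contour_map = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]

fused = fuse_feature_maps(conv_map, contour_map)
```

In the fused toy map, the background response is halved relative to the convolution map, the boundary response is strengthened, and the nuclear interior that the contour map lacks is retained, which mirrors the qualitative behaviour described for Fig. 9.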

Fig. 9 Visualization of feature activation maps

4.10 Discussion

Existing networks do not separate attached nuclei well, which is a common limitation of current methods. The main problem with adherent nuclei is that the pixel-level difference between the target and the surrounding background is tiny, and the network needs to learn these small differences. Complex nuclear shapes, such as spindle-shaped or irregularly shaped nuclei, and tissues composed of such nuclei also make learning difficult. When the annotation of the nuclear boundary is unclear, the network has difficulty identifying the boundary and partitioning the nuclei clearly and smoothly.

5 Conclusion

In this paper, we have proposed a dual-path instance segmentation network based on cell nucleus contours. It reconstructs the nucleus contour information and extracts features from it independently. We design an attention module and a fusion mechanism that further improve the accuracy of the nucleus features. The attention module mainly enhances saliency in nuclear regions, and the fusion mechanism fuses the contour, convolution, and attention feature maps simply and efficiently. The same fusion mechanism is applied to all three feature maps at different decoding stages, further improving the network’s capability. In addition, we conducted extensive experiments on three different datasets containing multiple organs of the human body with various nuclear morphologies, which demonstrated the generalizability of the proposed network. Quantitative and qualitative experiments show that our proposed method achieves state-of-the-art performance in nuclei instance segmentation.