Automatic liver and tumor segmentation based on deep learning and globally optimized refinement

Automatic segmentation of the liver and hepatic lesions from abdominal 3D computed tomography (CT) images is a fundamental task in computer-assisted liver surgery planning. However, due to complex backgrounds, ambiguous boundaries, heterogeneous appearances and highly varied shapes of the liver, accurate liver segmentation and tumor detection remain challenging. To address these difficulties, we propose an automatic segmentation framework based on a 3D U-net with dense connections and globally optimized refinement. First, a deep U-net architecture with dense connections is trained to learn the probability map of the liver. The probability map is then passed to the refinement step as the initial surface and shape prior. The segmentation of liver tumors is based on a similar network architecture and uses the liver segmentation results. To reduce the influence of surrounding tissues whose intensity and texture resemble those of the tumor region, the network for tumor segmentation takes I × liver label as its input during training, which improves the segmentation accuracy. The proposed method is fully automatic and requires no user interaction. Both qualitative and quantitative results show that the proposed approach is efficient and accurate for liver volume estimation in clinical applications. The high correlation between the automatic segmentations and the manual references suggests that the proposed method could replace time-consuming and poorly reproducible manual segmentation.


§1 Introduction
The prevention and treatment of liver diseases is a major focus of current research on clinical diagnosis [1,2]. Liver cancer has been reported as the second most frequent cause of cancer death in men and the sixth leading cause of cancer death in women. About 750,000 people were diagnosed with liver cancer and nearly 696,000 people died from this disease worldwide in 2008 [3]. Accurate detection and delineation of the liver and tumors is a crucial prerequisite for many clinical treatments, such as liver resection, transplantation and radiotherapy treatment planning. In current clinical practice, the liver and tumors are usually delineated manually on each slice by experts. However, manual segmentation is subjective, poorly reproducible and time-consuming. Therefore, it is necessary to develop automatic methods to accelerate and facilitate diagnosis, therapy planning and monitoring. Nevertheless, the segmentation of the liver and tumors remains challenging for the following reasons. Firstly, the low contrast between liver and tumor and between the liver and surrounding regions makes both the liver and tumor boundaries weak or fuzzy and difficult to detect. Secondly, tumors exhibit different contrast levels (hyper-/hypo-intense tumors), which lead to complicated intensity distributions and heterogeneous texture appearance. Thirdly, abnormalities in tissues (metastasectomy) and the varying size and number of lesions further increase the difficulty of segmentation.
In the past few decades, several families of algorithms, including region-based methods, active contour models, graph cuts and machine learning, have been proposed to segment the liver and tumors. Region-based methods include region growing [4], region splitting and merging [5], and watershed methods [6]. Liu et al. [7] used a gradient vector flow (GVF) [8] based active contour model for the segmentation of CT liver images. Massoptier et al. [9] proposed a statistical model-based approach to distinguish hepatic tissue from other abdominal organs; the statistical model was then incorporated into the GVF model for the segmentation of both liver and tumor. Shaikhli et al. [10] developed a level set method based on the sparse representation of global and local image information for the segmentation of the liver from 3D CT volume images. Wang et al. [11] defined a shape-intensity prior level set model that incorporates both probabilistic atlas and probability map constraints to delineate liver boundaries. Compared with the previous methods, graph cut-based methods, which extend the classic graph cut proposed by Boykov et al. [12,13], are more popular in liver segmentation. Li et al. [14] proposed a likelihood and local constraint level set model for liver tumor detection. Peng et al. [15] combined intensity, regional appearance and surface smoothness within a variational framework to deal with fuzzy boundaries and heterogeneous backgrounds. Furthermore, seed constraints in both the foreground and the background were used in the constrained convex variational model in [16]. However, because of their limited speed and limited robustness to the noise and heterogeneity in CT images, these methods are not widely applied in clinics. Hence, further methods are still needed to overcome these weaknesses.
Recently, deep learning models, which learn a hierarchy of features by building high-level features from low-level ones, have attracted considerable attention from researchers. Deep learning has been applied to a wide variety of problems and has surpassed the previous state-of-the-art performance, which motivates us to apply this approach to fully automatic liver tumor segmentation in CT. Ben-Cohen et al. [17] explored an FCN for the task of liver segmentation and liver metastasis detection in CT examinations. Christ et al. [18] presented a method to automatically segment the liver and lesions in CT abdominal images using cascaded fully convolutional neural networks (CFCNs) and dense 3D conditional random fields (CRFs). Lu et al. [19] proposed a method (called 3D CNN-GC) that combines 3D fully convolutional networks and graph cuts to achieve automatic segmentation in CT images. The trained CNN generates a probability map of the liver, and the learned information is then integrated into the image data penalty term of graph cuts.
In this paper, we propose a method to automatically segment the liver and tumors in abdominal CT images using a 3D U-net with dense connections and graph cut-based globally optimized refinement. Firstly, a deep U-net architecture with dense connections is trained to learn the probability maps of the liver. The probability maps are then passed to the refinement step as the initial surfaces and shape priors. The dense connections between layers encourage feature reuse and reduce the number of parameters while maintaining good performance. By concatenating feature maps from coarse to fine layers, the network captures multi-scale contextual information. The segmentation of liver tumors is based on a similar network architecture and uses the liver segmentation results. To reduce the influence of surrounding tissues whose intensity and texture resemble those of the tumor region, the network for tumor segmentation takes I × liver label as its input during training, which improves the segmentation accuracy. A data set of 1161 images with reference labels provided by experienced radiologists is used for training, and 100 images are used to evaluate the algorithm. The proposed algorithm achieves a mean Dice similarity coefficient of 73.6% on the test image data set. Experimental results show that the proposed method can serve as an alternative to time-consuming and poorly reproducible manual segmentation.
The rest of the paper is organized as follows. Section 2 reviews the graph cut method and the U-net network and describes the proposed method in detail. Section 3 illustrates the results and provides a comparative discussion of the proposed algorithm. We conclude the paper in Section 4.

§2 Materials and Methods
In this section, we present the imaging data used in this study and the proposed automatic liver segmentation framework. In the training stage, a deep, densely connected U-net architecture is trained using labeled CT images. In the testing stage, given a test image, a probability map of the liver is produced by the trained network. The probability map is then thresholded to provide both the initialization and the shape prior for the subsequent refinement step. In the refinement step, the liver is segmented based on a set of prior information. The energy functional of this step incorporates the initial seed locations, the liver probability map, the intensity distribution and the region appearance. Finally, the energy functional is minimized using a global optimization-based approach to propagate the initial surface to the optimal position.
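The thresholding that initializes the refinement step can be illustrated in a few lines. The following is a minimal NumPy sketch, not the C++ implementation used in this work; the function names are hypothetical. The thresholds (0.5 for liver, 0.75 for tumor) and the 5% and 95% intensity bounds are the values reported in the refinement section of this paper, while treating a probability equal to the threshold as foreground is an assumption.

```python
import numpy as np


def initial_foreground(prob_map, threshold):
    """Boolean mask of voxels whose probability reaches the threshold.

    Assumption: ">=" is used; the paper does not say how ties are handled.
    """
    return np.asarray(prob_map) >= threshold


def foreground_intensity_range(image, mask, low=5.0, high=95.0):
    """Return (I_min, I_max): the 5% and 95% values of the foreground intensities."""
    values = np.asarray(image)[mask]
    i_min, i_max = np.percentile(values, [low, high])
    return float(i_min), float(i_max)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.normal(100.0, 20.0, size=(8, 8, 4))   # synthetic intensities
    prob = rng.uniform(0.0, 1.0, size=image.shape)    # synthetic probability map
    liver_mask = initial_foreground(prob, 0.5)        # liver threshold from the paper
    print(foreground_intensity_range(image, liver_mask))
```

The resulting mask serves as the initial foreground, and the two percentiles bound the intensity range used by the region term.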

Materials
In our experiments, a total of 1161 volume images from 294 patients with liver tumors were used for training and 100 images for testing. CT liver examinations commonly have four phases: plain scan, arterial phase, venous phase and portal phase. Because the plain scan is the phase commonly used for screening, most of the volume images are plain scan images. Among all the patients, 279 have images of all four phases and 15 have images of three phases. The testing data set consists of 100 volume images from 25 patients with four-phase images. All the images are from the First Affiliated Hospital of Zhejiang University. The images have axial dimensions of 512×512 with slice numbers varying from 32 to 86, and the slice thickness is 5 mm. The corresponding segmentation labels (both liver and tumor) were obtained by trained technicians with our in-house semiautomatic liver segmentation tool, and the results were then revised and approved by experienced radiologists. Examples of the images used in the paper are shown in Fig. 1.
Figure 1. Examples of the images with four phases used in the paper.

Preprocessing
Preprocessing was carried out in a slice-wise fashion. Firstly, the images were normalized to the intensity window recorded in the corresponding header files, with lower and upper bounds $\min = wc - \frac{ww}{2}$ and $\max = wc + \frac{ww}{2}$, where $wc$ denotes the window level and $ww$ the window width. Then the total slice number is unified to 40. For volume images with fewer than 40 slices, we add 512×512 slices with an intensity value of 0 at both the beginning and the end of the image, so that the slice number is 40 and the original volume lies in the middle of the newly generated image. For images with more than 40 and fewer than 80 slices, we delete slices at the beginning and the end of the image. For images with more than 80 slices, we apply inter-layer sampling along the Z axis and then unify the slice number to 40. The input image of the proposed algorithm is therefore 512×512×40.
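The preprocessing steps can be illustrated with a minimal NumPy sketch. The function names are hypothetical, and several details are assumptions that are not stated in the text: intensities are clipped to the window and rescaled linearly to [0, 1], padding and cropping are centered, volumes are stored as (rows, columns, slices), inter-layer sampling keeps every second slice, and a volume with exactly 80 slices is cropped rather than subsampled.

```python
import numpy as np

TARGET_SLICES = 40


def window_normalize(volume, wc, ww):
    """Clip to [wc - ww/2, wc + ww/2] and rescale to [0, 1] (assumed mapping)."""
    lo = wc - ww / 2.0
    hi = wc + ww / 2.0
    return (np.clip(volume, lo, hi) - lo) / (hi - lo)


def center_pad_or_crop(volume, target=TARGET_SLICES):
    """Zero-pad or crop symmetrically along the last (slice) axis."""
    n = volume.shape[-1]
    if n < target:
        before = (target - n) // 2
        after = target - n - before
        return np.pad(volume, ((0, 0), (0, 0), (before, after)), mode="constant")
    start = (n - target) // 2
    return volume[..., start:start + target]


def unify_slices(volume, target=TARGET_SLICES):
    """Bring a (rows, cols, slices) volume to exactly `target` slices."""
    if volume.shape[-1] > 2 * target:
        # Assumption: inter-layer sampling keeps every second slice.
        volume = volume[..., ::2]
    return center_pad_or_crop(volume, target)


if __name__ == "__main__":
    for n_slices in (32, 60, 86):
        vol = np.ones((512, 512, n_slices), dtype=np.float32)
        print(n_slices, "->", unify_slices(vol).shape)
```

With the slice counts in this data set (32 to 86), every volume ends up with 40 slices after at most one subsampling pass.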

Proposed U-net architecture with dense connections
The architecture of the proposed network is shown in Fig. 2. For liver segmentation, the input images (512×512×40) first enter a convolution layer with a kernel size of (7×7×3) and a stride of (2,2,2), followed by a convolution layer with a kernel size of (5×5×3) and a stride of (2,2,1). A max-pooling layer with a size of (3×3×3) and a stride of (2,2,2) then down-samples the feature maps to (64×64×10), that is, one eighth of the in-plane size and one quarter of the slice number of the input. This down-sampled volume is the input of the first dense block. The proposed network contains 4 dense blocks with 4, 6, 16 and 8 dense layers, respectively, and the growth rate is set to 32. Consecutive dense blocks are connected by transition layers. A transition layer contains a convolution layer with a kernel size of (1×1×1) or (2×2×2) and a stride of (2,2,2) or (2,2,1), together with average pooling layers; it performs down-sampling and reduces the number of channels. The output of the fourth dense block has a size of (8×8×5). To obtain probability maps with the same resolution as the original images, up-sampling is applied to the output of the fourth dense block, and the up-sampled features are concatenated with the outputs of the same size from the previous layers. All activation functions are ReLU, except that the last convolutional layer uses a sigmoid function to produce the probability map. The final output of the network is (512×512×40×2), where the 2 channels correspond to background and liver, respectively. The details of the network architecture are shown in Table 1.
The network for tumor segmentation has the same architecture as that for liver segmentation except for the first two convolutional layers. The inputs are I × liver label, where the liver label is the segmentation result of the liver. In this way, the network focuses on the region inside the liver and avoids mislocating the tumor area outside the liver region. The input images (512×512×40) first enter a convolution layer with a kernel size of (3×3×3) and a stride of (2,2,2), followed by a convolution layer with a kernel size of (3×3×3) and a stride of (2,2,1).
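The feature map sizes quoted for the liver network can be checked with a short shape calculation. This is a sketch under two assumptions that are not confirmed by the text: every strided layer uses "same"-style padding, so each output dimension is the input dimension divided by the stride and rounded up, and the three transition layers use strides (2,2,2), (2,2,1) and (2,2,1) in that order. The stage names are hypothetical labels.

```python
import math

# (stage name, stride) for every down-sampling stage of the liver network.
# The strides of conv1, conv2 and max_pool are taken from the text; the
# order of the transition strides is an assumption.
STAGES = [
    ("conv1", (2, 2, 2)),
    ("conv2", (2, 2, 1)),
    ("max_pool", (2, 2, 2)),
    ("transition1", (2, 2, 2)),
    ("transition2", (2, 2, 1)),
    ("transition3", (2, 2, 1)),
]


def trace_shapes(input_shape=(512, 512, 40), stages=STAGES):
    """Return the spatial shape after each stage, assuming 'same' padding."""
    shapes = {}
    shape = tuple(input_shape)
    for name, stride in stages:
        shape = tuple(math.ceil(s / st) for s, st in zip(shape, stride))
        shapes[name] = shape
    return shapes


if __name__ == "__main__":
    for name, shape in trace_shapes().items():
        print(f"{name:12s} {shape}")
```

Under these assumptions the trace gives 64×64×10 after the max-pooling layer and 8×8×5 after the last transition, in agreement with the sizes stated for the input of the first dense block and the output of the fourth dense block.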
Since the tumor region is small compared with the liver region, the kernel sizes of the first two convolutional layers are smaller than those used for liver segmentation in order to obtain a smaller receptive field. The details of the network architecture for the segmentation of liver tumor are shown in Table 2.
For the liver segmentation, the model uses the Adam optimizer and the binary cross-entropy as the loss function:
$$L_{BCE} = -\frac{1}{M}\sum_{i=1}^{M}\left[\, y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i) \,\right],$$
where $y_i$ denotes the ground truth of voxel $i$, $\hat{y}_i$ denotes the output prediction probability of the network, and $M$ is the number of voxels in one image (512×512×40).
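For illustration, the binary cross-entropy over one volume can be written directly in NumPy. This is a minimal sketch rather than the Keras loss used for training; the function name is hypothetical, and averaging over the $M$ voxels as well as clipping the probabilities for numerical stability are assumptions.

```python
import numpy as np


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over all voxels of one volume.

    y_true: ground-truth labels in {0, 1}.
    y_prob: predicted foreground probabilities in [0, 1].
    eps:    clipping constant that keeps log() finite (an assumption).
    """
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_prob = np.clip(np.asarray(y_prob, dtype=np.float64).ravel(), eps, 1.0 - eps)
    per_voxel = y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob)
    return float(-np.mean(per_voxel))


if __name__ == "__main__":
    labels = np.array([1, 0, 1, 0])
    probs = np.array([0.9, 0.2, 0.6, 0.4])
    print(binary_cross_entropy(labels, probs))
```

A prediction of 0.5 for every voxel gives a loss of $\log 2$ regardless of the labels, which is a convenient reference value when reading training curves.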
The initial learning rate $l_0$ was 0.0001 and decayed according to $l_0 \times 0.5^{\lfloor (1+n)/5 \rfloor}$, where $\lfloor\cdot\rfloor$ denotes the round-down (floor) operator and $n$ is the training epoch. Almost all tumors are significantly smaller than livers, so for the tumor segmentation the Adam optimizer is adopted together with the weighted categorical cross-entropy as the loss function.
In this loss, $y_i$ denotes the ground truth of voxel $i$, $\hat{y}_i$ the output of the network, and $w$ the weight balancing tumor and background, with $w = V_{tumor} / V_{total}$, where $V_{tumor}$ is the volume of the tumor and $V_{total}$ is the total volume of the image. To accelerate the convergence of the network and obtain more precise training results, we set the learning rate as $l_r = l_0 \times \left(1 - \frac{n}{N+1}\right)^{0.9}$, where $l_0 = 0.0001$ and $N$ is the number of epochs, which is set to 40.
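The two learning-rate schedules can be compared numerically with a small sketch. The function names are hypothetical, and counting the epoch index $n$ from zero is an assumption; the constants ($l_0 = 0.0001$, halving factor 0.5, exponent 0.9, $N = 40$) are those given in the text.

```python
import math

L0 = 1e-4  # initial learning rate l_0 for both networks


def liver_lr(n, l0=L0):
    """Step decay for the liver network: l0 * 0.5 ** floor((1 + n) / 5)."""
    return l0 * 0.5 ** math.floor((1 + n) / 5)


def tumor_lr(n, total_epochs=40, l0=L0, power=0.9):
    """Polynomial decay for the tumor network: l0 * (1 - n / (N + 1)) ** 0.9."""
    return l0 * (1.0 - n / (total_epochs + 1)) ** power


if __name__ == "__main__":
    for epoch in (0, 4, 9, 20, 39):
        print(epoch, liver_lr(epoch), tumor_lr(epoch))
```

The step schedule halves the rate every five epochs, whereas the polynomial schedule decreases smoothly and stays positive at the last epoch because the denominator is $N + 1$ rather than $N$.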

Globally optimized refinement
Directly thresholding the probability maps of the liver and the tumor does not give sufficiently precise segmentations, so post-processing is needed to obtain smoother, more precise and better connected results. In this paper, we simply threshold the probability maps of the liver and the tumor to obtain the initial foreground, and then adopt a graph cut-based method to refine the segmentation results. To this end, a novel energy function that integrates region statistics and a shape prior constraint is proposed. The energy functional combines a region term energy $E_D(l)$ and a boundary term energy $E_B(l)$, where $l$ denotes the labels of all voxels and $\lambda$ is the weight balancing the two terms. The boundary term energy is defined over $\Omega$, the entire volume image, in terms of $\nabla I(x)$, which represents the square sum of gradients in three directions. The region term energy is defined in terms of $l_x$, the label of voxel $x$, and a region term $R$ consisting of the voxel intensity, the voxel intensity cumulative histogram, and the output probability map of the proposed network. In the region term, $I(x)$ denotes the intensity value of voxel $x$ in the image $I$, and $I_{min}$ and $I_{max}$ denote the 5% and 95% values of the initial foreground intensity histogram, respectively. $|Hist_{local}(x)|_{L1}$ represents the L1 norm between the cumulative histogram of the initial foreground region and the cumulative histogram of the (5×5×5) neighborhood region centered at $x$; here $Hist^{j}_{initial}$ and $Hist^{j}_{local}$ indicate the $j$-th value of the cumulative histogram of the initial foreground and of the neighborhood region centered at $x$, and $K = 125$. $P(x)$ denotes the output probability of the proposed network. $T$ represents the threshold value, for which we chose 0.5 for liver segmentation and 0.75 for tumor segmentation. In our experiments, we set $\alpha = 0.2$, $\beta = I_{max} - I_{min}$ and $\gamma = 4\beta$.

§3 Results
The proposed model is trained using the Keras and TensorFlow Python libraries on an NVIDIA GTX 1080 Ti GPU. The total training time is about 26 hours each for liver and tumor segmentation. The globally optimized refinement post-processing is implemented in C++, and its average runtime is about 17 seconds per image. Figure 3 shows the loss and accuracy of the tumor segmentation network on the training and test data sets. The results on the test data set are more volatile than those on the training data set, but they still show a convergent tendency.

§4 Conclusion
In this study, we explored a 3D network based on U-net with dense connections for automatic liver segmentation in abdominal CT images. Specifically, a 3D model was trained for automatic liver localization. The learned liver probability map was then integrated into the graph cut energy function for further segmentation refinement. Meanwhile, based on the result of liver segmentation, we applied the same framework to liver tumor segmentation: a liver tumor probability map was obtained to generate an initial segmentation and was then integrated into the graph cut energy function for further refinement. The main advantage of our method is that it does not require any user interaction for initialization. The high correlation between our segmentation and the manual reference indicates that the proposed method is clinically applicable for hepatic volume estimation. In the future, we will apply our method to more liver and tumor data and to other medical image segmentation tasks, such as kidney and spleen segmentation.

Figure 2. U-net architecture with dense connections for the segmentation of liver and tumor.

Figure 3. (a) Training loss and (b) training accuracy of the tumor segmentation network.

Figure 4. Example of liver segmentation result of a testing image.

Figure 5. Example of tumor segmentation result of a testing image.

Table 1. Details of the proposed architecture for the segmentation of liver.

Table 2. The details of the network architecture for the segmentation of liver tumor.

Table 3. Liver and tumor segmentation results on testing cases.
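The Dice similarity coefficient used to summarize the segmentation results follows the standard definition $2|A \cap B| / (|A| + |B|)$ for a predicted mask $A$ and a reference mask $B$. The sketch below is an illustration with a hypothetical function name, not the evaluation code used in this work; returning 1.0 when both masks are empty is an assumed convention.

```python
import numpy as np


def dice_coefficient(pred, ref):
    """Dice similarity coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        # Assumption: two empty masks are treated as a perfect match.
        return 1.0
    return float(2.0 * np.logical_and(pred, ref).sum() / denom)


if __name__ == "__main__":
    predicted = np.array([[1, 1, 0], [0, 1, 0]])
    reference = np.array([[1, 0, 0], [0, 1, 1]])
    print(dice_coefficient(predicted, reference))
```

A mean Dice value over a test set is obtained by averaging this quantity over the test volumes.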