1 Introduction

Model-based segmentation (MBS) [1] has been successfully used for the automatic segmentation of anatomical structures in medical images (e.g., the heart [1]) owing to its ability to incorporate prior knowledge about the organ shape into the segmentation method. This allows for robust and accurate segmentation, even when the detection of organ boundaries is incomplete. MBS approaches typically use rather simple features for detecting organ boundaries, such as strong gradients [8], together with a set of additional constraints based on intensity value intervals [7] or scale-invariant feature transforms [9]. These features can detect organ boundaries reliably when they operate on well-calibrated gray values, as is the case for computed tomography (CT) images. However, defining robust boundary features remains challenging for organs with heterogeneous texture, such as the prostate, that are imaged with varying MR protocols and scanners. Their boundaries are often weak and ambiguous due to the low signal-to-noise ratio and the inhomogeneity of the prostate, and image contrast and appearance vary widely (see Fig. 1). To increase the robustness of boundary detection for segmenting the prostate in MR images, Martin et al. [5] used atlas matching to derive an initial organ probability map and then refined the segmentation using a deformable model, which was fit to the initial organ probability map and to additional image features. Guo et al. [3] extended this approach by using features learned by sparse stacked autoencoders for multi-atlas matching. Alternatively, Middleton et al. [6] used a neural network to classify boundary voxels in MR images, followed by the adaptation of a deformable model to the boundary voxels for lung segmentation. To speed up the detection of boundary points, Ghesu et al. [2] used a sparse neural network for classification and restricted the boundary point search to voxels that are close to the mesh and aligned with the triangle normals.

Fig. 1. Example images showing the large variability in image and prostate appearance.

We propose a novel boundary detection approach for fully automatic model-based segmentation of medical images and apply it to the segmentation of the whole prostate in MR images. We formulate boundary detection as a regression task, where a convolutional neural network (CNN) is trained to predict the distance between the mesh and the organ boundary for each mesh triangle, thereby eliminating the time-consuming evaluation of many boundary voxel candidates. Furthermore, we combine the per-triangle boundary detectors into a single network, which allows all boundary points to be calculated in parallel, and design it to be locally adaptive to cope with appearance variations across different parts of the organ. We evaluated our method on the data set of the Prostate MR Image Segmentation 2012 (PROMISE12) challenge [4]. The results show that the new boundary detection approach detects boundaries more accurately than previously used features and more robustly with respect to contrast and appearance variations, and that the combination of shape-regularized model-based segmentation and deep learning-based boundary detection achieves the highest accuracy on this very challenging task.

2 Method

In this section, we give a brief introduction to the model-based segmentation framework, followed by a description of two network architectures for boundary detection: a global neural network-based boundary detector that uses the same parameters for all triangles, and a triangle-specific boundary detector that uses locally adaptive neural networks to search for the correct boundary depending on the triangle index. A comprehensive introduction to model-based segmentation and previously designed boundary detection functions can be found in the papers by Ecabert et al. [1] and Peters et al. [7].

Model-Based Segmentation. The prostate surface is modeled as a triangulated mesh with a fixed number of vertices V and triangles T. Given an input image I, the mesh is first initialized based on a rough localization of the prostate using a 3D version of the generalized Hough transform (GHT) [1], followed by a parametric and a deformable adaptation. Both adaptation steps are governed by an external energy that attracts the mesh surface to detected boundary points. Given the current mesh configuration and the image I, the external energy \(E_\text {ext}\) is defined as

$$\begin{aligned} E_\text {ext} = \sum _{i=1}^{T}\left( \frac{ \nabla I(\varvec{x}_i^\text {boundary}) }{ \Vert \nabla I(\varvec{x}_i^\text {boundary})\Vert } (\varvec{c}_i - \varvec{x}_i^\text {boundary}) \right) ^2, \end{aligned}$$
(1)

where \(\varvec{c}_i\) denotes the center of triangle i, \(\varvec{x}_i^\text {boundary}\) denotes the boundary point for triangle i, and \(\nabla I(\varvec{x}_i^\text {boundary})\) is the image gradient at the boundary point \(\varvec{x}_i^\text {boundary}\). The boundary point difference \((\varvec{c}_i - \varvec{x}_i^\text {boundary})\) is projected onto the image gradient to allow cost-free lateral sliding of the triangles on the organ boundary. For the parametric adaptation, the external energy is minimized subject to the constraint that only affine transformations are applied to the mesh vertices. For the deformable adaptation, the vertices are allowed to float freely, but an internal energy term is added to the energy function, which penalizes deviations from a reference shape model of the prostate.
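Eq. (1) can be evaluated directly once the boundary points and the image gradients at those points are known. The NumPy sketch below assumes that \(\varvec{c}_i\), \(\varvec{x}_i^\text {boundary}\), and \(\nabla I(\varvec{x}_i^\text {boundary})\) are given as arrays of shape (T, 3); the function name and the small constant eps that guards against vanishing gradients are our own additions for illustration.

```python
import numpy as np


def external_energy(centers, boundary_points, gradients, eps=1e-12):
    """Evaluate Eq. (1) for T triangles.

    centers, boundary_points, gradients: arrays of shape (T, 3) holding c_i,
    x_i^boundary, and the image gradient sampled at x_i^boundary.
    eps (an assumption) avoids division by zero for vanishing gradients.
    """
    # Unit gradient direction at each boundary point.
    unit_grad = gradients / (np.linalg.norm(gradients, axis=1, keepdims=True) + eps)
    # Project the offset c_i - x_i^boundary onto the gradient direction.
    projected = np.sum(unit_grad * (centers - boundary_points), axis=1)
    # Only the component along the gradient is penalized, so lateral sliding is free.
    return float(np.sum(projected ** 2))


# A triangle 1 mm in front of its boundary point along the gradient costs 1 mm^2,
# while the same offset tangential to the boundary costs nothing.
along = external_energy(np.array([[0.0, 0.0, 1.0]]), np.zeros((1, 3)), np.array([[0.0, 0.0, 2.0]]))
lateral = external_energy(np.array([[1.0, 0.0, 0.0]]), np.zeros((1, 3)), np.array([[0.0, 0.0, 2.0]]))
```

The second call illustrates why the projection is used: an offset orthogonal to the gradient does not contribute to the energy.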

Neural Network-Based Boundary Detection. For each triangle, the corresponding boundary point is searched for on a line that is aligned with the triangle normal and passes through the triangle center. In previous work (e.g., [7]), candidate points on the search line were evaluated using predefined feature functions, and the candidate point with the strongest feature response was selected as the boundary point. In contrast, we directly predict the signed distances \(d_i\), \(i \in [1, T]\), of the triangle centers to the organ boundary using neural networks \(f_i^\text {CNN}: \mathbb {R}^{D \times H \times W} \rightarrow \mathbb {R}\) that process small subvolumes of I with depth D, height H, and width W such that

$$\begin{aligned} \varvec{x}_i^\text {boundary} = \varvec{c}_i + d_i \frac{\varvec{n}_i}{\Vert \varvec{n}_i\Vert } \end{aligned}$$
(2)

with

$$\begin{aligned} d_i =f_i^\text {CNN}\big (S(I, \varvec{c}_i, \varvec{n}_i)\big ), \end{aligned}$$
(3)

where \(\varvec{n}_i\) denotes the normal of triangle i. The subvolumes \(S(I, \varvec{c}_i, \varvec{n}_i)\) are sampled on a \(D \times H \times W\) grid that is centered at \(\varvec{c}_i\) and aligned with \(\varvec{n}_i\) (see Fig. 2). The depth of the subvolumes is chosen such that they overlap with the organ boundary for the expected range of boundary distances, called the capture range. The physical extent of a subvolume is determined by the number of voxels in each dimension and by the spacing of the sampling grid. To keep the number of sampling points constant, and thereby allow the same network architecture to be used for different capture ranges, we change the voxel spacing in the normal direction according to the expected maximum distance of a triangle from the organ boundary. The parametric adaptation uses boundary detectors that were trained for an expected capture range of \(\pm {20\,\mathrm{mm}}\) and a sampling grid spacing of \({2\times 1 \times 1\,\mathrm{mm}}\). We padded the subvolume to account for the reduction in size caused by the first few convolutional layers, resulting in a subvolume size of \({40 \times 5 \times 5}\) voxels or \({80 \times 5 \times 5\,\mathrm{mm}}\). After the parametric adaptation, the prostate mesh is already well adapted to the organ boundary, so we trained a second set of boundary detectors for a capture range of \(\pm {5\,\mathrm{mm}}\) and a sampling grid spacing of \({0.5 \times 1 \times 1\,\mathrm{mm}}\) to facilitate the fine adaptation of the surface mesh during the deformable adaptation.
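A minimal NumPy sketch of how a normal-aligned sampling grid and the boundary point of Eq. (2) could be computed is shown below. The function names, the choice of tangent axes (whose in-plane orientation is arbitrary here), and the voxel-centred offsets are assumptions for illustration. Image intensities would then be interpolated at the returned world coordinates (e.g., with scipy.ndimage.map_coordinates after converting them to voxel indices), which is not shown. With the coarse settings (40 voxels at 2 mm) the voxel centres span \(\pm {39\,\mathrm{mm}}\) along the normal, and with the fine settings (0.5 mm) they span \(\pm {9.75\,\mathrm{mm}}\), so the same \({40 \times 5 \times 5}\) input shape serves both capture ranges.

```python
import numpy as np


def orthonormal_frame(normal):
    """Unit normal n plus two unit tangent vectors u, v (arbitrary in-plane orientation)."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    # Use a helper axis that is not nearly parallel to n.
    helper = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(n, helper)
    u /= np.linalg.norm(u)
    v = np.cross(n, u)
    return n, u, v


def sampling_grid(center, normal, shape=(40, 5, 5), spacing=(2.0, 1.0, 1.0)):
    """World coordinates, shape (D, H, W, 3), of a grid centred at `center`.

    The first (depth) axis runs along the normal; spacing is given in mm per axis.
    """
    n, u, v = orthonormal_frame(normal)
    # Voxel-centre offsets, symmetric about the triangle centre.
    offsets = [(np.arange(k) - (k - 1) / 2.0) * s for k, s in zip(shape, spacing)]
    a, b, c = np.meshgrid(*offsets, indexing="ij")
    return (np.asarray(center, dtype=float)
            + a[..., None] * n + b[..., None] * u + c[..., None] * v)


def boundary_point(center, normal, distance):
    """Eq. (2): shift the triangle centre by the signed distance along the unit normal."""
    n = np.asarray(normal, dtype=float)
    return np.asarray(center, dtype=float) + distance * n / np.linalg.norm(n)


coarse = sampling_grid([0.0, 0.0, 0.0], [0.0, 0.0, 1.0])
fine = sampling_grid([0.0, 0.0, 0.0], [0.0, 0.0, 1.0], spacing=(0.5, 1.0, 1.0))
print(coarse.shape)  # (40, 5, 5, 3)
```

Changing only the depth spacing, rather than the number of samples, is what keeps the network input shape fixed across the coarse and fine stages.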

Fig. 2. Illustration of the boundary point search. For simplicity, the boundary point search is illustrated in 2D. The subvolume \(S(I, \varvec{c}_i, \varvec{n}_i)\) is extracted from the image I and used by a neural network as input to predict the signed distance \(d_i\) of triangle i to its boundary point \(\varvec{x}_i^\text {boundary}\).

We propose and evaluate two different architectures for the boundary detection networks: a global boundary detector network that uses the same parameters for all triangles, and a locally adaptive network that adds a triangle-specific channel weighting layer to the global network and thereby facilitates the search for different boundary features depending on the triangle index. For both architectures, we combine the per-triangle networks \(f_i^\text {CNN}\) into a single network \(f^\text {CNN}\) that predicts all distances in one feedforward pass, which speeds up the prediction of all triangle distances and allows parameters to be shared between the networks \(f_i^\text {CNN}\):

$$\begin{aligned} (d_1, d_2, \cdots , d_T) = f^\text {CNN}\big (S(I, \varvec{c}_1, \varvec{n}_1), S(I, \varvec{c}_2, \varvec{n}_2), \cdots , S(I, \varvec{c}_T, \varvec{n}_T)\big ). \end{aligned}$$
(4)

To simplify the network architecture, we assume that the width of all subvolumes is equal to their height and additionally reshape all subvolumes from size \(D \times W \times W\) to \(D \times W^2\). Consequently, the neural network for predicting the boundary distances is a function of the form \(f^\text {CNN}: \mathbb {R}^{T \times D \times W^2} \rightarrow \mathbb {R}^{T}\). The network input is processed using several blocks of convolutional (Conv), batch normalization (BN), and rectified linear unit (ReLU) layers, called CBR blocks, as summarized in Table 1, where each \(1\times A \times B\) kernel only operates on the input values and hidden units corresponding to a single triangle. Through the repeated application of valid convolutions, the network input of size \(T \times D \times W^2\) is reduced to \(T \times 1 \times 1\), where each element of the output vector represents the boundary distance of a particular triangle. Because the kernels are shared between all triangles, the network essentially computes the same function for each triangle. However, the appearance of the interior and exterior of the organ may vary along the organ boundary, and hence a triangle-specific distance function is often required. To allow for the learning of triangle-specific distance estimators, we extend the global network to a locally adaptive network by introducing a new layer that is applied before the last convolutional layer and defined as

$$\begin{aligned} \varvec{x}_{L - 1} = \varvec{F} \odot \varvec{x}_{L - 2}, \end{aligned}$$
(5)

where L is the number of layers of the network, \(\varvec{x}_l\) is the output of layer l, \(\odot \) denotes element-wise multiplication, and \(\varvec{F}\) is a trainable parameter matrix with one column per triangle and one row per channel of the output of the last CBR block. The locally adaptive network thus learns a pool of distance estimators, which are encoded in the convolutional kernels and shared between all triangles, along with triangle-specific weighting vectors, encoded in the matrix \(\varvec{F}\), that allow the distance estimation to be adapted to different parts of the surface mesh.
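The toy NumPy sketch below illustrates the tensor shapes and weight sharing of the combined network and the channel weighting of Eq. (5); it is not the trained model. The kernel sizes (five CBR blocks with \(8 \times 5\) kernels followed by a final \(5 \times 5\) kernel), the channel count C = 8, and the random weights are hypothetical choices that happen to reduce a \({40 \times 25}\) input to \(1 \times 1\); they are not the configuration of Table 1. Batch normalization and training are omitted, and the convolutions are computed as cross-correlations, as in common deep learning frameworks. It requires NumPy 1.20 or later for sliding_window_view.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def valid_conv(x, kernels):
    """Valid 1 x A x B convolution with weights shared by all triangles.

    x: (C_in, T, D, W2), kernels: (C_out, C_in, A, B) -> (C_out, T, D - A + 1, W2 - B + 1).
    The triangle axis T is never mixed, so each output only sees its own triangle.
    """
    _, _, a, b = kernels.shape
    windows = sliding_window_view(x, (a, b), axis=(2, 3))  # (C_in, T, D', W2', A, B)
    return np.einsum("ctdwab,ocab->otdw", windows, kernels)


def forward(subvolumes, cbr_kernels, last_kernel, F=None):
    """Predict one signed distance per triangle from subvolumes of shape (T, D, W, W).

    F: optional (C, T) channel-weighting matrix as in Eq. (5); None gives the global network.
    Batch normalization is omitted here for brevity.
    """
    t, d, w, _ = subvolumes.shape
    x = subvolumes.reshape(1, t, d, w * w)  # one input channel, D x W^2 per triangle
    for k in cbr_kernels:
        x = np.maximum(valid_conv(x, k), 0.0)  # Conv + ReLU
    if F is not None:
        x = F[:, :, None, None] * x  # Eq. (5): scale each channel per triangle
    return valid_conv(x, last_kernel)[0, :, 0, 0]  # (T,)


# Hypothetical configuration: five 8 x 5 blocks take 40 x 25 to 5 x 5, a final 5 x 5 kernel to 1 x 1.
rng = np.random.default_rng(0)
T, C = 6, 8
subvolumes = rng.normal(size=(T, 40, 5, 5))
cbr_kernels = [rng.normal(scale=0.1, size=(C, 1 if i == 0 else C, 8, 5)) for i in range(5)]
last_kernel = rng.normal(scale=0.1, size=(1, C, 5, 5))

d_global = forward(subvolumes, cbr_kernels, last_kernel)
F = rng.uniform(0.5, 1.5, size=(C, T))
d_local = forward(subvolumes, cbr_kernels, last_kernel, F)
print(d_global.shape, d_local.shape)  # (6,) (6,)
```

In this sketch, setting \(\varvec{F}\) to all ones reproduces the global network, and perturbing one triangle's subvolume leaves the distances of all other triangles unchanged, which reflects the per-triangle \(1\times A \times B\) kernels.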

Table 1. Network architecture with optional feature selection layer and corresponding dimensions used for predicting boundary point distances for each triangle for a subvolume size of \({40 \times 5 \times 5}\) voxels.

Training. Training the boundary detectors requires a set of subvolumes extracted around each triangle together with the corresponding boundary distances, which can be generated from a set of training images and corresponding reference meshes. To that end, we adapt Simulated Search [7], a method previously used for selecting optimal boundary detectors from a large set of candidates. At each training iteration, mesh triangles are transformed randomly and independently of each other using three types of basic transformations: (a) random translations along the triangle normal, (b) small translations orthogonal to the triangle normal, and (c) small random rotations. Then, subvolumes are extracted for each transformed triangle, and the distance of the triangle to the reference mesh is calculated. The network parameters are optimized using stochastic gradient descent by minimizing the root mean square error between the predicted and simulated distances. The coarse and fine boundary detectors were trained with translation ranges along the triangle normal of \(\pm {20\,\mathrm{mm}}\) and \(\pm {5\,\mathrm{mm}}\), respectively, which match the capture ranges of the two networks.
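The generation of one simulated training sample can be sketched as follows. This NumPy example perturbs one reference triangle with the transformation types (a) to (c) and computes the target distance under the simplifying assumption that the reference boundary is locally planar (the tangent plane at the reference triangle), so that requiring \(\varvec{c}' + d\,\varvec{n}'\) to lie on that plane gives \(d = \varvec{n}^\top(\varvec{c} - \varvec{c}') / (\varvec{n}^\top \varvec{n}')\). The lateral range of \(\pm {1\,\mathrm{mm}}\), the maximum rotation of 10°, and all function names are hypothetical, since the text only describes these perturbations as small; the distance to the actual reference mesh would require a ray-mesh intersection instead of the plane.

```python
import numpy as np

rng = np.random.default_rng(0)


def rotation_about_axis(axis, angle):
    """Rodrigues' formula: rotation matrix for `angle` radians about `axis`."""
    k_axis = axis / np.linalg.norm(axis)
    k = np.array([[0.0, -k_axis[2], k_axis[1]],
                  [k_axis[2], 0.0, -k_axis[0]],
                  [-k_axis[1], k_axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * k + (1.0 - np.cos(angle)) * (k @ k)


def simulate_triangle(center, normal, capture_range=20.0, lateral=1.0, max_angle_deg=10.0):
    """Perturb one reference triangle; return new centre c', new normal n', target d."""
    center = np.asarray(center, dtype=float)
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    # (a) random translation along the normal within the capture range
    shift = rng.uniform(-capture_range, capture_range) * n
    # (b) small translation orthogonal to the normal (random tangent direction)
    tangent = rng.normal(size=3)
    tangent -= tangent.dot(n) * n
    shift = shift + rng.uniform(-lateral, lateral) * tangent / np.linalg.norm(tangent)
    # (c) small random rotation of the triangle about its centre (changes the normal)
    angle = np.deg2rad(rng.uniform(-max_angle_deg, max_angle_deg))
    n_new = rotation_about_axis(rng.normal(size=3), angle) @ n
    c_new = center + shift
    # Target for Eq. (2): c' + d n' must lie on the reference boundary, approximated
    # here by the tangent plane through `center` with normal `n`.
    d = n.dot(center - c_new) / n.dot(n_new)
    return c_new, n_new, d


def rmse(pred, target):
    """Root mean square error between predicted and simulated distances."""
    pred, target = np.asarray(pred, dtype=float), np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((pred - target) ** 2)))


center, normal = np.array([10.0, 0.0, 5.0]), np.array([0.0, 0.0, 1.0])
fine_samples = [simulate_triangle(center, normal, capture_range=5.0) for _ in range(4)]
```

For the fine detectors, only capture_range changes (\(\pm {5\,\mathrm{mm}}\) instead of \(\pm {20\,\mathrm{mm}}\)); the loss on a batch of such samples would be rmse applied to the network predictions and the simulated d values.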

3 Results

We evaluated our method on the training and test sets of the Prostate MR Image Segmentation 2012 (PROMISE12) challenge [4]. The training set consists of 50 T2-weighted MR images showing a large variability in organ size and shape. It contains acquisitions with and without endorectal coils, acquired at multiple clinical centers using scanners from different vendors, which further adds to the variability in appearance and contrast of the training images. Training the boundary detection networks took about 6 h on an NVIDIA GeForce GTX 1080 graphics card. Segmentation of the prostate took about 37 s on the GPU and 98 s on the CPU using 8 cores. A comparison of the global and locally adaptive boundary detection networks with previously proposed boundary detection functions [7] was performed on the training set using 5-fold cross-validation. For a direct comparison to state-of-the-art methods, we submitted the segmentation results produced by the locally adaptive method on the test set to the challenge for evaluation.

For the comparison of different boundary detectors, we measured the segmentation accuracy in terms of the average boundary distance (ABD) between the produced and the reference segmentations. Designed boundary detection functions with trained parameters, as described in [7], did not achieve good segmentation results (\(\text {ABD} = {6.09\,\mathrm{mm}}\)), which shows the difficulty of detecting the correct boundaries in this data set. Using the global boundary detection network, we achieved satisfactory segmentation results with a mean ABD of \({2.08\,\mathrm{mm}}\). The ABD was further reduced to \({1.48\,\mathrm{mm}}\) using the locally adaptive network, which produced results similar to those of the global network, except for a few cases where the global network was not able to detect the correct boundary (see Fig. 3(a)) due to the inhomogeneous appearance of the prostate. In those cases, the global network only detected the boundary of the central gland, which yields the correct result for the anterior part of the prostate but causes errors where the prostate boundary is defined by the peripheral zone. In contrast, the locally adaptive network (see Fig. 3(b)) is able to switch between detecting the central gland and the peripheral zone depending on the triangle index and consequently detected the true boundary in all cases.
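The text does not spell out how the ABD is computed. A common symmetric definition averages, over points sampled on both surfaces, the distance to the closest point on the other surface; the brute-force NumPy sketch below implements that definition, assuming both surfaces are given as dense point samples in millimetres. It approximates the surface-to-surface distance and may differ in detail from the official PROMISE12 evaluation.

```python
import numpy as np


def nearest_distances(points_a, points_b):
    """Distance from every point in points_a (N, 3) to its closest point in points_b (M, 3)."""
    diff = points_a[:, None, :] - points_b[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1)).min(axis=1)


def average_boundary_distance(surface_a, surface_b):
    """Symmetric mean distance (mm) between two densely sampled surfaces."""
    surface_a = np.asarray(surface_a, dtype=float)
    surface_b = np.asarray(surface_b, dtype=float)
    d_ab = nearest_distances(surface_a, surface_b)
    d_ba = nearest_distances(surface_b, surface_a)
    return float((d_ab.sum() + d_ba.sum()) / (d_ab.size + d_ba.size))


# Two concentric spheres (radii 10 mm and 12 mm) sampled along the same directions.
rng = np.random.default_rng(1)
directions = rng.normal(size=(500, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
print(round(average_boundary_distance(10.0 * directions, 12.0 * directions), 6))  # 2.0
```

The full N × M distance matrix is only practical for small point sets; a k-d tree such as scipy.spatial.cKDTree would be the usual replacement for dense mesh samplings.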

Fig. 3. Comparison of segmentation results (red) and reference meshes (green) using two network architectures. The locally adaptive network correctly detects the prostate hull for the central gland and the peripheral zone, despite the large appearance differences of the two structures.

Table 2. Comparison of our method to state-of-the-art methods on the PROMISE12 challenge in terms of the Dice similarity coefficient (DSC), the average boundary distance (ABD), the absolute volume difference (VD), and the 95th percentile Hausdorff distance (HD95), calculated over the whole prostate. Our method ranks first in all metrics except HD95 and performs better on average than a second observer (\(\text {score} > 85\)).

A comparison of our method to the best-performing methods on the PROMISE12 challenge in terms of the Dice similarity coefficient (DSC), the average boundary distance (ABD), the absolute volume difference (VD), and the 95th percentile Hausdorff distance (HD95) calculated over the whole prostate is summarized in Table 2. The “score” relates a metric to a second observer: a score of 85 is assigned if a method performs as well as a second observer, and a score of 100 corresponds to perfect agreement with the reference segmentation. At the time of writing, our method placed first in the challenge out of 40 entries, although the scores of the top three methods are very close. With a DSC of 90.5%, an average boundary distance of \({1.71\,\mathrm{mm}}\), and a mean absolute volume difference of 6.6% calculated over the whole prostate, our method achieved the best scores in these three metrics. Only for the remaining metric, HD95, did our method achieve the second-best value (4.94 mm), slightly worse than the fully convolutional neural network approach by UBCRCL (4.90 mm). Overall, our method produced very good segmentation results and performed better on average (\(\text {score} > 85\)) than a second human observer.
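The text fixes only two anchor points of the score: 100 for perfect agreement and 85 for second-observer performance. The NumPy sketch below assumes a linear mapping of an error measure between these anchors, with 1 − DSC as the error for the Dice coefficient and clipping at zero; this mapping and the function names are our own assumptions for illustration, not the official PROMISE12 scoring code. The Dice coefficient follows its standard definition on binary masks.

```python
import numpy as np


def dice(segmentation, reference):
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) of two binary masks."""
    a = np.asarray(segmentation, dtype=bool)
    b = np.asarray(reference, dtype=bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())


def observer_score(error, observer_error):
    """Hypothetical linear score: 100 at zero error, 85 at the second observer's error.

    Clipping at 0 is also an assumption. For DSC, use error = 1 - DSC.
    """
    return max(0.0, 100.0 - 15.0 * error / observer_error)


# Under this assumption, a method with half the second observer's error scores 92.5.
half_error_score = observer_score(1.0, 2.0)
```

Under such a mapping, any error below the second observer's error yields a score above 85, which is how \(\text {score} > 85\) corresponds to outperforming a second observer.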

4 Conclusion

We presented a novel deep learning-based method for detecting boundary points for the model-based segmentation of the prostate in MR images. We showed that using neural networks to directly predict the distances to the organ boundary, instead of evaluating several boundary candidates with hand-crafted boundary features, substantially improves the accuracy and the robustness to large contrast variations. The accuracy was further improved by making the network locally adaptive, which facilitates the learning of boundary detectors that are tuned to specific parts of the boundary. With an average boundary distance of \({1.71\,\mathrm{mm}}\) and a Dice similarity coefficient of \({90.5}{\%}\), our method segmented the prostate more accurately on average than a second human observer and placed first out of 40 submitted entries on this very challenging data set.