The LGDF model, originally proposed in [47], builds on existing active contour literature by introducing a new energy functional based on the local Gaussian distributions of image intensity. This functional drives a variational level set approach that can segment objects whose intensity mean and variance are inhomogeneous. Rather than creating segments whose intensity is as uniform as possible, this algorithm allows slow changes in intensity across an object, penalizing only sudden changes within it, without relying on a gradient-based edge detector [5].
The segmentation is represented by a level set function \(\phi (\mathbf {x})\). The foreground region is the set of points \(\{\mathbf {x} : \phi (\mathbf {x}) < 0\}\), and the exterior (or background) is \(\{\mathbf {x} :\phi (\mathbf {x}) \ge 0\}\). The contour itself (or surface in 3D) is thus defined implicitly as the zero level set, \(\{\mathbf {x} : \phi (\mathbf {x}) = 0\}\). Segmentation is achieved by minimizing a global energy functional:
$$\begin{aligned} E = E^{\text {LGDF}}(I,\phi ) + \mu \mathcal {P}(\phi ) + \nu \mathcal {L}(\phi ) \end{aligned}$$
(1)
where \(\mu ,\nu >0\) are weighting constants, \(E^{\text {LGDF}}\) is the LGDF energy term which drives the contour to fit along salient image edges, \(\mathcal {P}\) avoids the need to periodically re-initialize \(\phi \) to a signed distance function [24], and \(\mathcal {L}\) penalizes the contour length to ensure smoothness. The \(E^{\text {LGDF}}\) term is the sum of the individual LGDF energies for each pixel \(\mathbf {x}\):
$$\begin{aligned} \begin{aligned} E^{\text {LGDF}}(I, \phi , \mathbf {x}) =&-\int _{\varOmega } \omega (\mathbf {y}-\mathbf {x}) \log (p_{1,\mathbf {x}}(I(\mathbf {y})))M_1(\mathbf {y}) \,\mathrm {d}\mathbf {y} \\&-\int _{\varOmega } \omega (\mathbf {y}-\mathbf {x}) \log (p_{2,\mathbf {x}}(I(\mathbf {y})))M_2(\mathbf {y}) \,\mathrm {d}\mathbf {y} \end{aligned} \end{aligned}$$
(2)
where \(\omega (\mathbf {y}-\mathbf {x})\) is a Gaussian weighting function centered on \(\mathbf {x}\), \(p_{1,\mathbf {x}}\) is a Gaussian approximation of the intensity distribution for the part of the neighborhood of \(\mathbf {x}\) lying outside the contour (and inside for \(p_{2,\mathbf {x}}\)), and \(M_1\) equals one outside the contour, zero inside (vice-versa for \(M_2\)). This quantity is smaller when the intensity distributions in the parts of the neighborhood of \(\mathbf {x}\) lying outside and inside the contour are well approximated as Gaussian distributions, which can only be achieved by deforming the contour so that it separates regions of different intensity mean and variance.
The mean and variance parameters for these local Gaussian distributions are denoted \(u_i(\mathbf {x})\), \(\sigma _i(\mathbf {x})\) where \(i \in \{ 1, 2 \}\) for regions outside and inside the contour, respectively:
$$\begin{aligned} u_i(\mathbf {x})=\, & {} \frac{\int \omega (\mathbf {y}-\mathbf {x})I(\mathbf {y})M_i(\phi (\mathbf {y})) \,\mathrm {d}\mathbf {y}}{\int \omega (\mathbf {y}-\mathbf {x}) M_i(\phi (\mathbf {y})) \,\mathrm {d}\mathbf {y}} \end{aligned}$$
(3)
$$\begin{aligned} \sigma _i(\mathbf {x})^2=\, & {} \frac{\int \omega (\mathbf {y}-\mathbf {x})(u_i(\mathbf {x}) - I(\mathbf {y}))^2 M_i(\phi (\mathbf {y})) \,\mathrm {d}\mathbf {y}}{\int \omega (\mathbf {y}-\mathbf {x}) M_i(\phi (\mathbf {y})) \,\mathrm {d}\mathbf {y}} \end{aligned}$$
(4)
Specifically, they express for each pixel the mean and variance of neighboring gray values that lie outside and inside the contour (for pixels whose entire neighborhood lies on one side of the contour, only one pair of these values is defined). The size of each pixel’s neighborhood is determined by the standard deviation of the Gaussian weighting function, \(\omega \). This is a user-defined parameter, denoted \(\sigma \). A larger neighborhood increases the range from which a pixel may influence the contour. This results in faster evolution, greater capture range, and a greater tendency to produce segments whose boundaries separate large regions of different mean intensity.
The internal energy term \({\mathcal {P}}\) penalizes the contour’s deviation from a signed distance function [24] to ensure numerical stability [32]:
$$\begin{aligned} \mathcal {P}(\phi ) = \int _{\varOmega } \frac{1}{2} \left( \left| \nabla \phi (\mathbf {x}) \right| - 1 \right) ^2 \,\mathrm {d}\mathbf {x} \end{aligned}$$
(5)
and \(\mathcal {L}\) penalizes the contour length to ensure smoothness:
$$\begin{aligned} \mathcal {L}(\phi ) = \int _{\varOmega } \left| \nabla H( \phi (\mathbf {x}) ) \right| \,\mathrm {d}\mathbf {x} \end{aligned}$$
(6)
where H is the \(C^{\infty }\) regularized Heaviside function, discretized to operate on a regular grid, first proposed by [5]:
$$\begin{aligned} H(x)=\frac{1}{2} \left[ 1+ \frac{2}{\pi }\text {arctan}\left( x \right) \right] \end{aligned}$$
(7)
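As a concrete illustration, the regularized Heaviside of Eq. 7 and its analytic derivative (the regularized Dirac delta used later in Eq. 8) can be sketched as follows; the function names are ours, not from the original implementation:

```python
import numpy as np

def heaviside(x):
    # C-infinity regularized Heaviside of Eq. 7
    return 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(x))

def dirac(x):
    # Analytic derivative H'(x) = 1 / (pi * (1 + x^2));
    # nonzero everywhere, unlike a compactly supported delta
    return 1.0 / (np.pi * (1.0 + x ** 2))
```

The fact that `dirac` is nonzero everywhere is what later allows \(\phi \) to evolve outside a narrow band around the contour.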
The total energy functional (Eq. 1) can be minimized by applying the calculus of variations [47] yielding the following PDE:
$$\begin{aligned} \frac{\partial \phi }{\partial t} = -\,\delta (\phi )(\lambda _1 e_1 - \lambda _2 e_2) + \mu \left( \nabla ^2 \phi - \kappa \right) + \nu \delta (\phi ) \kappa \end{aligned}$$
(8)
where \(\delta \) is the regularized Dirac function \(\delta (x) = H'(x)\) [5], \(\lambda _1\), \(\lambda _2\), \(\nu \) and \(\mu \) are parameters controlling the weight of the terms, and \(\kappa \) is the contour’s local curvature [31]:
$$\begin{aligned} \kappa = \text {div} \left( \frac{\nabla \phi }{\left| \nabla \phi \right| } \right) \end{aligned}$$
(9)
and \(-\,\delta (\phi )(\lambda _1 e_1 - \lambda _2 e_2)\) is the force due to \(E^{\text {LGDF}}\):
$$\begin{aligned} e_i(\mathbf {x})=\int _{\varOmega }\omega (\mathbf {y}-\mathbf {x})\left[ \text {log} (\sigma _i(\mathbf {y})) + \frac{(u_i(\mathbf {y})-I(\mathbf {x}))^2}{2\sigma _i(\mathbf {y})^2} \right] \,\mathrm {d}\mathbf {y} \end{aligned}$$
(10)
The data fitting term \(e_1(\mathbf {x})\) quantifies how badly the pixel \(\mathbf {x}\) would fit with the outside-contour parts of its neighbors’ neighborhoods. When \(e_1\) is high and \(\mathbf {x}\) does not belong outside, \(\frac{\partial \phi }{\partial t}\) is made more negative, so \(\phi \) lowers at that point and the contour grows outwards, swallowing \(\mathbf {x}\). The same applies in reverse for \(e_2\).
Due to the smooth form of the \(C^{\infty }\) regularized Heaviside (Eq. 7), \(\delta (\phi ) = H'(\phi )\) is nonzero everywhere. This allows \(\phi \) some freedom to change at any point in the image, not just in a narrow band around the contour. This helps prevent convergence on local energy minima [5].
GPU implementation
The goal of the implementation is to iteratively solve Eq. 8 for \(\phi (\mathbf {x}, t)\) and visualize the results at each iteration. This is done by discretizing \(\phi \) with respect to time and applying numerical integration: starting with \(\phi (\mathbf {x}, t=0)\) (which is specified by the user), an update loop computes \(\phi (\mathbf {x}, t+\varDelta t)\) by computing \(\frac{\partial \phi }{\partial t}\) according to Eq. 8 and assuming this quantity stays constant during the short time step \(\varDelta t\). Existing GPU level set methods implement their update rule inside a single kernel function; however, \(E^{\text {LGDF}}\) is more challenging as it relies on intermediate stages with neighborhood operations, such as convolutions and derivatives, whose sequential dependencies must be respected to avoid race conditions.
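This time stepping is plain forward Euler; a minimal CPU sketch of the update loop (the real implementation runs on the GPU, and `dphi_dt` stands in for the right-hand side of Eq. 8):

```python
import numpy as np

def evolve(phi, dphi_dt, dt=0.1, iters=100):
    """Forward-Euler integration: phi(t + dt) = phi(t) + dt * dphi/dt,
    where dphi_dt is a callable returning the PDE right-hand side
    (e.g. Eq. 8) for the current phi."""
    for _ in range(iters):
        phi = phi + dt * dphi_dt(phi)
    return phi
```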
The update rule in Eq. 8 requires convolutions (Eq. 10) of intermediate variables that themselves rely on other convolutions (Eqs. 3–4). The relationships of these variables are shown in Fig. 2, where an arrow from A to B indicates that A is required in the computation of B. Wherever they appear, I denotes the input image and H the smooth Heaviside function (Eq. 7). All variables of the form GX represent the n-dimensional Gaussian convolution of X.
We compute the means and variances (Eqs. 3–4) from GIH, GH, \(GI^2H\), GI and \(GI^2\) using the following formulas:
$$\begin{aligned} u_1&= \frac{GIH}{GH} \qquad \sigma ^2_1 = \frac{GI^2H}{GH} - u_1^2 \end{aligned}$$
(11)
$$\begin{aligned} u_2&= \frac{GI - GIH}{1 - GH} \qquad \sigma ^2_2 = \frac{GI^2 - GI^2H}{1 - GH} - u^2_2 \end{aligned}$$
(12)
For \(\sigma ^2_i\) we have used the alternative variance formula \(\text {Var}[X] = E[X^2] - E[X]^2\), and for \(u_2\) and \(\sigma _2\) we have used \(G_{\sigma } *(1 - H) = 1 - G_{\sigma } *H\) in the denominators, where \(G_{\sigma } *\) denotes convolution with a Gaussian kernel of standard deviation \(\sigma \). This is not to be confused with \(\sigma _1\) and \(\sigma _2\), the local intensity standard deviations outside and inside the contour. By exploiting these tricks, we are able to compute Eqs. 11–12 using only three convolutions per update cycle (since GI and \(GI^2\) are constant). To compute the image force term \(e_1 - e_2\), we expand the brackets in Eq. 10 to get:
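As an illustration (not the GPU code itself), Eqs. 11–12 can be prototyped on the CPU with SciPy; the small `eps` guard against empty neighborhoods is our addition:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_stats(I, phi, sigma, eps=1e-8):
    """Local means/variances outside (i=1) and inside (i=2) the contour,
    via the identities of Eqs. 11-12."""
    # H(phi) is ~1 outside the contour (phi >= 0) and ~0 inside
    H = 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(phi))
    G = lambda f: gaussian_filter(f, sigma)
    GH, GIH, GI2H = G(H), G(I * H), G(I * I * H)
    GI, GI2 = G(I), G(I * I)   # constant over iterations; precompute in practice
    u1 = GIH / (GH + eps)
    v1 = GI2H / (GH + eps) - u1 ** 2           # Var[X] = E[X^2] - E[X]^2
    u2 = (GI - GIH) / (1.0 - GH + eps)         # G * (1 - H) = 1 - G * H
    v2 = (GI2 - GI2H) / (1.0 - GH + eps) - u2 ** 2
    return u1, v1, u2, v2
```

Only `GH`, `GIH` and `GI2H` change between iterations, which is why three convolutions per update cycle suffice.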
$$\begin{aligned} e_i(\mathbf {x})&= \int _{\varOmega } \omega (\mathbf {y}-\mathbf {x}) \left[ \log (\sigma _i(\mathbf {y})) + \frac{u_i(\mathbf {y})^2}{2\sigma _i(\mathbf {y})^2} \right] \,\mathrm {d}\mathbf {y} \nonumber \\&\quad -\,I(\mathbf {x}) \int _{\varOmega } \omega (\mathbf {y}-\mathbf {x}) \frac{u_i(\mathbf {y})}{\sigma _i(\mathbf {y})^2} \,\mathrm {d}\mathbf {y} \nonumber \\&\quad +\, I(\mathbf {x})^2 \int _{\varOmega } \omega (\mathbf {y}-\mathbf {x}) \frac{1}{2\sigma _i(\mathbf {y})^2} \,\mathrm {d}\mathbf {y} \end{aligned}$$
(13)
$$\begin{aligned}&= G_{\sigma } *\left[ \log (\sigma _i(\mathbf {y})) + \frac{u_i(\mathbf {y})^2}{2\sigma _i(\mathbf {y})^2} \right] \nonumber \\&\quad -\, I(\mathbf {x}) \left[ G_{\sigma } *\frac{u_i(\mathbf {y})}{\sigma _i(\mathbf {y})^2} \right] + I(\mathbf {x})^2 \left[ G_{\sigma } *\frac{1}{2\sigma _i(\mathbf {y})^2} \right] \end{aligned}$$
(14)
To compute the three terms in Eq. 14, we first pre-compute the operands of the Gaussian convolutions (\(E_0\), \(E_1\) and \(E_2\) in Fig. 2), then convolve them (\(GE_0\), \(GE_1\) and \(GE_2\) in Fig. 2), then weight them by 1, I and \(I^2\) and sum them. This results in just six convolutions altogether. Note that \(e_1\) and \(e_2\) are not computed separately; the variables \(E_0\), \(E_1\) and \(E_2\) are the three corresponding parts of \(e_1 - e_2\).
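The expansion of Eq. 14 for a single region can be sketched as follows (the real implementation computes the three parts of \(e_1 - e_2\) jointly; the `eps` clamp and function names are our assumptions):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def data_force(I, u, v, sigma, eps=1e-8):
    """e_i(x) via Eq. 14: three Gaussian convolutions whose results are
    weighted by 1, -I and I^2. u and v are the local mean/variance fields
    of region i. Note log(sigma_i) = 0.5 * log(v)."""
    v = np.maximum(v, eps)  # variance must remain strictly positive
    G = lambda f: gaussian_filter(f, sigma)
    GE0 = G(0.5 * np.log(v) + u ** 2 / (2.0 * v))  # weighted by 1
    GE1 = G(u / v)                                 # weighted by -I
    GE2 = G(0.5 / v)                               # weighted by I^2
    return GE0 - I * GE1 + I ** 2 * GE2
```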
GPU architecture
The six Gaussian convolutions require a large number of buffer reads. However, an n-dimensional Gaussian filter is separable: it can be written as the outer product of n vectors, allowing us to convolve with n 1D filters instead of one very large n-dimensional filter. This reduces \(l^2\) texture samples to 2l in 2D or \(l^3\) texture samples to 3l in 3D, for a truncated Gaussian kernel of length l. Therefore, our overall algorithmic complexity is \(O(n \cdot l)\) for an input of size n.
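The separability property can be verified directly in a small 2D sketch (illustration only; kernel length and test image are arbitrary):

```python
import numpy as np
from scipy.signal import convolve2d

def gauss1d(l, sigma):
    # Truncated, normalized 1-D Gaussian kernel of length l
    x = np.arange(l) - (l - 1) / 2.0
    g = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return g / g.sum()

# A 2-D Gaussian kernel is the outer product of two 1-D kernels, so one
# l x l convolution (l^2 samples/pixel) splits into two 1-D passes
# (2l samples/pixel).
g = gauss1d(9, 2.0)
rng = np.random.default_rng(0)
img = rng.random((32, 32))
full = convolve2d(img, np.outer(g, g), mode='same')
separable = convolve2d(convolve2d(img, g[None, :], mode='same'),
                       g[:, None], mode='same')
```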
The buffer reads for the horizontal Gaussian pass are coalesced, but for the vertical and depth passes the reads are not coalesced and therefore very slow. This could be alleviated by transposing the image between convolutions, making the buffer reads coalesced for vertical and depth passes. However, transposing the image three times per convolution is slow, even when this is optimized by using local/shared memory. In our architecture, we instead make use of texture memory, which preserves spatial locality among neighboring pixels in all three dimensions, making access time for all three passes comparable to coalesced buffer reads. This allows us to skip the transpositions altogether and convolve up to four images at once in the available texture memory channels, yielding faster overall performance than local/shared memory approaches.
Texture memory buffers must either be read-only or write-only within a given kernel function; therefore, results computed from data in a texture buffer must be written to a different buffer. The memory layout for our architecture includes kernels for the separable X, Y, and Z Gaussian passes accordingly, which we show in Fig. 3. This figure lists our kernels in the order they are called and shows their inputs and outputs (corresponding to the nodes in Fig. 2) within the available 4\(\times \)32-bit channels per GPU texture buffer. Besides the convolutions, the rest of our implementation is straightforward; we store the 1D convolution filter weights in constant memory and all intermediate values reside in registers.
The three Gaussian convolutions of the image and Heaviside (GIH, GH, \(GI^2H\), Fig. 2) are the result of neighborhood operations, but are not dependent on each other. This is also the case with the three Gaussian convolutions \(GE_0\), \(GE_1\), \(GE_2\). We therefore create kernels shown in Fig. 3 to perform each set of three Gaussian convolutions simultaneously, and two more kernels to prepare for them (called ‘Prep Conv 1’ to compute H, IH, \(I^2H\), and ‘Prep Conv 2’ to compute \(E_0\), \(E_1\), \(E_2\)). The curvature field \(\kappa \) (Eq. 9) requires all three (two in 2D) gradient components to be first stored in texture memory in order to avoid race conditions, since all differential operations are computed by central finite differences, a neighborhood operation. This is why we compute \(\kappa \) early on and pass it through the Gaussian convolution kernels in the conveniently available w channel of the texture buffer; computing \(\kappa \) immediately before ‘Update \(\phi \)’ would require an extra texture buffer since there is only one unused channel at that point. After updating, we force the partial derivatives of \(\phi \) to be zero at their corresponding image boundaries (in the ‘Neumann/Copy’ kernel) to prevent numerical instability and copy the result back into buffer A for the next iteration.
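The curvature computation itself is standard; a 2D CPU sketch of Eq. 9 using central finite differences (`np.gradient`), with an `eps` of our choosing to avoid division by zero where the gradient vanishes:

```python
import numpy as np

def curvature(phi, eps=1e-8):
    """kappa = div(grad(phi) / |grad(phi)|), Eq. 9, in 2D."""
    dy, dx = np.gradient(phi)            # central differences; axis 0 = y
    norm = np.sqrt(dx ** 2 + dy ** 2) + eps
    ny_dy, _ = np.gradient(dy / norm)    # d(n_y)/dy
    _, nx_dx = np.gradient(dx / norm)    # d(n_x)/dx
    return nx_dx + ny_dy
```

For the signed distance function of a circle of radius r, the curvature on the circle is 1/r, which makes a convenient sanity check.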
Interactive brushes
There are many applications in the biosciences, computer vision, medical, and pattern recognition communities where guidance by human experts is required [7, 20, 27, 48, 50]. Current interactive GPU level set methods, such as [36], provide interfaces to (1) initialize \(\phi \) inside/outside the object, (2) dynamically adjust parameters, and in some cases (3) edit \(\phi \) (a union operator on new objects/regions, followed by rerunning of the algorithm); however, it is difficult to refine the evolution, for example to prevent the contour from leaking, or to constrain where it may grow. The graph-cuts and radial-basis function approaches [15, 27] allow users to sketch lines or define control points which are tagged to both the desired object and the undesired regions, but we find the process difficult to refine when the segmented boundary lies somewhere between the input locations, where there may not be discernible image intensity features (see Fig. 4 top-left and the accompanying video).
To address these issues, we follow the strategies outlined in the survey [29] with similar functions to the modeling/graphics literature [12]; however, we closely integrate brush functions with our segmentation kernels with the goal of editing and constraining \(\phi \) during the iterative evolution process itself. Specifically, we provide functions to initialize, append, erase, and constrain (locally stop evolution of \(\phi \)) after each iteration of the update step (Eq. 8), and visualize the results after each iteration. Note that for simplicity we define our functions with circular (2D) or spherical (3D) regions, but there is nothing to prevent implementing more bespoke functions, such as surface pulling [12].
All brush functions are centered at the mouse position \(\mathbf {p}\) with radius r and are implemented in the ‘Compose’ kernel (Fig. 3). We have deliberately arranged the read buffer B to link to \(\phi \) from the previous update iteration. To complete a brush action, we relaunch the ‘Compose’ kernel with the brush parameters followed by the ‘Neumann/Copy’ kernel between each update iteration. The initialization brush sets \(\phi \) to a binary step function with a small positive constant (we choose 2 empirically):
$$\begin{aligned} \phi (\mathbf {x}) := 2 \cdot \text {sgn}(\Vert \mathbf {x}-\mathbf {p} \Vert -r) \end{aligned}$$
(15)
where \(:=\) denotes assignment. The user can continue to ‘paint’ new foreground regions using the additive brush:
$$\begin{aligned} \phi (\mathbf {x}) := \left\{ \begin{array}{ll} \phi (\mathbf {x}) &{} \quad \text {if} \; \Vert \mathbf {x}-\mathbf {p} \Vert -r > 0 \\ \text {min}(\Vert \mathbf {x}-\mathbf {p} \Vert -r, \phi (\mathbf {x})) &{} \quad \text {otherwise} \end{array}\right. \end{aligned}$$
(16)
To erase a foreground region, we simply reassign any values inside the brush region with a small positive constant:
$$\begin{aligned} \phi (\mathbf {x}) := \left\{ \begin{array}{ll} \phi (\mathbf {x}) &{} \quad \text {if} \; \Vert \mathbf {x}-\mathbf {p} \Vert -r > 0 \\ 2 &{} \quad \text {otherwise} \end{array}\right. \end{aligned}$$
(17)
However, while the erase brush is useful for undoing undesired strokes, it will not stop the contour from leaking into undesired regions, as \(\phi \) will continually update and burst through the previously erased region again. Therefore, we introduce a ‘barrier’ brush to persistently block the level set from growing into a fixed region. Rather than define this region in another buffer, we set \(\phi \) to \(\infty \) and check for \(\infty \) values when computing \(\varDelta \phi \) in the ‘Update \(\phi \)’ kernel:
$$\begin{aligned} \phi (\mathbf {x}) := \left\{ \begin{array}{ll} \phi (\mathbf {x}) &{} \quad \text {if} \; \Vert \mathbf {x}-\mathbf {p} \Vert -r > 0 \\ \infty &{} \quad \text {otherwise} \end{array}\right. \quad \text {(compose kernel)} \end{aligned}$$
(18)
$$\begin{aligned} \varDelta \phi (\mathbf {x}) := \left\{ \begin{array}{ll} 0 &{} \quad \text {if} \; \phi (\mathbf {x}) = \infty \\ \varDelta \phi (\mathbf {x}) &{} \quad \text {otherwise} \end{array}\right. \quad \text {(update }\phi \text { kernel)} \end{aligned}$$
(19)
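The four brush functions (Eqs. 15–18) amount to simple masked assignments; a 2D NumPy sketch (the GPU versions live in the ‘Compose’ kernel, and the helper name `_dist` is ours):

```python
import numpy as np

def _dist(shape, p):
    # Distance of every grid point to the brush center p
    idx = np.indices(shape).astype(float)
    p = np.array(p, float).reshape((-1,) + (1,) * len(shape))
    return np.sqrt(((idx - p) ** 2).sum(axis=0))

def brush_init(phi, p, r):
    d = _dist(phi.shape, p)
    return 2.0 * np.sign(d - r)                  # Eq. 15: binary step

def brush_add(phi, p, r):
    d = _dist(phi.shape, p)
    out = phi.copy()
    m = d - r <= 0
    out[m] = np.minimum(d[m] - r, phi[m])        # Eq. 16: paint foreground
    return out

def brush_erase(phi, p, r):
    out = phi.copy()
    out[_dist(phi.shape, p) <= r] = 2.0          # Eq. 17: reset to background
    return out

def brush_barrier(phi, p, r):
    out = phi.copy()
    out[_dist(phi.shape, p) <= r] = np.inf       # Eq. 18: Update-phi skips inf
    return out
```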
In our implementation, we found it useful to allow users to pause and unpause evolution (by setting \(\varDelta t=0\) or \(\varDelta t=0.1\)) while still committing brush strokes. This makes it easier to guide the contour without having to compete against its growth. Furthermore, by using the previous value of \(\phi \) stored in the B buffer z-channel in combination with the rendered value of \(\phi \) stored in the A buffer z-channel, we can display the current brush size and position without committing the stroke.
In Fig. 4, we illustrate two simple use-cases of our interactive brushes. In the top row, the user paints using the ‘barrier’ brush to cover the full image region, shown in blue. This is followed by the ‘erase’ brush (Eq. 17), to cut a permissible region in which a new seed region is placed (Eq. 16), which evolves to segment the macular hole without leaking into the opening. (We show this in 3D in the accompanying video.) Similarly, in the lower row, the vessels are segmented without leaking into the heart (see also Table 5 2b–c).
Real-time rendering
To render the zero-crossing of the level set function \(\phi \) in 3D, we launch a render kernel after the Neumann/Copy step in the update loop (Fig. 3). We send a camera matrix to initialize each pixel with a ray origin \(\mathbf {o}\) and direction unit vector \(\hat{d}\). We parameterize the ray’s position by \(\mathbf {r} = \mathbf {o} + \hat{d} s\) and, assuming \(\phi \) to be the signed distance to the zero-crossing, advance the ray in steps by \(s_{i+1} = s_i + \phi (\mathbf {r})\). However, \(\phi \) is not a perfect signed distance function; we should therefore divide the step size by the maximum derivative of \(\phi \). This value is not known precisely, but in practice we obtain sufficiently small visual artifacts at good performance by applying a constant relaxation factor, stepping by \(\varDelta s = 0.3 \phi (\mathbf {r})\). Further, given that \(\phi \) is not defined outside of the image boundaries, we initially advance \(s_0\) to the start of the image axis-aligned bounding box (where \(s_0\) is calculated using an analytical ray-box intersection function [21]). To increase visual quality, we implement 3D ambient occlusion and soft shadows by marching the ray in the direction of the normal and light source once it has hit a surface [11].
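This relaxed sphere-tracing scheme can be sketched as follows; apart from the 0.3 relaxation factor, the parameter values (step budget, hit tolerance) are our assumptions:

```python
import numpy as np

def march(phi_fn, o, d, relax=0.3, max_steps=128, tol=1e-3):
    """Relaxed sphere tracing along r = o + d*s. phi_fn approximates the
    signed distance to the zero level set; returns the hit distance s,
    or None on a miss."""
    s = 0.0
    for _ in range(max_steps):
        dist = phi_fn(o + d * s)
        if dist < tol:
            return s            # hit the surface
        s += relax * dist       # step scaled down to tolerate imperfect SDF
    return None                 # miss / ran out of steps
```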
The output of our real-time rendering implementation, using hardware trilinear interpolation to sample \(\phi \) and with \(\varDelta s = 0.3 \phi (\mathbf {r})\), is shown in Fig. 5. (The render kernel has negligible impact on performance.)