1 Introduction

Branching tree-like anatomical structures are abundant in the human body (e.g., the vascular and airway trees of the circulatory and respiratory systems), and analyzing their properties is important for various clinical applications, such as diagnosis and surgical planning. A necessary precursor to morphological tree analysis is segmenting the trees from 3D medical images. However, 3D segmentation of tree structures is challenging due to, e.g., insufficient contrast between vessels or airways and the background, neighbouring or touching tissue, and geometrical variability. Extracting the trees amounts primarily to identifying the bifurcations and the curvilinear paths between them.

Several previous works on segmentation of tree-like structures relied on local, voxel- or patch-level information. For example, Frangi et al. proposed a filter based on the local Hessian matrix [7]; Law et al. estimated branch direction based on the optimal local inward flux orientation [10]; Schneider et al. used steerable filters and a random forest for pixel-wise classification [22]; and Wu et al. proposed a deep learning framework that classifies local patches for tracking [26].

Tracking-based methods, on the other hand, capture more structural information, but they generally fail to build a global tree structure. Lesage et al. proposed a particle filtering method to track coronary vessels, which incorporates vessel geometry using flux-based features [11]. Macedo et al. [13] proposed a centerline-tracking method built on top of a 2D feature-based bifurcation detector.

Incorporating prior knowledge, such as geometry and topology, into optimization-based image segmentation algorithms has been proven useful for obtaining more accurate and plausible results [17]. However, these priors typically introduce non-convexities in the objective functions.

Although tree-like structures were extracted in [19, 20, 23, 24, 25], in contrast to our work, those trees must be seeded in tubularity measurement maps, and the initial tree topologies depend on the seeding. In [24, 25], pairwise geometrical priors on edges, rather than topological priors, dominated the optimization process, which makes it impossible to maintain a desired, fixed anatomical structure. In [19, 20, 23], the topological priors were interpreted as 2-tuples or 3-tuples of neighboring edges instead of as the whole anatomical tree structure. To segment coronary vessels in 2D X-ray sequences, M’hiri et al. used temporal and spatial priors inherited from an earlier X-ray image, but the method is difficult to extend to 3D [15]. Beriault et al. proposed a CRF framework that used the locations of brain structures (e.g., the basal ganglia) and sinuses as anatomical priors for segmenting the cerebral vasculature [2]. However, none of these methods addressed the global branching aspect of anatomical tree structures.

Our goal in this work is to perform 3D tree extraction while satisfying these two important objectives: (i) encode the geometrical and topological priors of trees and (ii) ensure a globally optimal tree extraction solution. In this paper, we achieve both objectives by adopting, for the first time, pictorial structures for tree extraction.

Pictorial structures were popularized in the computer vision community by Felzenszwalb et al. in 2005 [5], and an extensive literature (e.g., Belagiannis et al. [1], Burenius et al. [3]) has since been built on this concept. To guarantee global optimality, pictorial structures require a model with a tree-like (acyclic) topology. This property makes them a natural fit for the problem of anatomical tree extraction.

2 Methodology

At a high level, our automatic approach comprises two key steps (Fig. 1): (i) detecting bifurcations and (ii) extracting the centerlines of the branches connecting them. Step (i) is achieved by fitting the pictorial structure to the 3D image data, globally optimizing an energy function with a unary term derived from an artificial neural network (ANN) and a binary term based on geometrical statistics. Step (ii) is achieved by globally optimal minimal path extraction.

Fig. 1. An overview of the proposed method.

2.1 Bifurcation Detection

We formulate bifurcation detection in 3D as a pictorial structure optimization. A pictorial structure models a deformable object as a set of connected parts. It finds instances of the object in an image by combining a matching cost for each part with a deformation cost for each pair of connected parts. Felzenszwalb et al. [5] restricted the connections between parts to form an acyclic graph \(G=(V,E)\), where each vertex \(v_i\) corresponds to a part and each edge \(e_{ij}=(v_i,v_j)\) models a connection between vertices \(v_i\) and \(v_j\). We encode the bifurcations of the 3D anatomical tree as the nodes of the pictorial structure, while statistics of branch directions and lengths are learned as geometrical priors that regularize the pictorial edges. Let \(\mathcal {I} (\mathbf x )\) be an N-dimensional image with \(\mathbf x \in \mathbb {R} ^N\). We optimize an energy function over the locations of the n nodes in N-dimensional space, as follows:

$$\begin{aligned} \mathcal {L}^*=\mathop {\mathrm {arg\,min}}\limits _{\mathcal {L}=\{\mathcal {L}_1,...,\mathcal {L}_n\}} (\underbrace{\sum _{i=1}^{n}\mathcal {U}(\mathcal {L}_i|\mathcal {I})}_{\textit{unary term}} + \underbrace{\sum _{e_{ij}\in E}\mathcal {B} (\mathcal {L}_i, \mathcal {L}_j)}_{\textit{binary term}}) \end{aligned}$$
(1)

where \(\mathcal {U}(\mathcal {L}_i|\mathcal {I})\) is the unary term penalizing the placement of node \(v_i\) at location \(\mathcal {L}_i\), and \(\mathcal {B} (\mathcal {L}_i, \mathcal {L}_j)\) is the binary term penalizing the deviation of the displacement vector \(\mathcal {L}_{ij}=\mathcal {L}_j-\mathcal {L}_i\) from the geometrical priors learned from training data. By leveraging the generalized distance transform [6], the pictorial energy function in (1) is minimized efficiently and globally.
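Why a tree-shaped model admits an exact minimizer of (1) can be seen from min-sum dynamic programming: each child's subtree cost is folded into its parent, and the optimal labelling is recovered by backtracking from the root. The sketch below is a generic illustration, not the authors' implementation. It assumes candidate locations are flattened to integer indices and that each binary term is supplied as a dense table (`pairwise`, a hypothetical input format). The dense minimum over parent/child location pairs costs O(h²) per edge; this is precisely the step that the generalized distance transform reduces to O(h) for quadratic costs. An exhaustive search on a toy problem checks the result.

```python
import itertools
import numpy as np


def optimize_tree_model(unary, edges, pairwise, root=0):
    """Exactly minimise sum_i U_i(l_i) + sum_(i,j) B_ij(l_i, l_j) on a tree.

    unary:    (n_nodes, n_locations) array, unary[i, l] = cost of node i at location l
    edges:    list of (parent, child) pairs forming a tree rooted at `root`
    pairwise: dict (parent, child) -> (n_locations, n_locations) array of B_ij
    """
    n, h = unary.shape
    children = {i: [] for i in range(n)}
    for p, c in edges:
        children[p].append(c)

    # Pre-order DFS, then reverse it so every child is handled before its parent.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children[node])
    order.reverse()

    cost = unary.astype(float)      # cost[i, l]: best subtree energy with node i at l
    best_child = {}                 # best_child[c][l_p]: optimal l_c given parent at l_p
    for node in order:
        for c in children[node]:
            # total[l_p, l_c] = B(l_p, l_c) + best subtree energy of c at l_c.
            # The generalized distance transform replaces this O(h^2) minimum.
            total = pairwise[(node, c)] + cost[c][None, :]
            best_child[c] = total.argmin(axis=1)
            cost[node] += total.min(axis=1)

    # Backtrack in pre-order (parents before children).
    labels = np.zeros(n, dtype=int)
    labels[root] = int(cost[root].argmin())
    for node in reversed(order):
        for c in children[node]:
            labels[c] = best_child[c][labels[node]]
    return labels, float(cost[root].min())


def energy(labels, unary, edges, pairwise):
    e = sum(unary[i, l] for i, l in enumerate(labels))
    return e + sum(pairwise[(p, c)][labels[p], labels[c]] for p, c in edges)


# Toy problem: 4 nodes, 6 candidate locations each.
rng = np.random.default_rng(0)
n, h = 4, 6
edges = [(0, 1), (0, 2), (2, 3)]
unary = rng.random((n, h))
pairwise = {e: rng.random((h, h)) for e in edges}
labels, best = optimize_tree_model(unary, edges, pairwise)
exhaustive = min(energy(ls, unary, edges, pairwise)
                 for ls in itertools.product(range(h), repeat=n))
```

The dynamic-programming optimum matches the exhaustive search, which costs h^n evaluations instead of roughly n·h² here (n·h with a distance transform).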

Unary Term via an Artificial Neural Network: We train a three-layer neural network built from stacked Restricted Boltzmann Machines (RBMs) to produce a distance map, which serves as the unary term in (1). An RBM is a two-layer network of visible and hidden units with no intra-layer connections and symmetrically weighted inter-layer connections. Instead of initializing the network with small random weights, we pre-train it in an unsupervised manner using RBMs [8]. RBMs model the joint probability of visible and hidden units and learn a high-level representation of the data without supervision. We construct the network by stacking three RBMs, treating the hidden units of each RBM as the visible units of the next. We then fine-tune the resulting ANN end-to-end to predict tree voxels by minimizing the total cross entropy between predicted and ground truth segmentations of a training dataset. To encourage each detected bifurcation to lie close to the center of its neighboring branches, the network is trained to predict a distance map computed from the edges of the segmentation rather than the segmentation map itself.
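Greedy layer-wise pre-training can be sketched with standard one-step contrastive divergence (CD-1): each RBM is trained on the hidden activations of the previous one. This is a generic NumPy illustration, not the authors' code. The hidden layer sizes are hypothetical (the text does not report them), patch intensities scaled to [0, 1] are treated as Bernoulli probabilities (a common simplification), and the supervised cross-entropy fine-tuning stage is not shown.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class BernoulliRBM:
    """RBM with binary hidden units, trained by one step of contrastive divergence."""

    def __init__(self, n_visible, n_hidden, rng):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)   # visible biases
        self.c = np.zeros(n_hidden)    # hidden biases
        self.rng = rng

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b)

    def cd1_update(self, v0, lr):
        ph0 = self.hidden_probs(v0)                            # positive phase
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)  # sample hidden states
        pv1 = self.visible_probs(h0)                           # reconstruction
        ph1 = self.hidden_probs(pv1)                           # negative phase
        n = len(v0)
        self.W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.b += lr * (v0 - pv1).mean(axis=0)
        self.c += lr * (ph0 - ph1).mean(axis=0)


def pretrain_stack(X, hidden_sizes, epochs=5, lr=0.1, batch=32, seed=0):
    """Train RBMs one after another; each one's hidden activations feed the next."""
    rng = np.random.default_rng(seed)
    rbms, data = [], X
    for n_hidden in hidden_sizes:
        rbm = BernoulliRBM(data.shape[1], n_hidden, rng)
        for _ in range(epochs):
            order = rng.permutation(len(data))
            for start in range(0, len(data), batch):
                rbm.cd1_update(data[order[start:start + batch]], lr)
        rbms.append(rbm)
        data = rbm.hidden_probs(data)
    return rbms


def encode(rbms, X):
    for rbm in rbms:
        X = rbm.hidden_probs(X)
    return X


# Stand-in data: 64 flattened 29x29 patches with intensities in [0, 1].
rng = np.random.default_rng(1)
patches = rng.random((64, 29 * 29))
stack = pretrain_stack(patches, hidden_sizes=[100, 50, 20], epochs=2)
features = encode(stack, patches)
```

The pre-trained weights would then initialize a feed-forward network whose output is fine-tuned against the target maps.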

Binary Term from Geometrical Statistical Priors: We learn the distributions of branch angles and lengths of anatomical trees from the skeletons of the ground truth segmentations of a training dataset. Branch angles and lengths are encoded as three-dimensional displacement vectors pointing from bifurcations at lower generations of the tree to those at higher generations. We model the joint prior distribution of the locations of two connected pictorial components as a multivariate Gaussian. The mean vector \(\mu _{ij}\) and covariance matrix \(\varSigma _{ij}\) of the displacement vector \(\mathcal {L}_{ij}=\mathcal {L}_j-\mathcal {L}_i\) between nodes \(v_i\) and \(v_j\) are estimated from the training data. Since \(\varSigma _{ij}\) is symmetric positive semi-definite, its singular value decomposition coincides with its eigendecomposition, \(\varSigma _{ij}=U_{ij}M_{ij}U'_{ij}\), with \(U_{ij}\) orthonormal and \(M_{ij}\) diagonal. This lets us write the negative log-likelihood in the form of a Mahalanobis distance:

$$\begin{aligned} -\log {p(\mathcal {L}_i,\mathcal {L}_j)}~\propto ~ d_{ij}(\mathcal {L}_i,\mathcal {L}_j)=[T_{ij}(\mathcal {L}_i)-T_{ji}(\mathcal {L}_j)]' M_{ij}^{-1}[T_{ij}(\mathcal {L}_i)-T_{ji}(\mathcal {L}_j)] \end{aligned}$$
(2)

where \(\mathcal {L}_i\) and \(\mathcal {L}_j\) are the locations of nodes \(v_i\) and \(v_j\), respectively; \(T_{ij}(\mathcal {L}_i)=U'_{ij}(\mathcal {L}_i+\mu _{ij})\) and \(T_{ji}(\mathcal {L}_j)=U'_{ij}\mathcal {L}_j\) are rigid spatial transformations (6 DOF in 3D) [5], so that \(T_{ij}(\mathcal {L}_i)-T_{ji}(\mathcal {L}_j)=-U'_{ij}(\mathcal {L}_{ij}-\mu _{ij})\) and \(d_{ij}=(\mathcal {L}_{ij}-\mu _{ij})'\varSigma _{ij}^{-1}(\mathcal {L}_{ij}-\mu _{ij})\); and the diagonal matrix \(M_{ij}\) weights the deformation cost of connection \(e_{ij}\).
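A quick numerical check of the binary term: fit \(\mu _{ij}\) and \(\varSigma _{ij}\) from toy displacement samples, factor \(\varSigma _{ij}=U_{ij}M_{ij}U'_{ij}\) with NumPy's `eigh`, and compare \(d_{ij}\) computed through the transformed coordinates with the direct Mahalanobis distance. This sketch assumes one parent/child location pair per training case and uses the convention \(T_{ij}(\mathcal {L}_i)=U'_{ij}(\mathcal {L}_i+\mu _{ij})\), \(T_{ji}(\mathcal {L}_j)=U'_{ij}\mathcal {L}_j\); the training data are synthetic stand-ins.

```python
import numpy as np


def fit_displacement_prior(parent_locs, child_locs):
    """Estimate mu_ij and Sigma_ij of L_ij = L_j - L_i, and factor Sigma_ij = U M U'."""
    disp = np.asarray(child_locs, float) - np.asarray(parent_locs, float)
    mu = disp.mean(axis=0)
    sigma = np.cov(disp, rowvar=False)
    # For a symmetric PSD matrix the SVD and the eigendecomposition coincide.
    m_diag, U = np.linalg.eigh(sigma)
    return mu, sigma, U, m_diag


def deformation_cost(L_i, L_j, mu, U, m_diag):
    """d_ij with T_ij(L_i) = U'(L_i + mu) and T_ji(L_j) = U' L_j (assumed convention)."""
    t = U.T @ (np.asarray(L_i, float) + mu) - U.T @ np.asarray(L_j, float)
    return float(t @ (t / m_diag))   # t' M^{-1} t with diagonal M


# Toy training set: 200 parent/child bifurcation pairs (voxel coordinates).
rng = np.random.default_rng(0)
parents = rng.normal(size=(200, 3)) * 5.0
A = np.array([[3.0, 0.0, 0.0], [1.0, 2.0, 0.0], [0.0, 1.0, 1.0]])
children = parents + rng.normal(size=(200, 3)) @ A + np.array([10.0, -4.0, 20.0])
mu, sigma, U, m_diag = fit_displacement_prior(parents, children)

L_i, L_j = np.array([0.0, 0.0, 0.0]), np.array([12.0, -1.0, 18.0])
r = (L_j - L_i) - mu
direct = float(r @ np.linalg.solve(sigma, r))   # (L_ij - mu)' Sigma^{-1} (L_ij - mu)
via_eq2 = deformation_cost(L_i, L_j, mu, U, m_diag)
```

Both routes give the same value, and the cost vanishes when the child sits exactly at the mean displacement from its parent.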

To find a global solution to (1) efficiently, we must choose a set of connections between pictorial components that forms an acyclic graph (a tree). One natural option is to adopt the anatomical tree connectivity as the pictorial structure connectivity (option 1). However, it is more informative to connect pairs of nodes whose relative positions behave consistently across the training data [6]. So, alternatively (option 2), we construct a complete weighted graph over all vertices, assign to each edge \(e_{ij}\) a weight \(w_{ij}=\Vert \varSigma _{ij}\Vert _2\), the spectral norm of the covariance matrix of the three-dimensional displacement vectors, and find the minimum spanning tree (MST) of this graph with Prim’s algorithm [18]. We found that the resulting MST is identical to the anatomical tree connectivity.
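Option 2 can be sketched in a few lines: compute \(w_{ij}=\Vert \varSigma _{ij}\Vert _2\) for every node pair from training bifurcation locations, then run Prim's algorithm on the dense weight matrix. The array layout `(n_cases, n_nodes, 3)` is a hypothetical choice for illustration. `np.linalg.norm(A, 2)` returns the largest singular value, which for a covariance matrix equals its largest eigenvalue.

```python
import numpy as np


def covariance_weights(train_locs):
    """w_ij = ||Sigma_ij||_2 for every node pair.

    train_locs: array (n_cases, n_nodes, 3) of bifurcation locations (assumed layout).
    """
    n = train_locs.shape[1]
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            disp = train_locs[:, j] - train_locs[:, i]
            sigma = np.cov(disp, rowvar=False)
            W[i, j] = W[j, i] = np.linalg.norm(sigma, 2)   # largest singular value
    return W


def prim_mst(W, root=0):
    """Prim's algorithm on a dense symmetric weight matrix; returns (parent, child) edges."""
    n = len(W)
    in_tree = np.zeros(n, dtype=bool)
    best = np.full(n, np.inf)        # cheapest known connection to the tree
    parent = np.full(n, -1)
    best[root] = 0.0
    edges = []
    for _ in range(n):
        u = int(np.argmin(np.where(in_tree, np.inf, best)))
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((int(parent[u]), u))
        for v in range(n):
            if not in_tree[v] and W[u, v] < best[v]:
                best[v] = W[u, v]
                parent[v] = u
    return edges


# Toy example: node 1 moves rigidly with node 0, node 2 varies independently.
rng = np.random.default_rng(0)
base = rng.normal(size=(30, 3))
locs = np.stack([base, base + [5.0, 0.0, 0.0], 10.0 * rng.normal(size=(30, 3))], axis=1)
W = covariance_weights(locs)
tree = prim_mst(W)
```

Pairs with a consistent relative position get near-zero weight, so the MST links them first; this is the behaviour option 2 relies on.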

2.2 Branch Centerline Extraction

We extract the centerlines of all tree branches with a globally optimal minimal path method based on fast marching. While most minimal path extraction methods are semi-automatic and require users to provide each path’s start and end points [9, 12, 14, 16], we use the bifurcation locations detected by the pictorial model optimization to initialize the minimal path extraction.

In tubular structures, if the speed function of the minimal path algorithm is homogeneous or varies little near the actual centerline, the extracted path follows the shortest Euclidean route, cutting corners at bends, instead of the medial path (the centerline). To ensure that the detected paths run along the centers of the branches, we adapt Deschamps’ path centering algorithm [4]. Deschamps first extracted a rough centerline of the tubular structure and then used it to obtain a rough binary segmentation of the vasculature. A distance transform from the edges of this segmentation is then computed and fed to the minimal path algorithm as a new speed function. In this paper, instead of deriving the segmentation from an estimated centerline, we use the output mask of our ANN as the approximate segmentation.
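The centering effect can be illustrated on a 2D toy "branch" with a right-angle bend. As a plainly swapped-in technique, this sketch uses Dijkstra's algorithm on the 8-connected pixel grid (edge cost = step length divided by the mean speed of its endpoints) as a discrete stand-in for fast marching, which solves the continuous Eikonal equation and is not implemented here. The distance map is computed by brute force for readability; for real volumes, `scipy.ndimage.distance_transform_edt` would be the practical choice. The mask stands in for the ANN output.

```python
import heapq
import numpy as np


def distance_to_background(mask):
    """Euclidean distance from each foreground pixel to the nearest background pixel."""
    fg, bg = np.argwhere(mask), np.argwhere(~mask)
    dist = np.zeros(mask.shape)
    for p in fg:
        dist[tuple(p)] = np.sqrt(((bg - p) ** 2).sum(axis=1)).min()
    return dist


STEPS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]


def step_cost(speed, a, b):
    # Travel time across one grid step: length / mean speed of its two endpoints.
    return np.hypot(b[0] - a[0], b[1] - a[1]) / (0.5 * (speed[a] + speed[b]))


def minimal_path(speed, start, end):
    """Dijkstra on the 8-connected grid; pixels with zero speed are impassable."""
    cost = np.full(speed.shape, np.inf)
    cost[start] = 0.0
    prev, heap = {}, [(0.0, start)]
    while heap:
        c, u = heapq.heappop(heap)
        if u == end:
            break
        if c > cost[u]:
            continue                                   # stale heap entry
        for dy, dx in STEPS:
            v = (u[0] + dy, u[1] + dx)
            if 0 <= v[0] < speed.shape[0] and 0 <= v[1] < speed.shape[1] and speed[v] > 0:
                nc = c + step_cost(speed, u, v)
                if nc < cost[v]:
                    cost[v], prev[v] = nc, u
                    heapq.heappush(heap, (nc, v))
    path = [end]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


def path_cost(path, speed):
    return sum(step_cost(speed, a, b) for a, b in zip(path, path[1:]))


# L-shaped toy branch: two bands of width 9 meeting at a right angle.
mask = np.zeros((33, 33), dtype=bool)
mask[2:11, 2:31] = True
mask[2:31, 22:31] = True
start, end = (6, 4), (28, 26)

dist = distance_to_background(mask)
uniform_path = minimal_path(mask.astype(float), start, end)   # homogeneous speed
centered_path = minimal_path(dist, start, end)                 # distance-map speed
```

With homogeneous speed, every shortest route must squeeze past the inner corner of the bend, one pixel from the background; with the distance map as speed, steps near the walls become slow, so the path is expected to swing toward the middle of the bands.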

3 Experiments

3.1 Data Description

Synthetic Data: We generated 50 volumes, each of size \(150^3\) voxels and containing a binary tree structure with 4 levels. The tree statistics were set as follows: mean branch lengths of \(\{50,40,30,20\}\) voxels per level with a standard deviation (std) of 2, and mean angles between neighboring levels of \(\{\frac{\pi }{4},\frac{\pi }{6},\frac{\pi }{12}\}\) with std \(=\frac{\pi }{36}\).
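A possible way to draw the centerline geometry of such a tree is sketched below, using the statistics quoted above. How the angles are oriented in 3D is not specified in the text, so the construction is an assumption: each bifurcation deflects its two children to opposite sides, by the sampled angle, within a random plane containing the parent direction (rotation via Rodrigues' formula). The root position and direction are hypothetical, and rasterizing the branches into a \(150^3\) volume is not shown.

```python
import numpy as np

# Statistics quoted in the text; the 3D construction below is an assumption.
MEAN_LEN = [50.0, 40.0, 30.0, 20.0]               # mean branch length per level (voxels)
LEN_STD = 2.0
MEAN_ANGLE = [np.pi / 4, np.pi / 6, np.pi / 12]   # parent/child angle between levels
ANGLE_STD = np.pi / 36


def rotate(v, axis, angle):
    """Rodrigues' rotation of v about the unit vector `axis`."""
    return (v * np.cos(angle) + np.cross(axis, v) * np.sin(angle)
            + axis * np.dot(axis, v) * (1.0 - np.cos(angle)))


def synth_tree(rng, root=(5.0, 75.0, 75.0), root_dir=(1.0, 0.0, 0.0)):
    """Return (level, start, end, direction) for every branch of a 4-level binary tree."""
    branches = []
    d0 = np.asarray(root_dir, float)
    stack = [(0, np.asarray(root, float), d0 / np.linalg.norm(d0))]
    while stack:
        level, start, direction = stack.pop()
        end = start + rng.normal(MEAN_LEN[level], LEN_STD) * direction
        branches.append((level, start, end, direction))
        if level + 1 < len(MEAN_LEN):
            # Random rotation axis perpendicular to the parent direction.
            axis = np.cross(direction, rng.standard_normal(3))
            axis /= np.linalg.norm(axis)
            for sign in (1.0, -1.0):   # children deflect to opposite sides
                angle = sign * rng.normal(MEAN_ANGLE[level], ANGLE_STD)
                stack.append((level + 1, end, rotate(direction, axis, angle)))
    return branches


rng = np.random.default_rng(0)
tree = synth_tree(rng)
bifurcations = [end for level, _, end, _ in tree if level < 3]
```

A 4-level binary tree has 15 branches and 7 bifurcations, which matches the seven components of the pictorial model used for the clinical data.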

Clinical CT Data: We used 19 chest computed tomography (CT) scans (Footnote 1) and performed 3-fold cross validation to train and evaluate the model. The ANN is trained with 240,000 2D patches of size \(29\times 29\), extracted from axial planes and centered on individual voxels. We chose our ANN design based on empirical evidence and previous works; other ANN designs, or even non-ANN methods for calculating the unary term, may yield better results. The ANN hyperparameters were set by grid search and then fixed. Our pictorial model consists of the seven components of the first four levels of an airway tree (i.e., the trachea, the left and right main bronchi, etc.).
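Extracting axial training patches around chosen voxels is straightforward; the sketch below assumes volumes are indexed `(z, y, x)`, pads in-plane borders with zeros, and leaves the choice of centers (and their labels, e.g. the target value at the center voxel) to the caller. These are assumptions for illustration, not details from the text.

```python
import numpy as np


def axial_patches(volume, centers, size=29):
    """Extract size x size patches in the axial plane (fixed z) centered on voxels.

    volume:  array indexed (z, y, x)  -- assumed axis order
    centers: int array (k, 3) of (z, y, x) voxel coordinates
    """
    r = size // 2
    padded = np.pad(volume, ((0, 0), (r, r), (r, r)), mode="constant")
    out = np.empty((len(centers), size, size), dtype=volume.dtype)
    for n, (z, y, x) in enumerate(centers):
        # Voxel (y, x) sits at (y + r, x + r) in the padded slice, so the
        # window [y, y + size) x [x, x + size) is centered on it.
        out[n] = padded[z, y:y + size, x:x + size]
    return out


vol = np.arange(4 * 10 * 10).reshape(4, 10, 10)
centers = np.array([[1, 5, 5], [2, 0, 0]])
p = axial_patches(vol, centers, size=5)
```

The second patch is centered on a corner voxel, so most of it comes from the zero padding.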

3.2 Evaluation Measures

Bifurcation Detection Evaluation: The performance of the proposed approach is assessed by (i) \(N_D\), the number of detected bifurcations within distance D of the ground truth locations; and (ii) M, the mean distance between the ground truth bifurcations and their closest detected bifurcations.

Branch Extraction Evaluation: We measure how well the detected curvilinear centerlines match the ground truth centerlines by computing \(\mu _D\): the average distance between centerlines [21].
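The three measures can be sketched with nearest-neighbour distances. Two readings here are assumptions: \(N_D\) counts ground truth bifurcations whose closest detection lies within D (consistent with the nearest-neighbour matching used for Tracker below), and \(\mu _D\) is taken as the symmetric mean closest-point distance between sampled centerline points; the exact definition of \(\mu _D\) is in [21].

```python
import numpy as np


def pairwise_dist(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(-1))


def bifurcation_metrics(gt, det, D):
    """N_D and M with nearest-neighbour matching of each GT bifurcation."""
    nearest = pairwise_dist(gt, det).min(axis=1)   # closest detection per GT point
    return int((nearest < D).sum()), float(nearest.mean())


def mean_centerline_distance(c1, c2):
    """Symmetric mean closest-point distance (assumed form of mu_D)."""
    d = pairwise_dist(c1, c2)
    return float(0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean()))


gt = [[0, 0, 0], [10, 0, 0]]
det = [[0, 0, 1], [10, 0, 3], [50, 50, 50]]
N_2, M = bifurcation_metrics(gt, det, D=2)
N_5, _ = bifurcation_metrics(gt, det, D=5)
mu_D = mean_centerline_distance([[0, 0, 0], [1, 0, 0], [2, 0, 0]],
                                [[0, 1, 0], [1, 1, 0], [2, 1, 0]])
```

Sweeping D and plotting \(N_D\) gives curves like those in Fig. 2; a curve that plateaus below the number of ground truth bifurcations indicates missed detections.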

Table 1. Performance of different methods on clinical data, measured by M and \(\mu _D\) (in mm; values shown as \(mean\pm std\)).

3.3 Experimental Results

Advantage of Statistics: To confirm the benefit of incorporating tree statistics, we removed the tree statistics learned from the real data and globally optimized the objective function again. In practice, we mimicked a uniform geometrical prior by scaling up the covariance matrix elements by a factor of 20. Rows D and E in Table 1 show that incorporating the statistics improves M and \(\mu _D\) by \(42\%\) and \(28\%\), respectively.
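Why scaling the covariance mimics a flat prior follows in one line, assuming, as (2) intends, that the binary term is the Mahalanobis distance of the residual \(\mathbf r_{ij}=\mathcal {L}_{ij}-\mu _{ij}\). Scaling \(\varSigma _{ij}\) by \(s\) leaves its eigenvectors unchanged and divides the cost by \(s\):

$$\begin{aligned} \mathbf r_{ij}'(s\,\varSigma _{ij})^{-1}\mathbf r_{ij}=\frac{1}{s}\,\mathbf r_{ij}'\varSigma _{ij}^{-1}\mathbf r_{ij},\qquad s=20 \end{aligned}$$

With \(s=20\), every binary term in (1) is 20 times weaker, so the unary term dominates the optimization and the learned geometry contributes little.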

Robustness to Noise: To evaluate the robustness of our bifurcation detector to noise, we added Gaussian noise at three levels, SNR \(\in \{10, 5, 3.3\}\), to the noise-free synthetic data. A distance map from the edges of the tree mask is used as the unary term. Table 2 shows that our method remains stable even at high noise levels. For example, from column 3 to column 4 of Table 2, the SNR is halved while M increases by only about one voxel.
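The text does not define SNR; one common convention, assumed in the sketch below, is the intensity contrast between tree and background divided by the noise standard deviation. Under that assumption, SNR values of 10, 5 and 3.3 on a binary volume correspond to noise standard deviations of 0.1, 0.2 and about 0.3. The tube-shaped mask is a stand-in for a synthetic tree.

```python
import numpy as np


def add_noise_at_snr(image, snr, rng):
    """Add zero-mean Gaussian noise with std = (image contrast) / snr.

    The SNR definition (contrast / noise std) is an assumption, not from the text.
    """
    sigma = (image.max() - image.min()) / snr
    return image + rng.normal(0.0, sigma, size=image.shape), sigma


rng = np.random.default_rng(0)
volume = np.zeros((40, 40, 40))
volume[10:30, 18:22, 18:22] = 1.0          # stand-in binary "tree" mask
noisy = {snr: add_noise_at_snr(volume, snr, rng) for snr in (10, 5, 3.3)}
```

Each entry of `noisy` holds the corrupted volume and the noise level actually used.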

Table 2. Effect of SNR on measure M for synthetic data (\(mean\pm std\)).
Fig. 2. Variation of \(N_D\) on real data for the proposed method and Tracker [13].

Advantage of Globally Optimal Model Fitting: We examined the drop in performance when a local gradient-descent optimizer is used instead. As expected, the bifurcation localization result is highly sensitive to initialization, even when the initialization is close to the ground truth locations. This sensitivity can be attributed to the lack of a reliable, clean (noise-free) data term, which causes the local optimizer to get trapped in local optima. It is also worth noting that our algorithm is linear in both the number of branching points and the number of possible locations for each node. We also compare the proposed method to two competing methods: a tracker based on a bifurcation estimator (Tracker) [13] and the model-based optimally oriented flux method (OOF) [10].
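The linear dependence on the number of locations comes from the generalized distance transform. Because \(M_{ij}\) in (2) is diagonal, the binary term is a weighted sum of squares in the rotated coordinates, and the minimization over a child's locations separates into 1-D problems of the form \(D(p)=\min _q f(q)+a(p-q)^2\), which the lower-envelope-of-parabolas algorithm of Felzenszwalb and Huttenlocher solves in O(h). The sketch below implements that 1-D routine and its separable 2-D extension and checks both against brute force. The per-axis weights `a` (playing the role of the inverse diagonal entries of \(M_{ij}\)) are illustrative, and resampling onto the grid rotated by \(U'_{ij}\) is omitted.

```python
import numpy as np


def dt1d(f, a=1.0):
    """D[p] = min_q f[q] + a*(p - q)**2 and its argmin, in O(n) time
    (lower envelope of parabolas, after Felzenszwalb & Huttenlocher)."""
    n = len(f)
    v = np.zeros(n, dtype=int)      # roots of the parabolas in the lower envelope
    z = np.empty(n + 1)             # breakpoints between consecutive parabolas
    k, z[0], z[1] = 0, -np.inf, np.inf
    for q in range(1, n):
        while True:
            p = v[k]
            # Intersection of the parabolas rooted at q and at p.
            s = ((f[q] + a * q * q) - (f[p] + a * p * p)) / (2.0 * a * (q - p))
            if s > z[k]:
                break
            k -= 1                  # parabola at v[k] is hidden; drop it
        k += 1
        v[k], z[k], z[k + 1] = q, s, np.inf
    D, arg = np.empty(n), np.empty(n, dtype=int)
    k = 0
    for p in range(n):
        while z[k + 1] < p:
            k += 1
        D[p], arg[p] = a * (p - v[k]) ** 2 + f[v[k]], v[k]
    return D, arg


def dt2d(f, a=(1.0, 1.0)):
    """Separable extension: transform along axis 0, then along axis 1."""
    tmp = np.apply_along_axis(lambda col: dt1d(col, a[0])[0], 0, f)
    return np.apply_along_axis(lambda row: dt1d(row, a[1])[0], 1, tmp)


rng = np.random.default_rng(0)
f = rng.random(50) * 10
D, arg = dt1d(f, a=0.7)
idx = np.arange(50)
brute = (f[None, :] + 0.7 * (idx[:, None] - idx[None, :]) ** 2).min(axis=1)

g = rng.random((6, 7))
D2 = dt2d(g, (0.5, 2.0))
Q1, Q2 = np.meshgrid(np.arange(6), np.arange(7), indexing="ij")
brute2 = np.array([[(g + 0.5 * (i - Q1) ** 2 + 2.0 * (j - Q2) ** 2).min()
                    for j in range(7)] for i in range(6)])
```

Each edge of the tree then costs O(h) instead of O(h²), so the full optimization scales as O(nh) for n nodes and h candidate locations.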

Comparing to Tracker: The root seed point of Tracker is set manually in the trachea trunk. Since Tracker does not have a built-in anatomical tree topology, we match each ground truth bifurcation to the closest one among all detected bifurcations. Figure 2 reports \(N_D\) as a function of D for the proposed method (blue curve) and Tracker (brown curve) on real data. The different plateau levels of the two curves show that Tracker does not detect all the bifurcations. Also, rows A and E in Table 1 show that the proposed method outperforms Tracker, reducing the error by 10% in M and 62% in \(\mu _D\).

Comparing to OOF: To trace centerlines using the OOF tubularity score, we had to select bifurcations manually before running the fast marching algorithm to generate the paths between them. A naïve comparison of rows B and E in Table 1 shows that OOF outperforms our method by about  3 mm in \(\mu _D\) on real data. However, our proposed method requires no initialization and is fully automatic. So, for a fair comparison, we used the same manually selected bifurcations with the centerline extraction approach of Sect. 2.2; the result is reported in row C. Comparing rows B and C shows that, using the same set of bifurcations, the minimal path on the distance transform of the ANN output (i.e., our approach) outperforms OOF. Moreover, since the variation of \(\mu _D\) across rows B, C and E is less than the average voxel size of our clinical data (\(0.67\times 0.67\times 0.95\) mm\(^3\)), these experiments confirm that detecting bifurcations with the pictorial structure makes the tracing algorithm fully automatic while leaving the accuracy of centerline detection practically unchanged.

4 Conclusions

We presented the first global method for extracting tree-like structures from 3D medical images while encoding geometrical tree priors. The global model-to-data fitting frees centerline tracing from any initialization, and the incorporation of priors makes the method more robust to noise. A key advantage of this work is the incorporation of fixed topological priors for branches that are consistent across subjects. Our method is not designed to handle topological variability, e.g., in pathological cases or in generations deep down the tree. Nevertheless, we note that the pictorial algorithm [5] is robust to occlusions, so even with a fixed-topology tree model, it should still be able to locate actual trees with slight topological variations. Future work will involve integrating the minimal path optimization within the pictorial algorithm and encoding more elaborate branch statistics (e.g., medial curvature and radii). It would also be interesting to explore automatic ways to detect deviations from the priors that are supported by image evidence, as these may indicate pathology.