1 Introduction

Radiography (e.g. X-ray, CT, fluoroscopy) is the conventional technique for imaging bones; however, it involves radiation exposure. Ultrasound (US) has been proposed as a safe, real-time imaging alternative for certain applications such as bone surface localization for diagnosis and routine orthopedic controls, e.g. [1–4]; and for intra-operative guidance in computer-assisted orthopedic surgery (CAOS), e.g. [5, 6]. Nevertheless, identifying the bone surface is a challenging task, since US suffers from a range of different artifacts and in general has a low signal-to-noise ratio (SNR). The methods proposed in the literature require manual interaction or complex parametrizations, which limits their generalizability.

Although raw ultrasound radio-frequency data can be used to segment bones [7], its availability for routine clinical applications from commercial US machines is still quite limited. Considering conventional B-mode imaging, early work focusing on bone surface segmentation utilized intensity and gradient information, e.g. [8]. Hacihaliloglu et al. [1] exploited phase congruency from Kovesi [9] to introduce phase symmetry (PS) in 2D and 3D to identify bone fractures by aggregating log-Gabor filters at different orientations. This enhances bone surface appearance as seen in Fig. 1c. The tedious parameter selection phase of log-Gabor filters for PS was automated later in [10]. Inspired by the gradient energy tensor from [11], PS was also used to define a local phase tensor (LPT) metric, which was studied for enhancing bone surface appearance when registering statistical shape models to 3D US images [5]. Despite its high sensitivity, the major drawback of PS is its low specificity; i.e. it gives false positives at interfaces between soft tissue layers (Fig. 1c). Therefore, most works using PS alone require manual interaction, e.g. selection of a region-of-interest (ROI) around the expected bone surface, or post-processing to remove false positives. Note that PS is a hard decision, giving an almost binary response (a very high dynamic range), from which post-processing may not always recover, leading to suboptimal solutions. Alternatively, in [3] confidence in phase-symmetry (CPS) was introduced to enhance bone surfaces in US by uniformly weighting PS, attenuation and shadowing features; the latter two stemming from confidence maps [12] based on random walks. The shadowing feature is exemplified in Fig. 1d. These earlier works either lack a principled approach to combine the available information, e.g. image appearance and physical constraints of ultrasound, or rely strongly on PS for the bone surface.
In this paper, we propose a novel graphical model, which is robust to false-positive responses, by introducing physical constraints of ultrasound-bone interaction combined in a principled way with appearance information from a supervised learning framework.

Fig. 1.

(a) An in-vivo bone US. The red line indicates the mid-column, along which the plots show: (b) B-mode intensity, (c) PS [1], (d) shadowing feature [12], (e) shadow and (f) soft-tissue probabilities from the trained appearance model cf. Sect. 2.1.

2 Methods

Although soft-tissue interfaces and the bone surface may both appear as hyperechoic reflections, there is a fundamental difference at bone surfaces: due to the relatively higher acoustic impedance of cortical bone, almost all of the transmitted ultrasound energy is reflected there. This leads to a bright surface appearance and, behind this, a dark or incoherent appearance due to lack of ultrasound penetration. Accordingly, we categorize the US scene into three classes: bone surface (B), shadow behind this surface (S), and other (soft) tissue (T). We model their appearance using supervised learning with the following features.

2.1 Image Features and Learned Appearance Models

2D image patch and 1D image column features are employed, the latter approximating the axial propagation of focused beams. Features regarding statistical, textural, and random walks-based information are extracted at different scales as listed in Table 1, where Scale-space indicates kernel sizes, i.e. the edge length of square kernels or length of vector kernels. A subset of the features for a sample US image is depicted in Fig. 2. Below they are briefly summarized.

Fig. 2.

Sample features from a US image (a), where \(\cdot _n\) denotes filter kernel scale from fine “1” to coarse “3”: (b) \(\mathrm {Median_{3}}\), (c) \(\mathrm {Entropy_{2}}\) (d) Attenuation, (e) \(\mathrm {Gauss_{3}}\), (f) Column-long \(\sigma \), (g) \(\mathrm {LBP_{1}}\), (h) Rayleigh \(\mathrm {fit\_error}.\)

Local-patch statistics. Simple and higher-order statistical features are used.

Random-Walks. We use features from the literature: confidence maps (\(m_{x,y}\)) from [12] and, based on these, attenuation (\(a_{x,y}=\mathrm {norm}\left( \sum _w (m_{x,y}-m_{\min })\right) \)) and shadowing (\(s_{x,y} = \mathrm {norm}\left( \sum _w m_{x,y}/m_{\min }\right) \)) from [3], where \(\mathrm {norm}(.)\) is the unity-based normalization and w is the number of pixels in the patch.

Column-wise and integral statistics. Intuitive metrics, computed from the far side of the image to a point, are also employed; they are motivated by the reflection and attenuation effects that act cumulatively as ultrasound propagates.

Local Binary Patterns. In order to capture textural (speckle) information visually, we used well-known Local Binary Patterns [13] and Modified Census Transform [14], which relate the intensity at a point to its neighbors.
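As an illustration of how such neighbor-relative texture codes are formed, the following minimal sketch computes a basic 8-neighbor LBP code and a 9-bit Modified Census Transform code for one interior pixel. The function names, the bit ordering, and the 3x3 neighborhood are assumptions made for this example; the scales and variants actually used are those of Table 1 and [13, 14].

```python
def lbp_code(img, r, c):
    """8-neighbour LBP code of interior pixel (r, c); img is a list of rows."""
    center = img[r][c]
    # Neighbours visited clockwise from the top-left (ordering is an assumption).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # A bit is set when the neighbour is at least as bright as the centre.
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def mct_code(img, r, c):
    """9-bit Modified Census Transform code: compare each pixel of the
    3x3 neighbourhood (centre included) against the neighbourhood mean."""
    window = [img[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
    mean = sum(window) / 9.0
    code = 0
    for bit, value in enumerate(window):  # row-major bit order (assumption)
        if value > mean:
            code |= 1 << bit
    return code


toy = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
lbp = lbp_code(toy, 1, 1)
mct = mct_code(toy, 1, 1)
```

Both codes depend only on intensity orderings within the neighborhood, so they are unaffected by a monotonic change in gain.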

Speckle characteristics. A last feature is included from an ultrasound physics perspective: It is known that the appearance of fully-developed speckle can be characterized locally by Rayleigh, Nakagami, or similar distributions. At locations where ultrasound SNR is low, e.g. behind bone surface, although it may be possible to get a high intensity (with high gain, etc.), the content would be mostly other (e.g. electrical) noise, which will follow a Gaussian or uniform distribution. Accordingly, we used the fit of a Rayleigh probability density function (pdf) to patch intensity histograms to quantify its speckle characteristics, i.e.:

$$\begin{aligned} \mathrm {fit\_error} = ||\ \mathrm {pdf_{Rayleigh}} - \mathrm {norm}(\mathrm {hist}(\mathrm {patch}_i)) \ || \end{aligned}$$
(1)

where \(\mathrm {pdf_{Rayleigh}}\) is the maximum likelihood fitted distribution and the second term is the normalized histogram of patch intensities.
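A minimal sketch of Eq. (1) is given below, under several assumptions that the text does not fix: the norm is taken as the Euclidean norm, the histogram is normalized to unit sum, 16 bins over an 8-bit intensity range are used, and the fitted pdf is sampled at bin centers. The function name `rayleigh_fit_error` and the synthetic test data are hypothetical.

```python
import math
import random


def rayleigh_fit_error(patch, n_bins=16, max_val=255.0):
    """Euclidean distance between an ML-fitted Rayleigh pdf and the
    unit-sum histogram of the patch intensities (flat list of numbers)."""
    values = [float(v) for v in patch]
    n = len(values)
    # Maximum-likelihood Rayleigh scale: sigma^2 = sum(x^2) / (2 n).
    sigma2 = sum(v * v for v in values) / (2.0 * n)
    if sigma2 == 0.0:
        raise ValueError("all-zero patch: Rayleigh scale is undefined")
    width = max_val / n_bins
    # Histogram normalized to unit sum (assumed meaning of norm(hist(.))).
    hist = [0.0] * n_bins
    for v in values:
        k = min(int(v / width), n_bins - 1)
        hist[k] += 1.0 / n
    # Rayleigh pdf at bin centres, scaled by bin width to approximate bin mass.
    sq_err = 0.0
    for k in range(n_bins):
        x = (k + 0.5) * width
        mass = (x / sigma2) * math.exp(-x * x / (2.0 * sigma2)) * width
        sq_err += (mass - hist[k]) ** 2
    return math.sqrt(sq_err)


rng = random.Random(0)
# Synthetic "speckle": Rayleigh samples (scale 40) via inverse-CDF sampling.
speckle = [40.0 * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
           for _ in range(5000)]
# Synthetic "noise": uniformly distributed intensities.
noise = [rng.uniform(0.0, 255.0) for _ in range(5000)]
err_speckle = rayleigh_fit_error(speckle)
err_noise = rayleigh_fit_error(noise)
```

On this synthetic data the Rayleigh-distributed patch yields a smaller fit error than the uniform one, which is the behavior the feature relies on to separate speckle from noise-dominated regions.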

Table 1. Features extracted at different kernel scales, given in US transmit wavelengths (\(\lambda \)) and pixels (px).

To capture scale-space information, patch-based features are extracted at multiple scales (see Table 1). At a point i, this leads to a feature vector \(f_i\) of length 47 populated by the above-mentioned features. From these features extracted at all image locations of annotated sample images, two discriminative binary classifiers are then trained to construct independent probability functions \(p(f_i\,|\,\mathrm {label}_i)\) for classes S and T, below and above the annotated bones respectively. For bone surface B, we use phase symmetry PS, converted to a likelihood as \(e^{-\mathrm {PS}/\sigma _{0}}\). For a given test image, we cast the bone segmentation as the graph labeling problem described below.

Fig. 3.

(a) Unary cost calculation and (b) Pairwise edge connections for (i) 4-connected, (ii) directional 4-connected, (iii) proposed configuration. Horizontal, vertical and jump-edge connections are denoted with H, V and J respectively.

2.2 Encoding Ultrasound Physics on Graph Edges

Markov Random Fields (MRF) are a common regularization approach for obtaining spatially consistent results and removing false local responses. In an MRF, the image is represented by a graphical model, where pixels are the nodes and inter-pixel interactions (e.g. regularization) are encoded on the edges. A maximum-a-posteriori solution involves the minimization of a cost function of the following form:

$$\begin{aligned} \sum _{i} \varPsi (i) \ + \ \mu \sum _i \sum _{j \in \mathrm {N}_{i}} \varPsi (i, j) \end{aligned}$$
(2)

where \(\varPsi (\cdot )\) and \(\varPsi (\cdot , \cdot )\) are the unary and pairwise cost functions and N\(_i\) is the neighbourhood of node i. One can then obtain a regularized labeling (segmentation) solution, e.g., using the common Potts potential for pairwise regularization and the label models above as unary costs, as seen in Fig. 3(a).
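To make Eq. (2) concrete, the sketch below evaluates the cost of a given labeling on a 4-connected grid with a Potts pairwise term. It follows the double sum literally, so each undirected neighbor pair is counted twice. The data layout (`unary[r][c][label]`), the function names, and the unit Potts penalty are assumptions for illustration; the sketch only evaluates the cost and does not minimize it.

```python
def potts(label_a, label_b):
    """Potts pairwise cost: 0 for equal labels, 1 otherwise."""
    return 0.0 if label_a == label_b else 1.0


def mrf_energy(labels, unary, mu=1.0, pairwise=potts):
    """Cost of Eq. (2): sum of unary costs plus mu times the pairwise costs
    over every pixel i and each of its 4-connected neighbours j."""
    n_rows, n_cols = len(labels), len(labels[0])
    energy = 0.0
    for r in range(n_rows):
        for c in range(n_cols):
            energy += unary[r][c][labels[r][c]]
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    energy += mu * pairwise(labels[r][c], labels[rr][cc])
    return energy


# Toy 2x2 example with two labels; label 1 costs 1.0 everywhere.
labels = [[0, 0],
          [0, 1]]
unary = [[[0.0, 1.0], [0.0, 1.0]],
         [[0.0, 1.0], [0.0, 1.0]]]
energy = mrf_energy(labels, unary, mu=1.0)
```

In the toy example one pixel carries unary cost 1, and two neighbor pairs disagree; since each pair is visited from both ends, the pairwise term contributes 4.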

Table 2. Our pairwise cost definition for horizontal (H), vertical (V) and jump (J) edges.

An MRF uses undirected edges as in Fig. 3(b.i), and thus can only encode bidirectional information. Since ultrasound travels axially, different types of interaction occur between vertical (V) and horizontal (H) pixel neighbours in the image. Different pairwise costs for such neighbours can be set using a directed factor graph as in Fig. 3(b.ii). For horizontal edges, we use a Potts-like model in Table 2(a), where same labels on both ends are penalized less, with parameters \(k_1\) and \(k_2\) in the range \((0,\,1)\), since neighboring pixels are more likely to be of the same class. For vertical edges, we know the following: 1. soft tissue T starts from the skin; 2. once the bone surface B is encountered, the rest of the image (below that location) should be shadow S (no more T); and 3. S cannot start without encountering B first. Constraint 1 above is enforced by a unary constraint on the top image pixels (skin), and the latter two are enforced using the vertical pairwise costs in Table 2(b), where \(\infty _{1}\) prohibits transitions that violate these conditions. Consequently, starting from the transducer, the labels encountered (downward) should be in this strict order: T \(\rightarrow \) B \(\rightarrow \) S. Thanks to factor graphs, vertical transitions can also be penalized differently than horizontal ones, if desired, controlled by the parameter \(k_3\) (=1 for isotropic penalty).

Reflection of ultrasound at the bone surface generates a hyperechoic band, the thickness of which depends on various factors (e.g., ultrasound frequency). Accordingly, after the label switches to bone surface B, it should not continue as B until the bottom of the image, but instead switch to S shortly after. We encode this with an additional so-called jump edge (J) connected from each pixel to the one l pixels below, as in Fig. 3(b.iii) (green). With the costs given in Table 2(c), this enforces the thickness of the surface appearance to be exactly l pixels: \(\infty _{2}\) prohibits S below T, setting a lower bound of l; and \(\infty _{3}\) prohibits both ends from being B, setting an upper bound of l. For J, \(\infty _{1}\) still enforces the right order of transmission. We call this novel connectivity and cost definition the bone factor graph (BFG). It is optimized by off-the-shelf tools to obtain the segmentation.
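The hard constraints described above for vertical and jump edges can be illustrated with a small feasibility check on a single image column. This is only a sketch of the prohibited configurations as stated in the text; it does not reproduce the finite costs of Table 2, and the function name and the string encoding of labels are assumptions.

```python
def column_is_feasible(labels, l):
    """Check one column (labels listed top to bottom, each 'T', 'B' or 'S')
    against the hard constraints described in the text."""
    order = {"T": 0, "B": 1, "S": 2}
    # Constraint 1: soft tissue starts at the skin (top pixel).
    if labels and labels[0] != "T":
        return False
    # Vertical edges: labels may only stay the same or advance T -> B -> S,
    # so no T after B, and no S without passing through B first.
    for upper, lower in zip(labels, labels[1:]):
        step = order[lower] - order[upper]
        if step < 0 or step > 1:
            return False
    # Jump edges connect pixel i to pixel i + l.
    for i in range(len(labels) - l):
        top, bottom = labels[i], labels[i + l]
        if order[bottom] < order[top]:
            return False  # wrong order along the jump edge
        if top == "T" and bottom == "S":
            return False  # B band thinner than l pixels
        if top == "B" and bottom == "B":
            return False  # B band thicker than l pixels
    return True


ok = column_is_feasible("TTBBBSS", 3)          # band of exactly 3 pixels
too_thin = column_is_feasible("TTBBSSS", 3)    # band of 2 pixels
too_thick = column_is_feasible("TTBBBBS", 3)   # band of 4 pixels
no_bone = column_is_feasible("TTTTTTT", 3)     # column without bone
```

Note that a column consisting only of T passes the check, which corresponds to a column in which no bone surface is detected.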

3 Results and Discussion

37 US images were acquired using a SonixTouch machine (Ultrasonix, Richmond, Canada) with an L14-5 transducer at depths [3, 5] cm with frequencies {6.66,10} MHz (depending on body location). B-mode images had an isotropic pixel resolution of 230 \({{\upmu }}\)m. Collected data include bones in the forearm (radius, ulna), shoulder (acromion, humerus tip), leg (fibula, tibia, malleolus), hip (iliac crest), jaw (mandible, ramus) and fingers (phalanges). Following [2], bone surfaces were delineated in the images by an expert at locations where they could be distinguished with certainty; i.e. unannotated columns mean either no bone or not visible.

Fig. 4.

(a) Comparison of algorithms (best scores are shown in bold), (b) average accuracy vs. tolerance margin, and (c) F1 score; where we propose \(\mathrm {CFG}_{\uparrow }\)\(\mathrm {BFG}.\)

Fig. 5.

Sample qualitative results show robustness (top row) to false detections in soft tissue interfaces; (bottom, left and center) to shadowing and reverberation artifacts inside bone; (top, center) separate bone surfaces, e.g. radius and ulna; (right) images demonstrate typical failures.

We ran 6-fold cross-validation experiments. For learning probability models, L2-regularized logistic regression from the LIBLINEAR library was used. Factor graphs were implemented using the OpenGM library. For transmit wavelength \(\lambda \) at a given ultrasound frequency, \(l = 4\lambda \), \(\sigma _{0} = 10^{-3}\), \(\mu = 1\), \(k_{1} = k_{2} = 0.5\) and \(k_{3} = 0.3\) are used for the experiments. We utilized the Sequential Tree-Reweighted Message Passing (TRW-S) algorithm for graph optimization.
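As a rough worked example of what \(l = 4\lambda \) means in pixels for the reported acquisition settings, the sketch below converts the jump-edge length using \(\lambda = c/f\). The speed of sound of 1540 m/s (a common soft-tissue convention) and the rounding to the nearest pixel are assumptions not stated in the text; the function name is hypothetical.

```python
def jump_length_px(freq_hz, pixel_m=230e-6, n_wavelengths=4,
                   c_m_per_s=1540.0):
    """Jump-edge length l = n_wavelengths * lambda, expressed in pixels.

    c_m_per_s = 1540 m/s is an assumed soft-tissue speed of sound;
    rounding to the nearest pixel is also an assumption."""
    wavelength_m = c_m_per_s / freq_hz
    return max(1, round(n_wavelengths * wavelength_m / pixel_m))


l_low = jump_length_px(6.66e6)   # 6.66 MHz acquisitions
l_high = jump_length_px(10e6)    # 10 MHz acquisitions
```

Under these assumptions the wavelength is about 231 µm at 6.66 MHz and 154 µm at 10 MHz, so l comes out at roughly 4 and 3 pixels respectively for the 230 µm pixel size.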

To compare BFG with alternatives, we also implemented an MRF with the edge connectivity in Fig. 3(b.i) and Potts pairwise potentials, and conventional factor graphs (CFG) without the jump-edge potential, with the edge connectivity in Fig. 3(b.ii), the potentials in Table 2(a, b), and the parameters given above. As these implementations gave arbitrarily poor results for our evaluation metrics due to many false negatives in comparison to BFG, we applied the following post-processing steps to improve these alternative methods to a comparable level. We first thinned the result to a single pixel using morphological thinning [15]. Subsequently, if there are multiple occurrences of bone detection, only the lowermost pixel is kept to avoid false positives within the soft tissue. We denote these two post-processing steps with (\(\cdot _\uparrow \)). Considering typical state-of-the-art PS methods, most require the selection of a ROI around actual bones, since multiple reflections are extracted. We compared our method with [10], where the highest PS response per column (PS\(_{\mathrm {max}}\)) was proposed as an automatic way of identifying the bone surface. Since this yielded relatively poor results as it was, we also applied (\(\cdot _\uparrow \)) to PS as an alternative technique, which we refer to as state-of-the-art. We also compared with confidence-weighted phase symmetry (CPS) from [3]. This similarly yields many false negatives, so we report its post-processed version CPS\(_\uparrow \). For BFG, simply the midpoint of the l-thick B was output.
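The two per-column selection rules mentioned above can be sketched as follows. The first function implements only the second post-processing step (keeping the lowermost detection in each column) and assumes the morphological thinning of [15] has already been applied; the second picks the row with the highest PS response per column, as in PS\(_{\mathrm {max}}\). The function names and the list-of-rows image layout are assumptions for illustration.

```python
def lowermost_per_column(mask):
    """mask: list of rows (top to bottom) of 0/1 bone detections.
    Returns, per column, the row index of the lowermost detection or None."""
    n_rows, n_cols = len(mask), len(mask[0])
    result = []
    for c in range(n_cols):
        rows = [r for r in range(n_rows) if mask[r][c]]
        result.append(rows[-1] if rows else None)
    return result


def ps_max_per_column(ps):
    """ps: list of rows of phase-symmetry responses.
    Returns, per column, the row index of the highest response."""
    n_rows, n_cols = len(ps), len(ps[0])
    return [max(range(n_rows), key=lambda r: ps[r][c]) for c in range(n_cols)]


mask = [[0, 1, 0],
        [1, 0, 0],
        [1, 1, 0]]
surface_rows = lowermost_per_column(mask)

ps = [[0.1, 0.9],
      [0.5, 0.2],
      [0.3, 0.4]]
ps_rows = ps_max_per_column(ps)
```

Unlike the lowermost-pixel rule, the per-column maximum always returns a row, even in columns without bone, which is one reason it can produce false positives.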

We used common bone-detection evaluation metrics: symmetric Hausdorff distance (sHD), one-way Hausdorff distance (oHD), and RMSE of the detected bone surface to the closest gold standard (GS) point. Quantitative results averaged over the 6 folds are shown in Fig. 4(a), indicating that our algorithm outperforms the other approaches. We also looked at the classification accuracy of surface detections. We considered a detection pixel correct if it is within a tolerance margin around the gold standard annotation. Accordingly, we generated a classification result (i.e. true/false positive/negative) for each column and computed an accuracy score over those for the image. Fig. 4(b) shows the average accuracy of the three best methods as the tolerance is changed. The accuracy of BFG is 86 % whereas that of PS\(_{\uparrow }\) is 65 % at a 1 mm margin; 92 % vs. 72 % at 2 mm; and 95 % vs. 79 % at 4 mm. BFG outperforms the others at all operating points. According to [16], the error tolerance in CAOS is 1 mm excluding operator error. Choosing this as an example operating point, we also calculated F1 scores as in Fig. 4(c). This shows the robustness of our method across all test images, compared to alternatives. A qualitative comparison between BFG, \(\mathrm {PS_{\uparrow }}\), and GS is seen in Fig. 5.
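The distance metrics and the tolerance-based accuracy can be sketched as below. Several details are assumptions, since the text does not specify them: the one-way Hausdorff distance is taken from the first point set to the second, a column counts as correct when both detection and annotation are absent or when both are present within the tolerance, and all names are hypothetical.

```python
import math


def _nearest(p, points):
    """Euclidean distance from point p to the closest point in points."""
    return min(math.dist(p, q) for q in points)


def one_way_hausdorff(a, b):
    """Largest distance from any point of a to its closest point in b."""
    return max(_nearest(p, b) for p in a)


def symmetric_hausdorff(a, b):
    return max(one_way_hausdorff(a, b), one_way_hausdorff(b, a))


def rmse_to_closest(detected, gold):
    """RMSE of detected points to their closest gold-standard point."""
    sq = [_nearest(p, gold) ** 2 for p in detected]
    return math.sqrt(sum(sq) / len(sq))


def column_accuracy(det_rows, gs_rows, tol):
    """Fraction of columns classified correctly; rows are indices or None."""
    correct = 0
    for d, g in zip(det_rows, gs_rows):
        if d is None and g is None:
            correct += 1                      # true negative
        elif d is not None and g is not None and abs(d - g) <= tol:
            correct += 1                      # true positive
    return correct / len(det_rows)


det = [(0.0, 0.0), (1.0, 0.0)]
gs = [(0.0, 0.0), (1.0, 3.0)]
ohd = one_way_hausdorff(det, gs)
shd = symmetric_hausdorff(det, gs)
rmse = rmse_to_closest(det, gs)
acc = column_accuracy([10, 12, None, 20], [10, 15, None, None], tol=2)
```

The toy example shows why the symmetric distance can exceed the one-way distance: a gold-standard point far from every detection is only penalized when the distance is also measured from the gold standard to the detections.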

A computer with an Intel i7 930 @ 2.80 GHz and 8 GB RAM was used for the experiments. Results were computed in 2 min on average with a non-optimized Matlab implementation, where the majority of the time is taken by feature extraction. This can be accelerated in the future by parallel computation or feature selection, but that was not the focus of this paper.

4 Conclusions

In this paper, we have presented a novel graph representation of ultrasound-bone interaction for robust and fully-automatic segmentation of bone surfaces. Our method outperforms alternative techniques, demonstrating clinically-relevant performance for a diverse range of anatomical regions. In the future, we will improve its speed for real-time surface detection, e.g. for registration of pre-operative models to real-time US data for navigation and guidance.