Deformable appearance pyramids for anatomy representation, landmark detection and pathology classification

Purpose Representation of anatomy appearance is one of the key problems in medical image analysis. An appearance model represents the anatomies with parametric forms, which are then vectorised for prior learning, segmentation and classification tasks. Methods We propose a part-based parametric appearance model we refer to as a deformable appearance pyramid (DAP). The parts are delineated by multi-scale local feature pyramids extracted from an image pyramid. Each anatomy is represented by an appearance pyramid, with the variability within a population approximated by local translations of the multi-scale parts and linear appearance variations in the assembly of the parts. We introduce DAPs built on two types of image pyramids, namely Gaussian and wavelet pyramids, and present two approaches to model the prior and fit the model, one explicitly using a subspace Lucas–Kanade algorithm and the other implicitly using the supervised descent method (SDM). Results We validate the performance of the DAP instances with different configurations on the problem of lumbar spinal stenosis, localising the landmarks and classifying the pathologies. We also compare them with classic methods such as active shape models, active appearance models and constrained local models. Experimental results show that the DAP built on wavelet pyramids and fitted with SDM gives the best results in both landmark localisation and classification. Conclusion A new appearance model is introduced, with several configurations presented and evaluated. The DAPs can be readily applied to other clinical problems for the tasks of prior learning, landmark detection and pathology classification.


Introduction
Object class representation is of vital importance for medical image analysis tasks such as localising anatomical features and classifying pathological conditions. A parametric representation of an object category allows prior knowledge to be leveraged by learning the statistics of the parameters over the population. The representations are often vectorised and used as inputs for training a classifier (Fig. 1a). The training data usually consist of instances with landmarks annotated at consistent anatomical features. The appearance correspondence across the instances is built by aligning a deformable appearance (e.g. the active appearance model (AAM) [3]) or by extracting local features at the landmarks [1,8,16]. During testing, the landmarks are detected in new, unseen instances, and the features are extracted and sent to the classifier for pathology classification. For robust landmark detection, a prior model of the object class is learnt by formulating the statistics of the parameters, and the search is conducted under the regularisation of the prior model. The deformable model is either holistic [3], consisting of the shape and the aligned appearance, or part based [1,8,11,16], representing an object by locally rigid parts with a shape capturing the spatial relationships among the parts. In deformable part models (DPMs), the fitting process is implemented by local feature searching followed by a regularisation imposed through a prior model of the global shape. Various types of DPM instances have been proposed utilising advanced feature detection algorithms such as boosted regression [5], random forests [8] and regularised mean-shift [11], and shape optimisation methods such as pictorial structures [1] and nonparametric models [16]. However, less attention has been paid to optimising the appearance representation and preserving the anatomical details in medical imaging.
In this paper, we introduce a new appearance model referred to as deformable appearance pyramids (DAPs). The object appearance is delineated by an appearance pyramid (AP), which is a multi-scale part-based representation built on the image pyramid, see Fig. 1b. The deformation is approximated by the translations of the parts as well as the linear appearance variations in the assembly of the parts. The multi-scale delineation preserves the details of the anatomical features at high resolution, while capturing the background information at lower resolution. We present and evaluate the DAPs built on two types of image pyramids, namely Gaussian and wavelet pyramids, and introduce two methods to model the prior and fit the model to new instances, one explicitly using a multivariate Gaussian model and a subspace Lucas–Kanade (LK) algorithm [2], and the other implicitly using the supervised descent method (SDM) [16].
We apply the DAPs to the problem of lumbar spinal stenosis (LSS) for fitting the landmarks and grading the central and foraminal stenosis [7,14]. The performances of the DAPs with various configurations are evaluated and compared with classic methods such as active shape models (ASMs), AAMs [3] and constrained local models (CLMs) [4]. Experimental results show that the DAPs built on wavelet image pyramids [18] and driven by the SDM give the best performance on both landmark detection and pathology classification.

Deformable appearance pyramids for object representation
Objects belonging to the same class (e.g. same anatomy from different cases) often share similar appearances. The appearances can be represented by a deformable model, which is fitted to individual cases by changing the parameters of the model. With the deformable appearance model, the variations in the population caused by the diversity of individual cases or the pathological degenerations can be parametrised, learned and used as prior knowledge for robust fitting and classification. A DAP is a deformable model representing the anatomies by multi-scale rigid parts as well as the geometrical configuration. It models the variability within a population with local translations and linear appearance variations in the assembly of the parts.

Local feature pyramid
We begin by describing the parts at multiple scales. The part at a landmark is typically described by an image patch with a certain size. Choice of the patch size can significantly affect the performance of the model. For sharper local structures, a smaller patch can give more precise pixel location. At blurry structures, however, the patch size should be large enough to cover distinguishable textural information. A good feature descriptor is expected to have a high spatial specificity (pixel location) while maintaining good distinctive ability (textural properties). Due to the uncertainty principle in signal processing [15], a single scale patch cannot achieve both. We therefore represent the part with a multi-scale local feature pyramid (LFP), with the smaller scales containing local high frequency features, and the larger scales low frequency components.
An L-level LFP at a landmark, denoted by \{A_l\}_{l=1}^{L}, is an assembly of patches extracted from an L-level image pyramid. The patches A_l describe the local features with increasing scales and decreasing resolutions in octave intervals. The first-level patch is the smallest, with the finest resolution. Each successive level has a scale one octave larger and a resolution one octave lower, while the patch size in pixels stays the same across all levels; see an example extracted from a Gaussian image pyramid in Fig. 2.
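The construction above can be sketched in a few lines of NumPy/SciPy (a minimal illustration; the function names, the sigma = 1 smoothing and the 16-pixel patch size are our own assumptions, not specifics from the paper):

```python
import numpy as np
from scipy import ndimage

def gaussian_pyramid(image, levels):
    """Build an L-level Gaussian image pyramid with octave spacing:
    each level is smoothed and downsampled by a factor of 2."""
    pyramid = [np.asarray(image, float)]
    for _ in range(levels - 1):
        smoothed = ndimage.gaussian_filter(pyramid[-1], sigma=1.0)
        pyramid.append(smoothed[::2, ::2])  # halve the resolution
    return pyramid

def local_feature_pyramid(pyramid, landmark, patch=16):
    """Extract same-pixel-size patches centred at `landmark` (row, col)
    at every level; the coordinate is halved once per octave."""
    r0, c0, half = landmark[0], landmark[1], patch // 2
    patches = []
    for l, img in enumerate(pyramid):
        r, c = int(r0 / 2 ** l), int(c0 / 2 ** l)
        padded = np.pad(img, half, mode='edge')   # guard against borders
        patches.append(padded[r:r + patch, c:c + patch])
    return patches
```

Each returned patch has the same size in pixels, but the lth one covers a region 2^l times larger in the original image, matching the octave-interval description above.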

Anatomy decomposition by DAP
A DAP is a part-based deformable model with each part delineated by an LFP. The DAP consists of two components, \{A, s\}: A = \{\{A_{n,l}\}_{l=1}^{L}\}_{n=1}^{N}, called an appearance pyramid, is the assembly of the LFPs, and s is the geometrical configuration accounting for the deformations. N is the total number of landmarks.
As the patches cover larger anatomical regions at lower-resolution pyramidal layers, fewer patches are required to describe the appearance of the anatomy at a coarser level. We trim the patches at these levels, preserving only those denoting key features. In practice, a simple trimming algorithm can iteratively delete the patch that is least distant from its neighbouring patches until a distance criterion is satisfied. Denoting by K_n the subset of scale indices preserved at the nth landmark, the AP becomes A = \{\{A_{n,l}\}_{l \in K_n}\}_{n=1}^{N}. At each level of A, the patches describe the anatomy with a certain degree of detail, and together they give a multi-scale description, see Fig. 1b.
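The trimming step is only outlined in the text; one possible greedy implementation (our own sketch, with a hypothetical `min_dist` stopping criterion) is:

```python
import numpy as np

def trim_patches(patches, min_dist=5.0):
    """Greedily drop the patch closest to its nearest neighbour until all
    remaining pairs differ by at least `min_dist` (Euclidean distance on
    the flattened patches). Returns the indices of the kept patches."""
    flat = [np.asarray(p, float).ravel() for p in patches]
    keep = list(range(len(patches)))
    while len(keep) > 1:
        # nearest-neighbour distance for every kept patch
        d = np.full((len(keep), len(keep)), np.inf)
        for a, i in enumerate(keep):
            for b, j in enumerate(keep):
                if a != b:
                    d[a, b] = np.linalg.norm(flat[i] - flat[j])
        nearest = d.min(axis=1)
        if nearest.min() >= min_dist:
            break
        keep.pop(int(nearest.argmin()))  # most redundant patch goes first
    return keep
```

Near-duplicate patches are removed first, so only patches denoting distinct key features survive at the coarse levels.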
Various types of image pyramids can be used to build an AP for appearance delineation. To preserve the full information of the anatomy and allow the appearance to be reconstructed, they are chosen to be either pyramids with redundant channels, such as Gaussian pyramids, or with complementary channels, such as wavelet pyramids; we refer to the resulting appearance delineations as Gaussian appearance pyramids and wavelet appearance pyramids, respectively. We briefly illustrate a recent method of wavelet pyramid decomposition in Fig. 3. A detailed introduction can be found in [18] and at http://sites.google.com/site/waveletappearancepyramids/.

Deformable appearance pyramids fitting
Fitting a DAP to a new case is accomplished by searching for the landmarks based on local features and matching the model correctly to the geometry and appearance of the object. The geometrical configuration of a DAP defines how the parts relate to each other and the prior knowledge constrains the shape to be plausible in an object category. As a result, the choices of prior modelling and geometry constraint are important. We describe two strategies, one which learns the prior knowledge with explicit methods and the other implicitly.

Explicit model
In the explicit method, the geometry is configured with a point distribution shape model. The shape is represented by

s = [x_1^T, x_2^T, \ldots, x_N^T]^T,

in which x_n is the coordinate of the nth landmark. We follow the two-step fitting strategy commonly used in part-based models [11,12], i.e. local feature searching followed by a geometrical regularisation. The local feature searching gives predictions of the landmark locations, while the shape prior regularises the geometry within plausible variations. The likelihood of a shape instance s given the shape prior and the local landmark predictions can be written as

p(s \mid \{\hat{x}_{n,l}\}) \propto p(s) \prod_{n=1}^{N} \prod_{l \in K_n} p(x_n \mid \hat{x}_{n,l}),

where \hat{x}_{n,l} denotes the location of the nth landmark predicted from the patch at the lth level.
We show how the prior of the patch appearance A is learnt and used for the local feature searching, and the prior of the geometry s is learnt for the shape regularisation.

Local feature searching
Appearance prior Given the training set, we can extract A from each image and obtain a set of training samples \{A_1, A_2, \ldots\}. By extracting the local features at the corresponding landmarks, the shape variation in the training set is removed and a better pixel-to-pixel correspondence is achieved; therefore, A can be viewed as a 'shape-free' appearance. To learn the statistics of the appearances, we normalise the mean and variance of each A and apply principal component analysis (PCA). The eigenvectors accounting for the significant variations in the training samples form a matrix P_A, which spans an eigenspace.
A new instance can be represented in the eigenspace by

A \approx \bar{A} + P_A b_A,

in which \bar{A} is the average appearance and b_A are the appearance parameters in the eigenspace, obtained by the projection

b_A = P_A^T (A - \bar{A}).   (3)

Searching We derive a subspace LK algorithm [2] for the DAP fitting. In a standard LK method, the searching can be expressed by

\hat{x}_{n,l} = \arg\min_x \| A_{n,l}(x) - \bar{A}_{n,l} \|^2,

which attempts to find the location minimising the difference between the local appearance and the template \bar{A}_{n,l}. Here A_{n,l}(x) is the patch observed at location x for the nth landmark and the lth scale, \bar{A}_{n,l} is the corresponding patch in \bar{A}, and \hat{x}_{n,l} is the predicted location of the nth landmark inferred from A_{n,l}. The standard LK method assumes that the difference between the template and the local feature is caused by misalignment, and aims to minimise the difference by adjusting the location. However, the difference can also arise from the appearance variations among cases, which makes the searching challenging. As the salient variations have been learnt and represented in the eigenspace spanned by P_A, we project the AP onto its orthogonal complement, where these variations are excluded, namely

A^{\perp} = (I - P_A P_A^T) A,

where I is an identity matrix. The objective function thus becomes

\hat{x}_{n,l} = \arg\min_x \| A^{\perp}_{n,l}(x) - \bar{A}^{\perp}_{n,l} \|^2,   (6)

in which A^{\perp}_{n,l} denotes a patch in A^{\perp}. In this way, the salient appearance variations are removed and a more robust LK method is achieved. Equation (6) is solved iteratively by the inverse gradient descent method [17]. Suppose we also have the variance \sigma^2_{n,l} of the prediction \hat{x}_{n,l}, which indicates the salience of the local feature, or the confidence of the prediction. To keep it simple, we calculate the variance as the mean squared difference between the patch observation and the template. Using a Gaussian parametric form, the likelihood of the location of the nth landmark given the multi-scale predictions can be represented by

p(x_n \mid \{\hat{x}_{n,l}\}_{l \in K_n}) \propto \prod_{l \in K_n} \exp\!\big( -\| x_n - \hat{x}_{n,l} \|^2 / (2\sigma^2_{n,l}) \big).
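The projection onto the orthogonal complement of the appearance eigenspace, which removes the learnt salient variations before matching, can be sketched as follows (NumPy; `fit_appearance_pca` and the SVD route to P_A are our own illustrative choices):

```python
import numpy as np

def fit_appearance_pca(samples, t):
    """PCA on vectorised 'shape-free' appearances: returns the mean
    appearance and the first t eigenvectors (columns of P_A)."""
    X = np.asarray(samples, float)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:t].T          # P_A has shape (dim, t)

def orthogonal_residual(A, mean, P):
    """Project an appearance vector onto the orthogonal complement of
    the eigenspace: (I - P P^T)(A - mean). The salient population
    variations captured by P are removed before LK matching."""
    r = np.asarray(A, float) - mean
    return r - P @ (P.T @ r)
```

Any component of a new appearance lying inside the learnt eigenspace is cancelled, so the remaining residual is attributed to misalignment rather than inter-case variation.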

Shape regularisation
Shape prior Assuming a multivariate Gaussian model, the statistics of the shapes are built by applying PCA to the aligned training shapes,

s \approx \bar{s} + P_s b_s,

where P_s \in R^{2N \times t} is the matrix of eigenvectors corresponding to the t largest eigenvalues \lambda_1, \ldots, \lambda_t and spans a t-dimensional eigenspace, and b_s \in R^{t \times 1} are the shape parameters in the eigenspace,

b_s = P_s^T (s - \bar{s}).   (9)

The probability of a shape instance being plausible in the eigenspace can be calculated by the density estimation [10],

p(b_s) \propto \exp\!\big( -\tfrac{1}{2} b_s^T \Lambda^{-1} b_s \big),

in which \Lambda = diag\{\lambda_1, \ldots, \lambda_t\}.
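A minimal NumPy sketch of the shape prior (our own function names; the density is evaluated only up to its normalising constant):

```python
import numpy as np

def shape_pca(shapes, t):
    """Build the point-distribution prior from aligned training shapes
    (rows of length 2N): mean shape, eigenvector matrix P_s (2N x t)
    and the t largest eigenvalues."""
    X = np.asarray(shapes, float)
    mean = X.mean(axis=0)
    C = np.cov(X - mean, rowvar=False)
    w, V = np.linalg.eigh(C)
    order = np.argsort(w)[::-1][:t]          # t largest eigenvalues
    return mean, V[:, order], w[order]

def shape_log_density(s, mean, P, lam):
    """Log plausibility of a shape, up to an additive constant:
    -0.5 * b_s^T Lambda^{-1} b_s, with b_s the eigenspace projection."""
    b = P.T @ (np.asarray(s, float) - mean)
    return -0.5 * float(b @ (b / lam))
```

The mean shape attains the maximum of the density; implausible shapes, far from the training distribution along any principal direction, receive strongly negative log density.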

Implicit model
In the implicit model, we deduce the true shape s^* from the observation at an initial shape, A(s^{(0)}), which amounts to solving the regression problem A(s^{(0)}) \to s^*. With the SDM algorithm, this can be decomposed into a set of regressors applied recursively,

s^{(i+1)} = s^{(i)} + \Delta s^{(i)}.

Each regressor is modelled linearly by

\Delta s^{(i)} = R^{(i)} A(s^{(i)}) + b^{(i)}.

The parameters \{R^{(i)}, b^{(i)}\} can be learnt from the training images. Specifically, at the ith iteration, the parameters are learnt by minimising the residual error of the regression over the training set,

\arg\min_{R^{(i)}, b^{(i)}} \sum_{k=1}^{M} \| \Delta s_k^{(i)} - R^{(i)} A_k(s_k^{(i)}) - b^{(i)} \|^2,   (16)

in which M is the number of training samples and \Delta s_k^{(i)} = s^*_k - s_k^{(i)} is the difference between the current shape s_k^{(i)} and the true shape s^*_k of the kth training sample. In all cases, the initial shape s^{(0)} for the first regressor is set as the average shape at the average location in the training dataset. The shape samples for training the subsequent regressors are generated by applying the previous regressor,

s_k^{(i+1)} = s_k^{(i)} + R^{(i)} A_k(s_k^{(i)}) + b^{(i)}.

In practice, to suppress the over-fitting problem in situations with high-dimensional features and inadequate training data, an L2 regularisation is applied and the objective function (16) becomes

\arg\min_{R^{(i)}, b^{(i)}} \sum_{k=1}^{M} \| \Delta s_k^{(i)} - R^{(i)} A_k(s_k^{(i)}) - b^{(i)} \|^2 + \lambda \| R^{(i)} \|^2,

where \lambda controls the extent of regularisation. Note that in the implicit model the shape prior is in a nonparametric form and is integrated into the training of the regressors. More details of SDM can be found in Xiong and De la Torre [16].
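The cascade training can be sketched as follows (a toy NumPy implementation under our own interfaces; the ridge closed form stands in for whatever solver the authors used):

```python
import numpy as np

def train_sdm(features, true_shapes, init_shape, n_stages=3, lam=1.0):
    """Train a cascade of L2-regularised linear regressors, in the style
    of the supervised descent method. `features(k, s)` returns the
    descriptor extracted from training image k at shape estimate s."""
    M = len(true_shapes)
    shapes = [init_shape.copy() for _ in range(M)]
    stages = []
    for _ in range(n_stages):
        X = np.stack([features(k, shapes[k]) for k in range(M)])
        Y = np.stack([true_shapes[k] - shapes[k] for k in range(M)])
        Xb = np.hstack([X, np.ones((M, 1))])      # absorb the bias b
        # ridge closed form: (Xb^T Xb + lam*I)^(-1) Xb^T Y
        W = np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]),
                            Xb.T @ Y)
        stages.append(W)
        # apply this stage to generate shapes for the next regressor
        shapes = [shapes[k] + Xb[k] @ W for k in range(M)]
    return stages

def apply_sdm(stages, feature_fn, init_shape):
    """Run the learned cascade on a new case; `feature_fn(s)` extracts
    the descriptor from the test image at shape estimate s."""
    s = init_shape.copy()
    for W in stages:
        s = s + np.append(feature_fn(s), 1.0) @ W
    return s
```

Each stage is trained on the shape estimates left by the previous one, so the cascade learns a sequence of descent directions rather than a single global regression.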
To reduce the dimensionality of the descriptors and enhance the fitting performance, instead of using intensity features, a more robust feature descriptor such as the histogram of oriented gradients (HOG) [6] can be readily applied to the patches. Denoting by h(\cdot) the feature extraction function, the fitting process can be expressed by

s^{(i+1)} = s^{(i)} + R^{(i)} h(A(s^{(i)})) + b^{(i)},

with the parameters \{R^{(i)}, b^{(i)}\} learnt from the training data by

\arg\min_{R^{(i)}, b^{(i)}} \sum_{k=1}^{M} \| \Delta s_k^{(i)} - R^{(i)} h(A_k(s_k^{(i)})) - b^{(i)} \|^2 + \lambda \| R^{(i)} \|^2.

Appearance reconstruction, pathology modelling and classification
In the testing stage, the shape of a new object is fitted using the methods presented above. As the pyramidal channels are either redundant or complementary, we can recover the appearance of the object from the DAP. In other words, the objects can be represented compactly by the DAP parameters. Specifically, the shape parameters b_s can be calculated by (9) and the appearance parameters b_A by (3). For the classification tasks, the correspondence of anatomical features should be built such that the differences among the descriptors account for the true variations rather than misalignment. In a DAP, the appearance correspondence is built by extracting local features at corresponding landmarks. A classifier predicts the label \ell given an anatomical observation \Phi, i.e. \hat{\ell} = \arg\max_{\ell} p(\ell \mid \Phi). The most significant variations in the training data \{\Phi\} can be learnt by a further PCA and the dimensionality reduced by preserving the significant components, which span a feature space P_\Phi. A DAP can therefore be represented in the feature space by a compact set of parameters b_\Phi, i.e. b_\Phi = P_\Phi^T (\Phi - \bar{\Phi}), in which \bar{\Phi} is the mean of \{\Phi\}. Using b_\Phi as input, the classifier now predicts \hat{\ell} = \arg\max_{\ell} p(\ell \mid b_\Phi). We train the classifier using AdaBoost with 100 learning cycles, with decision trees as the weak learners.
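A sketch of how the compact classifier input b_\Phi might be assembled (the concatenation of b_s and b_A into \Phi is our reading of the text, and the function names are our own; the AdaBoost classifier itself is omitted):

```python
import numpy as np

def dap_descriptor(b_s, b_A):
    """One possible assembly of the fitted DAP parameters into a single
    observation vector Phi (concatenation is an assumption)."""
    return np.concatenate([np.ravel(b_s), np.ravel(b_A)])

def feature_space(train_descriptors, t):
    """Further PCA over the training set {Phi}: returns the mean
    descriptor and the basis P_Phi of the t significant components."""
    X = np.asarray(train_descriptors, float)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:t].T

def compress(phi, mean, P_phi):
    """b_Phi = P_Phi^T (Phi - mean): the compact classifier input."""
    return P_phi.T @ (np.asarray(phi, float) - mean)
```

The compressed vectors `compress(phi, mean, P_phi)` would then be fed to any off-the-shelf boosted-tree classifier.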

Experiments
We apply the DAPs to the problem of LSS for localising the feature landmarks and making pathological classifications. LSS is a common disorder of the spine. The disorder can be observed in radiological studies as morphological abnormalities. Intervertebral disc-level axial images in MRI scans provide rich information revealing the condition of important anatomies such as the disc, central canal, neural foramen and facet. In most cases, the original axial scans are not aligned to the disc planes because of the curvature of the spine. To obtain precise intervertebral views, we locate the disc planes in the sagittal scans (red line in Fig. 4) and map the geometry to the axial scans to calculate the coordinates at which the voxels are resampled. As shown in Fig. 4b, the conditions of the posterior disc margins (red line) and the posterior spinal canal (cyan line) are typically inspected for the diagnosis. Degeneration of these structures can constrict the spinal canal (pink area) and the neural foramen (yellow area), causing central and foraminal stenosis.
The dataset for validation consists of T2-weighted MRI axial images of 200 patients with varied LSS symptoms. The L3/4, L4/5 and L5/S1 disc-level axial images are extracted, through which we obtain three sets of 200 axial images, 600 images in total. Due to the differences in resolution, all images are resampled to a pixel spacing of 0.5 mm. Each image is inspected and labelled with respect to the conditions of central stenosis and foraminal stenosis, respectively. The anatomy is annotated with 37 landmarks outlining the disc, central canal and facet. We evaluate the performance of the DAP with two choices of image appearance, i.e. Gaussian versus wavelet, and two choices of fitting method, i.e. subspace LK versus SDM. We also compare them with three popular models: AAM [3,9] as a standard appearance model, ASM as a widely used shape model, and CLM [4] as a part-based approach.

Results of landmark detection
For landmark detection, we evaluate the performance of DAPs with three configurations: Gaussian appearance pyramid with subspace LK as the fitting algorithm, Gaussian appearance pyramid with SDM, and wavelet appearance pyramid with SDM. To cover richer pathological variations, we perform the landmark detection on the mixed dataset containing all 600 images. We randomly choose 300 images for training and detect the landmarks on the remaining 300. Two metrics are used for the evaluation: the point-to-boundary distance (PtoBD) and the Dice similarity coefficient (DSC) of the canal and disc contours. PtoBD calculates the distance of the fitted landmarks to the ground truth contour, which is more accurate than the point-to-point distance. DSC measures the overlap between a fitted shape and the ground truth, DSC = 2 tp / (2 tp + fp + fn), with tp, fp and fn denoting the numbers of true positives, false positives and false negatives, respectively; it accounts for both sensitivity and specificity. The mean results of the methods compared are shown in Table 1. The DAPs with all three configurations outperform the other methods by a favourable margin. In addition, the comparison of the three DAP instances shows that the implicit model with SDM as the fitting algorithm gives better results than the explicit model with subspace LK. Delineating the objects with wavelet appearance pyramids brings further improvement, giving the best performance. Several qualitative results by the DAP with wavelet pyramids and the SDM fitting algorithm are shown in Fig. 5.
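The two metrics can be computed directly (NumPy sketch; the ground-truth boundary is assumed to be densely sampled as a list of points):

```python
import numpy as np

def dice_coefficient(mask_a, mask_b):
    """DSC = 2*tp / (2*tp + fp + fn) for two binary masks."""
    a = np.asarray(mask_a, bool)
    b = np.asarray(mask_b, bool)
    tp = np.logical_and(a, b).sum()
    fp = np.logical_and(np.logical_not(a), b).sum()
    fn = np.logical_and(a, np.logical_not(b)).sum()
    return 2.0 * tp / (2.0 * tp + fp + fn)

def point_to_boundary(points, boundary):
    """Mean distance from fitted landmarks to the closest point of a
    densely sampled ground-truth contour."""
    P = np.asarray(points, float)[:, None, :]      # (N, 1, 2)
    B = np.asarray(boundary, float)[None, :, :]    # (1, K, 2)
    d = np.sqrt(((P - B) ** 2).sum(axis=2))        # all pairwise distances
    return float(d.min(axis=1).mean())
```

Because PtoBD measures the distance to the contour rather than to a specific annotated point, a landmark that slides along the true boundary is not penalised.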

Results of anatomical classification
After the landmarks are detected, the DAPs are extracted from the image and used as input for classification. As the SDM algorithm detects the landmarks with higher precision than the subspace LK method, we use the landmark locations given by SDM in the classification tasks and evaluate the accuracy with Gaussian appearance pyramids and wavelet appearance pyramids. For central stenosis, in each of the three subsets, the morphology of the central canal is inspected and labelled with three grades: normal, moderate and severe. For illustration, the average appearances of these classes delineated by the wavelet DAP are shown in Fig. 6a. We randomly pick 100 samples to train the classifier, test on the remaining 100, and repeat 100 times for an unbiased result. The DAPs extracted from the detected landmarks are projected onto the feature space and represented by a compact set of parameters (Fig. 5, bottom), which are used as inputs to the classifier. The performance of normal/abnormal classification is measured by accuracy, calculated as (tp + tn)/(tp + tn + fp + fn). The grading errors are measured with the mean absolute error (MAE) and the root mean squared error (RMSE). We compare the performance of DAPs against approaches using other models as inputs to the same classifier. The agreement of the results with manual inspection is reported in Table 2 (the best results are highlighted in bold). The Gaussian DAP gives better or competitive performance in the classification and grading of central stenosis, while the wavelet DAP outperforms the methods compared by a large margin. Similarly, we perform another normal/abnormal classification on the morphology of the neural foramina. The average appearances delineated by the wavelet DAP are given in Fig. 6b. The classification accuracies of the methods compared are reported in Table 3.
The results show that the Gaussian DAP gives better performance than the popular shape and appearance models, and the wavelet version of the DAP enables a further improvement. We believe that the DAP models benefit from their better local feature description and appearance delineation. The further improvement is brought by the superior properties of wavelets: the channels are complementary, which preserves the full information of the discriminating local appearance, and they decompose complex textures into simpler feature components.

Conclusion
We presented a multi-scale deformable part model we refer to as a DAP. Several configurations of the DAP were introduced and evaluated, including two forms of pyramids, namely the Gaussian pyramid and the wavelet pyramid, and two fitting methods, namely subspace LK and SDM. The models were applied to the problem of LSS for detecting the landmarks and classifying the pathologies. As the anatomies of cases at varied degrees of degeneration are modelled and represented by the same compact parameters, and the appearances can be reconstructed by the DAP models, suggested further work includes the combination of DAPs with manifold learning methods such as anisotropic statistical modelling [13] to learn and visualise pathological progression by finding the most probable paths in the subspace. The DAPs can easily be applied to other anatomical areas for clinical use where segmentation and classification are needed.

Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
Ethical approval All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed consent Informed consent was obtained from all individual participants included in the study.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecomm ons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix: Derivation of the ML shape
The maximum likelihood shape is the one minimising the energy function

E(s) = b_s^T \Lambda^{-1} b_s + \sum_{n=1}^{N} \sum_{l \in K_n} \| x_n - \hat{x}_{n,l} \|^2 / \sigma^2_{n,l}.

We first rewrite it in a compact matrix form. To do so, we add to the equation a summation of zero terms,

\sum_{n=1}^{N} \sum_{l \in C_n} \| x_n - \hat{x}_{n,l} \|^2 / \sigma^2_{n,l},

with \hat{x}_{n,l} assigned zero values and \sigma^2_{n,l} set to be infinite. C_n is the relative complement of K_n in the level set \{1, 2, \ldots, L\}, indicating the missing levels at the nth landmark. The zero terms represent the estimations at landmarks of the trimmed patches in a DAP. The infinite variance allows the landmark to lie anywhere.
With the zero terms, the energy function becomes

E(s) = b_s^T \Lambda^{-1} b_s + \sum_{n=1}^{N} \sum_{l=1}^{L} \| x_n - \hat{x}_{n,l} \|^2 / \sigma^2_{n,l},

which can be rewritten in a matrix form,

E(s) = b_s^T \Lambda^{-1} b_s + \sum_{l=1}^{L} (s - \hat{s}_l)^T \Sigma_l^{-1} (s - \hat{s}_l),   (24)

where \Lambda = diag([\lambda_1, \ldots, \lambda_t]) and \Sigma_l = diag([\sigma^2_{1,l}, \ldots, \sigma^2_{N,l}]), \hat{s}_l is the vector stacking the level-l predictions \hat{x}_{n,l}, b_s is the vector of shape parameters and s is the shape. Equation (24) has the typical form of an energy function for shape regularisation, with the difference that the second term is a summation over multiple predictions. Substituting s = \bar{s} + P_s b_s and setting the derivative with respect to b_s to zero gives the solution

b_s = \big( \Lambda^{-1} + \sum_{l=1}^{L} P_s^T \Sigma_l^{-1} P_s \big)^{-1} \sum_{l=1}^{L} P_s^T \Sigma_l^{-1} ( \hat{s}_l - \bar{s} ).
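Under the quantities defined above, the closed-form solution can be computed directly (NumPy sketch; the interfaces are our own, and trimmed levels are handled by passing a very large variance so they impose no constraint):

```python
import numpy as np

def ml_shape(mean_s, P, lam, preds, sigmas):
    """Minimise E = b^T Lambda^{-1} b + sum_l (s - s_hat_l)^T Sigma_l^{-1}
    (s - s_hat_l) over b, with s = mean_s + P b. `preds[l]` stacks the
    level-l landmark predictions into one vector; `sigmas[l]` holds the
    matching per-coordinate variances (use a huge value for trimmed
    levels). Returns the regularised shape and its parameters b_s."""
    A = np.diag(1.0 / np.asarray(lam, float))       # Lambda^{-1}
    rhs = np.zeros(len(lam))
    for s_hat, sig in zip(preds, sigmas):
        W = 1.0 / np.asarray(sig, float)            # Sigma_l^{-1} diagonal
        A += P.T @ (W[:, None] * P)
        rhs += P.T @ (W * (np.asarray(s_hat, float) - mean_s))
    b = np.linalg.solve(A, rhs)
    return mean_s + P @ b, b
```

With large eigenvalues (a weak prior) the solution follows the local predictions closely; with small eigenvalues the prior dominates and the shape is pulled towards the mean.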