# 3D Intervertebral Disc Segmentation from MRI Using Supervoxel-Based CRFs

## Abstract

Segmentation of intervertebral discs from three-dimensional magnetic resonance images is a challenging problem with numerous medical applications. In this paper we describe a fully automated segmentation method based on a conditional random field operating on supervoxels. A mean Dice score of \(90\pm 3\) % was obtained on data provided for the intervertebral disc localisation and segmentation challenge in conjunction with the 3rd MICCAI Workshop & Challenge on Computational Methods and Clinical Applications for Spine Imaging - MICCAI–CSI2015.

## Keywords

Support Vector Machine · Intervertebral Disc · Conditional Random Field · Smoothness Term · Large Margin Nearest Neighbour

## 1 Introduction

Segmentation of intervertebral discs in three-dimensional (3D) magnetic resonance (MR) images is an important step in many applications, but remains a difficult problem for automated computational methods. In this paper we report a fully automated method and evaluate it on data provided for the intervertebral disc localisation and segmentation challenge in conjunction with the 3rd MICCAI Workshop & Challenge on Computational Methods and Clinical Applications for Spine Imaging - MICCAI–CSI2015. The approach is adapted from a method for segmenting vertebrae from MR and computed tomography (CT) data that we introduced previously [1, 2]. The method is based on a conditional random field operating on supervoxels and incorporating a classifier and distance metric learned on sparse supervoxel features. The most notable difference from our previous work is that we do not use any location features for intervertebral disc segmentation. Section 2 briefly describes the main components of the method, and Sect. 3 presents the segmentation results.

## 2 Methods

### 2.1 Supervoxels

We formulate the segmentation problem as one of assigning class labels to *supervoxels* (groups of similar voxels). Operating on supervoxels enables more descriptive features to be extracted over the supervoxel regions and greatly reduces computational complexity compared with operating directly on individual voxels. To generate supervoxels for a volume, we use a modified version of simple linear iterative clustering (SLIC) [3] that produces supervoxels with approximately equal *physical* extent in all directions. We determine the supervoxel parameters empirically, choosing the largest supervoxel size that still preserves almost all object boundaries in the training images.
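The physical-extent modification can be pictured as follows. SLIC places cluster seeds on a regular grid and assigns voxels using a distance that combines an intensity difference with a spatial distance; measuring both the grid step and the spatial distance in millimetres rather than voxels keeps supervoxels roughly isotropic when the voxels themselves are anisotropic. The sketch below shows only the seeding and the distance. The function names, the `step_mm` grid step and the compactness weight `m` are illustrative assumptions, not the authors' implementation.

```python
import math
from itertools import product

# Illustrative sketch (assumption): not the authors' modified SLIC.

def seed_grid(shape, spacing_mm, step_mm):
    """Return seed centres (voxel indices) spaced roughly step_mm apart physically.

    shape      -- volume size in voxels, e.g. (z, y, x)
    spacing_mm -- physical size of one voxel along each axis
    step_mm    -- desired physical distance between neighbouring seeds
    """
    # Convert the physical grid step into a (possibly different) voxel step per axis.
    steps = [max(1, round(step_mm / s)) for s in spacing_mm]
    axes = [range(st // 2, n, st) for n, st in zip(shape, steps)]
    return list(product(*axes))


def slic_distance(intensity_a, intensity_b, pos_a, pos_b, spacing_mm, step_mm, m):
    """SLIC-style distance using the *physical* spatial distance between voxels."""
    d_c = abs(intensity_a - intensity_b)
    d_s = math.sqrt(sum(((a - b) * s) ** 2 for a, b, s in zip(pos_a, pos_b, spacing_mm)))
    # m trades intensity similarity against spatial compactness.
    return math.sqrt(d_c ** 2 + (d_s / step_mm) ** 2 * m ** 2)


# Voxels of 1 x 1 x 2 mm: a 4 mm grid step becomes 4, 4 and 2 voxels per axis.
seeds = seed_grid((8, 8, 4), (1.0, 1.0, 2.0), 4.0)
print(len(seeds), seeds[:2])
```

With a 2 mm spacing along the last axis, two voxels two indices apart are 4 mm apart, so they are treated as one full grid step away rather than half a step.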

### 2.2 Multi-scale Dictionary Learning

To characterise the supervoxels, we extract descriptive features from them; these features are used to learn, from training data, a model that estimates the class label (i.e. disc or background). Our supervoxel features are obtained by encoding and pooling the responses of learned multi-scale dictionaries of linear filters.

To learn the dictionaries, we first construct a Gaussian pyramid representation of each training volume by successive smoothing and downsampling by a factor of 2. We then randomly sample 100 000 patches of \(5 \times 5 \times 5\) voxels from each pyramid level of the training images and reshape them into vectors \(\{\mathbf {v}_i\}_{i=1}^M\). The sampled vectors are whitened and then used to learn a separate dictionary of filters for each pyramid level using *sparse coding* [4]. For the results given in this paper we used 3-level pyramids and learned a dictionary of 128 filters at each level. Because they are learned over multiple scales, the dictionaries capture large-scale structure in the volumes while remaining very efficient to compute.
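The pyramid construction and patch sampling can be sketched in plain Python as below. The sketch assumes a \([1, 2, 1]/4\) binomial kernel as the smoothing filter and replicated borders (neither is specified in the text), represents a volume as nested lists `vol[z][y][x]`, and stops before the whitening and sparse-coding steps, which in practice would rely on a linear-algebra library. The function names are hypothetical.

```python
import random

# [1, 2, 1] / 4 binomial weights as a stand-in for a Gaussian kernel (assumption).
KERNEL = ((-1, 0.25), (0, 0.5), (1, 0.25))


def reduce_level(vol):
    """Smooth a 3D volume (nested lists vol[z][y][x]) and downsample it by 2."""
    nz, ny, nx = len(vol), len(vol[0]), len(vol[0][0])

    def at(z, y, x):
        # Clamp indices so that border voxels are replicated outside the volume.
        return vol[min(max(z, 0), nz - 1)][min(max(y, 0), ny - 1)][min(max(x, 0), nx - 1)]

    return [[[sum(wz * wy * wx * at(z + dz, y + dy, x + dx)
                  for dz, wz in KERNEL for dy, wy in KERNEL for dx, wx in KERNEL)
              for x in range(0, nx, 2)]
             for y in range(0, ny, 2)]
            for z in range(0, nz, 2)]


def gaussian_pyramid(vol, levels=3):
    """Level 0 is the input; each further level is smoothed and half the size."""
    pyramid = [vol]
    for _ in range(levels - 1):
        pyramid.append(reduce_level(pyramid[-1]))
    return pyramid


def sample_patches(vol, n, size=5, rng=random):
    """Draw n random size^3 patches and flatten each one into a vector."""
    nz, ny, nx = len(vol), len(vol[0]), len(vol[0][0])
    patches = []
    for _ in range(n):
        z = rng.randrange(nz - size + 1)
        y = rng.randrange(ny - size + 1)
        x = rng.randrange(nx - size + 1)
        patches.append([vol[z + i][y + j][x + k]
                        for i in range(size) for j in range(size) for k in range(size)])
    return patches


rng = random.Random(0)
volume = [[[rng.random() for _ in range(20)] for _ in range(20)] for _ in range(20)]
pyramid = gaussian_pyramid(volume, levels=3)
# One patch set per level; each set would then be whitened and used to
# learn that level's dictionary (not shown).
patch_sets = [sample_patches(level, 100, rng=rng) for level in pyramid]
```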

Features corresponding to different levels of the pyramid^{1} are concatenated into a single vector at each location. The densely extracted features are then aggregated within each supervoxel using max pooling and \(\ell _2\)-normalised to unit length.
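Concatenation across levels, max pooling and \(\ell_2\) normalisation can be sketched as follows. The sketch assumes that each voxel already has one response vector per pyramid level, resampled to the full-resolution grid, and that supervoxel membership is given as one integer label per voxel; the function names are hypothetical.

```python
import math
from itertools import chain


def concatenate_levels(per_level):
    """per_level[l][v] is the feature vector of voxel v at pyramid level l.

    Assumes coarser levels have already been resampled to the full-resolution grid.
    """
    return [list(chain.from_iterable(vectors)) for vectors in zip(*per_level)]


def pool_supervoxels(features, labels):
    """Max-pool voxel features within each supervoxel, then l2-normalise them."""
    pooled = {}
    for vec, sv in zip(features, labels):
        if sv in pooled:
            pooled[sv] = [max(a, b) for a, b in zip(pooled[sv], vec)]
        else:
            pooled[sv] = list(vec)
    for sv, vec in pooled.items():
        norm = math.sqrt(sum(v * v for v in vec))
        if norm > 0:  # leave an all-zero vector unchanged
            pooled[sv] = [v / norm for v in vec]
    return pooled


# Three voxels, two pyramid levels with one response each, two supervoxels.
features = concatenate_levels([[[1.0], [3.0], [0.0]], [[0.0], [4.0], [2.0]]])
print(pool_supervoxels(features, [0, 0, 1]))
```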

### 2.3 CRF with Learned Potentials

We define a *conditional random field* (CRF) over the supervoxels that relates the features to the underlying class labels. The resulting model promotes spatial consistency of the labels and enables segmentation to be carried out very efficiently using graph cut algorithms.

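Let \(y_i \in \{0, 1\}\) denote the class label (background or disc) of supervoxel *i* and \(\mathbf{x}_i\) its feature vector. The CRF energy of a labelling \(\mathbf{y}\) has the form

\[
E(\mathbf{y}) = \sum_{i} \psi_i(y_i) + \lambda \sum_{(i,j) \in \mathcal{N}} \psi_{ij}(y_i, y_j),
\]

where \(\psi_i\) is the data term, \(\psi_{ij}\) is the smoothness term and \(\mathcal{N}\) is the set of pairs of neighbouring supervoxels. The constant \(\lambda \) controls the relative importance of the data and smoothness terms. The data term of the CRF is defined as the negative log likelihood of the supervoxel feature vector given the class label:

\[
\psi_i(y_i) = -\log p(\mathbf{x}_i \mid y_i).
\]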
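For a binary labelling, an energy of the form \(E(\mathbf{y}) = \sum_i \psi_i(y_i) + \lambda \sum_{(i,j)} \psi_{ij}(y_i, y_j)\) can be minimised exactly with a single minimum s-t cut, provided the smoothness term is submodular. The smoothness term is not specified in this section, so the sketch below assumes a Potts term \(\psi_{ij}(y_i, y_j) = w_{ij}[y_i \neq y_j]\) with given non-negative edge weights \(w_{ij}\), and uses a plain Edmonds-Karp max-flow for clarity; dedicated solvers such as the Boykov-Kolmogorov algorithm are normally used in practice. All names are hypothetical.

```python
import math
from collections import defaultdict, deque
from itertools import product


def data_term(likelihoods):
    """psi_i(y) = -log p(x_i | y) for y in {0 (background), 1 (disc)}."""
    return [(-math.log(p0), -math.log(p1)) for p0, p1 in likelihoods]


def energy(labels, unary, edges, lam):
    """E(y) = sum_i psi_i(y_i) + lam * sum_(i,j) w_ij [y_i != y_j] (Potts term assumed)."""
    data = sum(unary[i][y] for i, y in enumerate(labels))
    smooth = sum(w for i, j, w in edges if labels[i] != labels[j])
    return data + lam * smooth


def graph_cut(unary, edges, lam):
    """Exact minimiser of `energy` via a minimum s-t cut (Edmonds-Karp max-flow)."""
    n = len(unary)
    s, t = n, n + 1
    cap = defaultdict(float)  # residual capacities keyed by (u, v)
    adj = defaultdict(set)

    def add_edge(u, v, c):
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)  # reverse direction is needed in the residual graph

    for i, (c0, c1) in enumerate(unary):
        m = min(c0, c1)         # constant shift keeps capacities non-negative
        add_edge(s, i, c0 - m)  # cut when i ends on the sink side (label 0)
        add_edge(i, t, c1 - m)  # cut when i ends on the source side (label 1)
    for i, j, w in edges:
        add_edge(i, j, lam * w)
        add_edge(j, i, lam * w)

    while True:
        # Breadth-first search for a shortest augmenting path from s to t.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break  # no augmenting path left: the flow is maximal
        bottleneck, v = math.inf, t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[(parent[v], v)])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= bottleneck
            cap[(v, u)] += bottleneck
            v = u
    # Nodes reachable from s in the final residual graph form the source side.
    return [1 if i in parent else 0 for i in range(n)]


def brute_force_labels(unary, edges, lam):
    """Exhaustive minimiser, usable as a check on tiny problems."""
    return min(product((0, 1), repeat=len(unary)),
               key=lambda y: energy(y, unary, edges, lam))


unary = data_term([(0.9, 0.1), (0.2, 0.8), (0.45, 0.55), (0.1, 0.9)])
chain_edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
print(graph_cut(unary, chain_edges, lam=0.67))
```

On small problems the cut can be compared against `brute_force_labels`; the two should always reach the same minimum energy, although ties may be broken differently.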

Table 1. Segmentation results on the training dataset. The table shows the Dice score (%) and average absolute surface distance (ASD, mm) for each individual subject. The final column gives the median value over all subjects.

Subject | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | Median |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Dice (%) | 91 | 93 | 89 | 84 | 93 | 92 | 89 | 92 | 92 | 94 | 87 | 90 | 89 | 90 | 92 | 91 |
ASD (mm) | 0.54 | 0.40 | 1.22 | 1.37 | 0.34 | 0.45 | 0.72 | 0.40 | 0.44 | 0.35 | 0.95 | 0.63 | 0.59 | 0.54 | 0.45 | 0.54 |
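The training-set summary statistics quoted in Sect. 3 can be checked against the per-subject values in Table 1. The snippet below assumes that the ± values are sample standard deviations (\(n-1\) denominator); under that assumption the table reproduces the reported \(90 \pm 3\) % and \(0.63 \pm 0.32\) mm, as well as the medians in the final column.

```python
from statistics import mean, median, stdev

# Per-subject values copied from Table 1 (subjects 1-15).
DICE = [91, 93, 89, 84, 93, 92, 89, 92, 92, 94, 87, 90, 89, 90, 92]
ASD = [0.54, 0.40, 1.22, 1.37, 0.34, 0.45, 0.72, 0.40, 0.44, 0.35,
       0.95, 0.63, 0.59, 0.54, 0.45]


def summarise(values):
    """Return (mean, sample standard deviation, median)."""
    return mean(values), stdev(values), median(values)


for name, values in (("Dice (%)", DICE), ("ASD (mm)", ASD)):
    m, s, med = summarise(values)
    print(f"{name}: mean {m:.2f} ± {s:.2f}, median {med:.2f}")
```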

## 3 Results

The method was evaluated on a dataset of T2-weighted turbo spin echo MR images from 25 subjects, provided for the MICCAI–CSI2015 intervertebral disc localisation and segmentation challenge [6].^{2} Each image contains the intervertebral discs of the lower spine from T11 to L5, and seven discs in each image were manually annotated. The complete dataset is split into an initial training dataset of 15 subjects and two test datasets of 5 subjects each.

Leave-one-out testing was used to evaluate the performance of the method on the training dataset. At each iteration the model was learned on 14 of the training subjects and then evaluated on the single held-out subject. The process was repeated for all subjects, ensuring that the training and test data always came from separate subjects. For each test subject, parameters such as \(\lambda \) were learned by leave-one-out cross-validation on the 14 training subjects, so the validation subject was likewise always separate from the remaining 13 training subjects. The average value of \(\lambda \) over all leave-one-out iterations was 0.67, while the average values of the SVM regularisation and kernel parameters were \(C = 5.24\) and \(\gamma = 0.52\). After learning, processing a single volume took approximately 6 min on an Intel Core i5 2.50 GHz machine with 8 GB of RAM.
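The nested leave-one-out protocol can be expressed as two loops over subjects: an outer loop that holds out the test subject and an inner loop, over the remaining subjects only, that selects the parameters. The sketch assumes that parameters are chosen from a finite candidate grid by their mean inner validation score; the grid and the `fit_and_score` callback (which would train the model and return, say, a Dice score) are hypothetical placeholders.

```python
def leave_one_out(subjects):
    """Yield (training subjects, held-out subject), one subject at a time."""
    for k, held_out in enumerate(subjects):
        yield subjects[:k] + subjects[k + 1:], held_out


def tune(train_subjects, grid, fit_and_score):
    """Inner leave-one-out: choose the setting with the best mean validation score.

    fit_and_score(train, validation, params) is a hypothetical callback that
    trains the model on `train` and scores it (e.g. Dice) on `validation`.
    """
    def cv_score(params):
        scores = [fit_and_score(tr, val, params) for tr, val in leave_one_out(train_subjects)]
        return sum(scores) / len(scores)

    return max(grid, key=cv_score)


def nested_evaluation(subjects, grid, fit_and_score):
    """Outer leave-one-out: tune on the remaining subjects, then test on the held-out one."""
    results = {}
    for train, test in leave_one_out(subjects):
        params = tune(train, grid, fit_and_score)  # tuning never sees `test`
        results[test] = (params, fit_and_score(train, test, params))
    return results


# Toy callback whose score peaks at lambda = 0.67, just to exercise the loops.
toy = lambda train, val, lam: -(lam - 0.67) ** 2
print(nested_evaluation(list(range(1, 16)), [0.1, 0.5, 0.67, 1.0], toy)[1])
```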

The segmentations were evaluated using the Dice score and average absolute surface distance (ASD); the results on the training dataset are summarised in Table 1. The mean Dice score on the training dataset was \(90 \pm 3\) % and the mean ASD was \(0.63 \pm 0.32\) mm. On the two test datasets the mean Dice scores were \(90 \pm 4\) % and \(91\pm 3\) %; the mean ASD values were \(1.24 \pm 0.24\,\text {mm}\) and \(1.19 \pm 0.20\,\text {mm}\). Figure 1 provides a visual comparison, for single slices of the 3D segmentation, between example automatic segmentations and manual annotations.
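For reference, the two evaluation measures can be computed as below for segmentations given as sets of foreground voxel coordinates. The Dice score is \(2|A \cap B| / (|A| + |B|)\). ASD definitions vary between evaluation tools; this sketch assumes a symmetric version that averages, over the boundary voxels of both segmentations, the distance in millimetres to the nearest boundary voxel of the other, with boundaries taken as foreground voxels that have a 6-connected background neighbour. It uses brute-force nearest-neighbour search, assumes both segmentations are non-empty, and is not the challenge's official evaluation code.

```python
import math
from itertools import product

NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def dice(a, b):
    """Dice score between two sets of foreground voxel coordinates."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def surface(voxels):
    """Foreground voxels with at least one 6-connected background neighbour."""
    return {v for v in voxels
            if any((v[0] + dz, v[1] + dy, v[2] + dx) not in voxels
                   for dz, dy, dx in NEIGHBOURS)}


def asd(a, b, spacing=(1.0, 1.0, 1.0)):
    """Symmetric average surface distance in physical units (brute force)."""
    sa, sb = surface(a), surface(b)

    def dist(p, q):
        return math.sqrt(sum(((pi - qi) * s) ** 2 for pi, qi, s in zip(p, q, spacing)))

    d_ab = [min(dist(p, q) for q in sb) for p in sa]
    d_ba = [min(dist(q, p) for p in sa) for q in sb]
    return (sum(d_ab) + sum(d_ba)) / (len(d_ab) + len(d_ba))


# A 4x4x4 cube and the same cube shifted by one voxel along x.
cube = set(product(range(4), repeat=3))
shifted = {(z, y, x + 1) for z, y, x in cube}
print(dice(cube, shifted), asd(cube, shifted))
```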

## 4 Conclusion

We described a method for automated segmentation of intervertebral discs from MR imaging data based on a conditional random field on supervoxels. The method was shown to obtain accurate and consistent results on the challenge data, with each volume taking approximately 6 min to segment. An advantage of our approach is its generality, which means it can be applied to segment different structures without fundamental change to the model. This is illustrated by the consistently good performance of the method on the intervertebral disc dataset, along with previous vertebra segmentation results obtained on MR imaging and CT data.

## Footnotes

1. Separate dictionaries are used to encode patches at different levels of the pyramid.
2. Available from the SpineWeb: http://spineweb.digitalimaginggroup.ca.

## Notes

### Acknowledgements

We are grateful to the organisers of the challenge and to the SpineWeb initiative for making the data available.

## References

1. Hutt, H., Everson, R., Meakin, J.: Segmentation of lumbar vertebrae slices from CT images. In: Yao, J., et al. (eds.) CSI 2014. LNCVB, vol. 20, pp. 61–71. Springer, Switzerland (2015)
2. Hutt, H., Everson, R., Meakin, J.: 3D segmentation of the lumbar spine from MRI using supervoxel-based CRFs. Technical report, University of Exeter, UK (2015)
3. Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., Süsstrunk, S.: SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. **34**(11), 2274–2282 (2012)
4. Olshausen, B., Field, D.: Emergence of simple-cell receptive field properties. Nature **381**(6583), 607–609 (1996)
5. Weinberger, K., Saul, L.: Distance metric learning for large margin nearest neighbor classification. J. Mach. Learn. Res. **10**, 207–244 (2009)
6. Chen, C., Belavy, D., Yu, W., Chu, C., Armbrecht, G., Bansmann, M., Felsenberg, D., Zheng, G.: Localization and segmentation of 3D intervertebral discs in MR images by data driven estimation. IEEE Trans. Med. Imaging **34**(8), 1719–1729 (2015)