Introduction

Breast cancer metastasizes preferentially to bone. Post-mortem evaluation revealed that 70% of patients who died of breast cancer had bone metastases present in the skeleton [1]. Bone metastases cause severe morbidity in living patients such as bone pain, fracture, hypercalcemia, and nerve compression [2, 3]. As a result, quantification of osteolytic lesion size is pivotal in preclinical research of metastatic bone disease and treatment evaluation in small animal models.

Osteolysis is currently quantified using 2D radiographs [4, 5]. The scoring of these radiographs is performed manually by drawing a region-of-interest (ROI) around the lesion and measuring the bone area. The problem with this procedure is that lesions may be projected on top of each other and will therefore be underestimated when quantified, due to the flattening of the 3D structure [6]. The same may happen for lesions on the side of bone. Furthermore, performing the analysis manually is prone to observer bias. MicroCT datasets provide spatial information, suitable for measurements of various bone parameters such as bone volume, bone thickness, and bone mineral density. These measurements are potentially more informative than the radiographic analyses. Also, MicroCT enables the researcher to study the overall bone structure.

The use of MicroCT for quantitative measurements is not without difficulties. The shape and position of a volume-of-interest (VOI, the 3D counterpart of a 2D ROI) in a 3D dataset greatly influence the measurement results. Therefore, it is crucial that the selection of a VOI is reproducible and not affected by the scan orientation or by the observer who performs the procedure. We previously published a manual approach for the normalized selection of a region of interest in complex shapes [6]. This manual approach provides good and reproducible results but is very time-consuming and requires well-trained observers.

Comparing whole-body datasets from longitudinal studies is even more difficult. Variation in the posture of the animal between scans taken on different dates makes it nearly impossible to spot subtle disease-induced differences between scans [9].

We previously published an approach to automatically align the skeletons of animals that were scanned at different points in time. The method can handle large postural differences between animals and as a result, specifically designed holders that are sometimes used to coarsely align animals [10] are not required. In addition, the user can select individual bones and generate side-by-side visualizations of these bones from multiple longitudinal datasets (Fig. 1). Such normalized visualizations greatly facilitate detailed qualitative assessment of structures in multiple complex and large datasets [11].

Fig. 1.
An overview of our previously published approach to coarsely locate user-defined structures of interest in follow-up whole-body data [9] and present them in a standardized manner [11]. a The skeleton of an atlas is registered (aligned) to MicroCT data acquired at N time points T0–TN. b An example of the registration result for one dataset. c Based on the registration result, we can determine volumes of interest (VOIs) around individual bones. The VOIs are shown as yellow boxes. d Based on the VOIs, the data can be put in a standardized layout using Articulated Planar Reformation (APR) [11]. e The advantage of the standardized layout is that the same structures in datasets from different time points (T0–TN) can be visualized side-by-side, greatly facilitating data comparison.

Here we describe an addition to this method, which enables the user to perform automated quantitative measurements of bone volume and thickness alongside the visual output. For evaluation, we applied the method to segment the femur and the tibia/fibula in whole-body follow-up MicroCT datasets and measured the bone volume and cortical thickness at three points in time: baseline, 3 weeks, and 7 weeks. To test whether this approach could be used to quantify biologically relevant changes in bone volume, breast cancer cells were injected into the right tibia after the baseline scan. The left tibia remained untreated and served as a reference. The results of the automated measurements are compared with manual measurements by two experts. We show that the automated segmentation and volume measurements are as accurate and reproducible as manual segmentation and volume measurements.

In summary, the goals of this work are to:

  • Automate the task of measuring the volume of a user-defined bone in whole-body in vivo MicroCT data and demonstrate the method by measuring the bone volume of the proximal tibia/fibula at several points in time

  • Compare the automated measurements with two human observers and show that the results are not significantly different

  • Present a way to assess the measurement quality visually, by providing proper visualization

  • Present a method to assess effects of osteolysis and bone remodeling locally (site-specific bone loss or gain) by automatically measuring and visualizing cortical bone thickness

Materials and Methods

Animals

Fifteen (n = 15) female nude mice (BALB/c nu/nu, 6 weeks old) were acquired from Charles River (L’Arbresle, France) and housed in individually ventilated cages. Food and water were provided ad libitum. Surgical procedures and MicroCT imaging were performed under injection anesthesia (100 mg/kg ketamine + 12.5 mg/kg xylazine). Animals were sacrificed by cervical dislocation at the end of the experimental period. Animal experiments were approved by the local committee for animal health, ethics and research of Leiden University Medical Center.

Cell Lines and Culture Conditions

The cell line MDA-231-B/Luc+ (hereafter MDA-BO2), a bone-seeking and luciferase-expressing subclone from the human breast cancer MDA-MB-231 [12, 13], was cultured in DMEM (Invitrogen, Carlsbad, CA, USA) containing 4.5 g glucose/l supplemented with 10% fetal calf serum (Lonza, Basel, Switzerland), 100 units/ml penicillin, 50 μg/ml streptomycin (Invitrogen), and 800 μg/ml geneticin/G418 (Invitrogen). The cells were checked monthly for mycoplasma infection by PCR. The cells were donated by G. van der Pluijm (Leiden University Medical Center, Leiden, The Netherlands).

Experimental Setup

MDA-BO2 cells were injected into the right tibiae as described previously [13]. In brief, two holes were drilled through the bone cortex of the right tibia with a 25-gauge needle (25G 5/8, BD MicroFine, Becton Dickinson, Franklin Lakes, NJ, USA) and bone marrow was flushed out. Subsequently, 250,000 MDA-BO2 cells per 10 μl PBS were injected into the right tibiae of the animals. MicroCT scans were made before the tumor cell inoculation (T0) in supine position, 3 weeks after tumor cell inoculation (T1) in prone position, and 7 weeks after tumor cell inoculation (T2) in supine position. The animals were scanned with arbitrary limb position.

MicroCT Data Acquisition

MicroCT scans were made using a SkyScan 1076 MicroCT scanner (SkyScan, Kontich, Belgium) using a source voltage and current set to 50 kV and 200 μA, respectively, with an X-ray source rotation step size of 1.5° over a trajectory of 180°. Reconstructions were made using the nRecon V1.6.2.0 software (SkyScan) with a beam hardening correction set to 10%, a ring artifact correction set to 10, and the dynamic range set to −1,000–4,000 Hounsfield units. The datasets were reconstructed with voxel size 36.5 × 36.5 × 36.5 μm3. Neither cardiac nor respiratory gating was used.

Manual Segmentation of the Tibia/Fibula

To assess the performance of the automated tibia volume measurements, two field experts were asked to segment the proximal part of the right tibia. To be able to use the data at full resolution, this was not based on the whole-body dataset but on a subvolume, corresponding to the right tibia, which was automatically determined following the procedure in Fig. 1. An example of such a subvolume is shown in Fig. 2. Starting with this subvolume, the experts were asked to segment the proximal part of the tibia/fibula, i.e., the part between the knee and the location where tibia and fibula separate. The manual segmentation was performed using a tool that was developed in-house with MeVisLab V1.6 (MeVis Medical Solutions AG, Bremen, Germany) as described earlier [6].

Fig. 2.
Example of an automatically determined subvolume, including the right tibia. The bone surface is shown together with the corresponding subvolume.

After segmentation, the number of bone voxels was determined using a threshold value to separate bone from background. To determine the optimum threshold for the in vivo datasets, the tibia of one of the animals was scanned ex vivo at high resolution (9.125 × 9.125 × 9.125 μm3) after the follow-up experiment, and the tibial bone volume was measured. The threshold for segmenting bone from the background in the low-resolution data was then set such that the volume of the tibia of the same mouse in the low-resolution data equaled the volume of the tibia in the high-resolution data. This threshold was kept constant for segmentation of all datasets. The result was a volume dataset of the same size as the initial subvolume, with voxels labeled as either relevant bone, i.e., the proximal tibia/fibula, or background (including irrelevant bone). The bone volume of the proximal tibia/fibula could therefore be determined by multiplying the number of bone voxels by the voxel volume, i.e., in our case number-of-voxels × (36.5 × 36.5 × 36.5) μm3. To be able to assess the quality of the segmentation visually, we provided a surface representation of the manually segmented subvolume. The tibia/fibula bone volume served as the reference for the automated method presented in the next subsection.
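As an illustration of the volume computation and threshold calibration described above, a minimal Python sketch could look as follows. It is not the implementation used in this study: the function names, the flat list of gray values, the "at or above threshold" convention, and the search over a list of candidate thresholds are all assumptions made for the example. Only the voxel edge length of 36.5 μm is taken from the text.

```python
# Voxel edge length of the in vivo scans in micrometres (from the text).
VOXEL_EDGE_UM = 36.5


def bone_volume_mm3(voxels, threshold, voxel_edge_um=VOXEL_EDGE_UM):
    """Count voxels at or above the threshold and convert the count to mm^3.

    `voxels` is a flat iterable of gray values (hypothetical input format).
    """
    n_bone = sum(1 for v in voxels if v >= threshold)
    voxel_volume_mm3 = (voxel_edge_um / 1000.0) ** 3  # um -> mm, then cube
    return n_bone * voxel_volume_mm3


def calibrate_threshold(voxels, reference_volume_mm3, candidates):
    """Return the candidate threshold whose low-resolution volume is closest
    to the reference volume measured in the high-resolution scan."""
    return min(
        candidates,
        key=lambda t: abs(bone_volume_mm3(voxels, t) - reference_volume_mm3),
    )
```

Once calibrated on one animal, the same threshold would be passed to `bone_volume_mm3` for every other dataset, mirroring the constant threshold used in the text.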

Automated Segmentation of the Tibia/Fibula

An automated method should yield results that are as similar as possible to the results a human observer would obtain. Therefore, it should be designed such that it mimics the manual procedure as much as possible. Just as for the manual segmentation presented in the previous subsection, the automated segmentation was based on a subvolume as shown in Fig. 2, and the goal was to segment the proximal part of the tibia/fibula. First, a centerline was determined that runs through the center of the femur, the knee, and the center of the tibia, based on the registration of the skeleton atlas to the MicroCT data. To this end, we defined 21 bone center locations (10 in the femur, 11 in the tibia) in the atlas. Once the atlas bones are registered to the data (Fig. 1b), these atlas bone center locations lie approximately in the bone centers of the femur and the tibia in the MicroCT data (the bone center locations have to be defined only once for the atlas). Subsequently, a bone centerline was derived using cubic B-spline fitting through the bone centers. Next, the volume was segmented into bone and background using global thresholding with the same threshold as was used for the manual segmentation (see previous subsection). Following the bone centerline from the knee towards the distal part of the tibia, the separation of the tibia and the fibula was determined using a hierarchical clustering technique with single linkage [15] that determined the number of bone clusters at regularly spaced locations along the centerline. The Euclidean distance between points was chosen as the dissimilarity measure. The transition from two clusters (tibia and fibula) to one cluster identified the location of bone separation. Figure 3 (right) shows a slice, perpendicular to the centerline, which is close to this point (tibia = large spot, fibula = small spot).
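The cluster-counting step can be sketched in Python as follows. This is an illustrative sketch, not the code used in the study. It relies on the fact that cutting a single-linkage dendrogram at a distance `cutoff` yields the connected components of the graph that links all point pairs no further apart than `cutoff`. The function names, the `cutoff` parameter, and the representation of each slice as a list of bone-voxel coordinates are assumptions.

```python
import math


def count_clusters_single_linkage(points, cutoff):
    """Number of single-linkage clusters when the dendrogram is cut at `cutoff`.

    Two points share a cluster if they are joined by a chain of points whose
    consecutive Euclidean distances are all <= cutoff (union-find below).
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= cutoff:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})


def find_transition(slices, cutoff):
    """Index of the first slice (ordered knee -> distal) at which the cluster
    count drops from two (tibia and fibula) to one; None if not found."""
    counts = [count_clusters_single_linkage(s, cutoff) for s in slices]
    for k in range(1, len(counts)):
        if counts[k - 1] == 2 and counts[k] == 1:
            return k
    return None
```

The pairwise loop is quadratic in the number of bone voxels per slice, which is acceptable for small cross-sections; a library implementation of hierarchical clustering would be the more practical choice for larger inputs.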

Fig. 3.
Demonstration of how the bone thickness D is determined automatically if osteolytic lesions are present. The slices from the MicroCT subvolume that are orthogonal to the centerline, with an overlay of the voxels labeled “bone” (blue net), are shown. Along the bone centerline (orange stars), gray-value profiles are taken in axial direction at evenly spaced locations along the centerline. The location close to the knee (left) and the locations halfway between the knee and the tibia/fibula separation (middle) and close to the tibia/fibula separation (right) are shown. Points on the inner boundaries are indicated by red stars, corresponding points on the outer boundaries by green stars. The black arrows indicate the directions along which the gray-value profiles for the bone thickness measurement are derived. An example of a profile path is shown in red (middle). The inset shows an example of a gray-value profile in blue and its gradient values in green (dx denotes the derivative along the profile). The bone boundaries are found where the gradient magnitude is maximal (red stars in the inset), and the bone thickness D is the distance between the boundaries.

Separation of the tibia/fibula from the femur was done in a slightly different way compared with the manual procedure because it is very difficult to automatically determine a flat separation plane within the knee. Therefore, we chose to rely on a classifier that automatically separates all voxels labeled as “bone” (i.e., after thresholding) into the two classes “femur” and “tibia/fibula.” The classifier was trained using volumetric (tetrahedral) meshes of the femur and tibia atlas after registration (Fig. 1b). Each node location of the meshes was weighted with a 3D Gaussian probability density function with width h (Parzen kernel density estimation [15]). Subsequently, all individual probability densities were summed up, yielding a bone-dependent posterior probability density value within the entire data volume. A voxel labeled as “bone” can thus be identified as “femur” or “tibia/fibula,” depending on which of the two classes has the highest posterior probability at the location of the voxel. The parameter h was optimized using a leave-one-out test, based on the available datasets. Finally, the bone volume of the proximal tibia/fibula could be derived by counting the bone voxels classified as “tibia/fibula” along the centerline, up to the tibia/fibula separation determined before, and multiplying the number of bone voxels by the voxel volume. To assess the quality of the automated segmentation visually, we provided a surface representation of the result.
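The Parzen-window classification described above can be illustrated with a short Python sketch. It is a simplified reading of the text rather than the original implementation: the kernels are assumed to be isotropic, the class densities are plain sums of kernels as stated above (implying no explicit class priors), and the function names and the dictionary of node coordinates are hypothetical.

```python
import math


def parzen_density(x, nodes, h):
    """Sum of isotropic 3D Gaussian kernels of width h centred on mesh nodes."""
    norm = (2.0 * math.pi * h * h) ** 1.5  # normalisation of a 3D Gaussian
    return sum(
        math.exp(-math.dist(x, n) ** 2 / (2.0 * h * h)) for n in nodes
    ) / norm


def classify_voxel(x, class_nodes, h):
    """Return the label whose summed kernel density is highest at location x.

    `class_nodes` maps a label (e.g. "femur") to a list of (x, y, z) node
    coordinates of the registered atlas mesh (assumed input format).
    """
    return max(
        class_nodes,
        key=lambda label: parzen_density(x, class_nodes[label], h),
    )
```

For example, with `class_nodes = {"femur": [...], "tibia/fibula": [...]}`, calling `classify_voxel(voxel_position, class_nodes, h)` for every thresholded bone voxel would yield the two-class labeling. A leave-one-out selection of `h` would repeat this labeling for several candidate widths on held-out datasets.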

Automated Segmentation of the Femur

As a proof of concept that the automated segmentation method can also be applied to skeletal elements other than the tibia, we demonstrate an automated segmentation of the femur. The femur is connected proximally to the pelvis and distally to the tibia. Following the procedure given in the “Automated Segmentation of the Tibia/Fibula” section, the tibia was separated from the femur in a first step. Second, volumetric meshes of the atlas femur and the atlas pelvis after skeleton registration were used to derive a 3D posterior probability density function for these bones and to determine the separation of pelvis and femur, following the same procedure as described in the “Automated Segmentation of the Tibia/Fibula” section. The kernel width h was identical to the one used for the separation of the tibia and the femur. To assess the reproducibility of the volume measurements, the volume of the left femur of three animals was measured at all points in time and compared with the volume of the right femur over time. In addition, the bones were segmented manually to assess measurement accuracy. To ensure that the inoculated cancer cells had a minimal effect on the femur bone volume, we chose three animals in which osteolysis had progressed only slightly over time.

Automated Bone Thickness Measurements and Visualization

Accurate knowledge of local bone thickness makes it possible to follow the progress of osteolysis and bone remodeling over time. Therefore, a method is required to measure bone thickness in 3D and to relate the measurement to the exact location on the bone. In addition, the method should be able to handle severe structural changes over time, induced by osteolysis.

Two main approaches to assessing bone thickness in volumetric data are described in the literature: volume-based methods and surface (feature)-based methods [14]. These focus mainly on measuring trabecular bone and generally take the entire image domain into account. The advantage is that structures with very different shapes can be analyzed. Although these approaches could be used for measuring cortical bone as well, the tube-like shape of long bones enables another approach. Since the registration of the skeleton atlas to the data yields a coarse segmentation of the skeleton, we can map a bone centerline, defined in the atlas femur and tibia, to the femur and tibia in the data. Subsequently, we can employ a technique similar to that presented in Van der Geest et al. [22], where the authors measure the diameter and wall thickness of blood vessels in MRA and CTA, based on slices that are orthogonal to the vessel centerline. The great advantage of relying on a centerline is that it is possible to determine exactly at which locations along the centerline the thickness should be measured. The main difference between analyzing vessels and potentially osteolytic bone is that vessels are continuous structures while bone can be highly fractured and contain holes.

The methods for trabecular thickness measurement generally take the entire image domain into account, which can be very time-consuming, especially for large volumes or surfaces with a large number of vertices. The proposed approach greatly reduces the computational burden. In addition, defining the thickness measurement based on a centerline allows certain areas to be sampled more densely than others, yielding more accurate measurements.

To determine the cortical bone thickness of the tibia automatically, we relied on the bone centerline presented in the previous section and the subvolume according to Fig. 2. At regularly spaced locations, following the centerline in distal direction, gray-value profiles were extracted in axial direction, starting from the centerline and progressing outwards. In total, 360 profiles were taken per location, with a 1° angle difference between them, thus covering an entire circle oriented orthogonal to the centerline. Since the centerline lies in an area with low intensity (bone marrow), the gray-value profile consists of low values at the beginning, high values where the bone is crossed, and again low values outside the bone (muscle tissue). An example of such a profile is given in Fig. 3 (middle). Subsequently, the inner boundary of the bone can be determined using the highest positive gradient of the profile. Doing this for all 360 profiles yielded 360 points that are located at the inner boundary of the bone. However, since the centerline may not always lie exactly in the center, these points are usually not evenly distributed along the boundary. Therefore, we applied an additional resampling step so that the points had a minimum distance of one voxel. Examples of resulting inner boundaries are shown in Fig. 3 (red stars). Next, gray-value profiles were taken again, but this time orthogonal to the inner boundary of the bone, starting inside the bone and progressing outwards. An example path of such a profile is shown as a red line in Fig. 3 (middle). Finally, the bone thickness D could be determined using the highest positive and the highest negative gradient of the profile, demarcating the inner and the outer boundary of the bone. This is demonstrated in the inset in Fig. 3 (middle). Hence, our definition of bone thickness is the distance from the inner boundary to the outer boundary of the cortex, orthogonal to the inner boundary.
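The gradient-based thickness definition above can be sketched in Python for a single gray-value profile. This is an illustrative sketch only: the use of central differences, the restriction of the falling-edge search to samples beyond the rising edge, and the names `cortical_thickness`, `profile`, and `spacing` are assumptions not stated in the text.

```python
def cortical_thickness(profile, spacing):
    """Thickness D as the distance between the steepest rise (inner boundary)
    and the steepest fall (outer boundary) of a gray-value profile.

    `profile` is a list of gray values sampled every `spacing` (e.g. in mm)
    along a path that starts in the marrow and ends outside the bone.
    """
    # Central differences; gradient[k] belongs to profile sample k + 1.
    gradient = [
        (profile[i + 1] - profile[i - 1]) / (2.0 * spacing)
        for i in range(1, len(profile) - 1)
    ]
    # Inner boundary: highest positive gradient.
    rise = max(range(len(gradient)), key=lambda k: gradient[k])
    # Outer boundary: most negative gradient at or beyond the inner boundary.
    fall = min(range(rise, len(gradient)), key=lambda k: gradient[k])
    return (fall - rise) * spacing
```

Applied to every profile that leaves the resampled inner boundary, a function of this kind would give one thickness value per boundary point. In heavily osteolytic regions a profile may contain no clear rise and fall, so a practical implementation would probably also need a minimum-gradient check, which is omitted here.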

Each bone thickness measurement can be uniquely related to the location on the bone where it was derived. To be able to assess the bone thickness locally and still have the anatomical context information available, we present a visualization that is based on a surface representation of all bone in the subvolume (Fig. 2). To each location on the bone surface, we linked the corresponding bone thickness and assigned a value-dependent color. The result is a surface representation of the bone, on which the color indicates the bone thickness.

The automated segmentations and bone thickness measurements and visualizations were performed using Matlab 2010b (The Mathworks, Natick, USA).

Quantitative Analysis of Measurement Results

To assess how similar the results of the automated method and the human experts are, Bland–Altman [16] plots as well as Pearson’s correlation coefficients are presented. To investigate in detail the influence of the time point (i.e., baseline, first, and second follow-up), the bone (i.e., healthy and pathologic), and the observer (i.e., automated, observer 1, and observer 2) on the bone volume measurement, we performed a statistical analysis using a three-way repeated measures analysis of variance (ANOVA) [17], with the bone volume as the dependent variable and observers, bone (i.e., healthy and pathologic), and time point as the independent variables (3 × 2 × 3 levels). A repeated measure design requires the variances of the differences between levels to be equal. Therefore, Mauchly’s sphericity test should be non-significant if we are to assume that the condition of sphericity has been met. If the results of the test indicated that the assumption of sphericity was violated, the degrees of freedom were corrected using Greenhouse–Geisser estimates of sphericity [17]. To identify significant differences between group means for main and interaction effects, a Tukey honest significant difference (HSD) post hoc test was used. Effects were considered to be significant if p < .05. The statistical analysis was performed using Statistica 8.0 (StatSoft, Tulsa, USA).
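For readers unfamiliar with the agreement measures named above, the following Python sketch shows how the Bland–Altman bias and 95% limits of agreement and Pearson's correlation coefficient are computed for two series of paired measurements. The function names and input format are assumptions, and the sketch does not reproduce the Statistica analysis or the repeated measures ANOVA.

```python
import math
import statistics


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    The bias is the mean of the pairwise differences a - b; the limits of
    agreement are bias +/- 1.96 times the sample standard deviation.
    """
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def pearson_r(a, b):
    """Pearson's correlation coefficient of two equally long sequences."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)
```

Here `a` and `b` would hold, for instance, the bone volumes measured by the automated method and by one observer for the same set of bones.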

Results

To be able to assess the accuracy of a manual and an automated segmentation of the proximal tibia/fibula, surface visualizations are generated after the measurements. Examples are shown in Fig. 4.

Fig. 4.
Bone surface visualization after manual segmentation of the proximal tibia/fibula (left). Bone surface visualization after automated segmentation of the proximal tibia/fibula (right; blue femur, red proximal tibia/fibula, green distal tibia/fibula). The circles highlight differences between the segmentations.

The results of the correlation tests are shown in the top row of Fig. 5 and the measurement agreements are presented in the bottom row of Fig. 5. To assess possible influence of the time point on the agreement, the data are shown for each time point individually (see legends).

Fig. 5.
Correlation between the measurements (in mm3) of the two human observers and the automated method (top row). Obs1 vs. Obs2, Auto vs. Obs2 and Auto vs. Obs1 are shown. The blue line represents a linear best fit, defined by the function in the legend. The Pearson correlation r, based on the data (red), is also shown in the legend. Bland–Altman plots representing the measurement agreement between the two human observers and the automated method (bottom row). The black lines indicate the grand means (line) ±1.96 times the standard deviation (broken line), which are 0.06 ± 0.12, 0.03 ± 0.43 and −0.03 ± 0.44 mm3, respectively. The arrows indicate the measurement with maximum disagreement between the observers. To show whether the agreement depends on the time point at which the data were acquired, the time points are shown in different colors (red circles baseline or T0, black diamonds T1, blue stars T2). Note that the values in the legends are the means ±1 times the standard deviation.

Mauchly’s test indicated a violation of the sphericity assumption and therefore degrees of freedom were corrected using Greenhouse–Geisser estimates of sphericity (see Table 1 in the “Appendix” for details). The results show that there are significant differences in measured bone volume for the main effect Time, F (1.39, 16.73) = 28.80, p < .001, as well as the interaction effects Method × Time, F (1.63, 19.59) = 16.71, p < .001, and Bone × Time, F (1.08, 12.93) = 12.75, p < .05. The Tukey HSD post hoc tests revealed a significant difference in bone volume between T0 and T1 (p < .001) as well as T0 and T2 (p < .001). There was no significant difference between T1 and T2 (p > .05).

For the Bone × Time interaction effect (Fig. 6, top left), relevant significant effects were present for healthy vs. pathologic bone at T2 (p < .001), but not at T0 and T1 (both p > .05). For the Method × Time interaction effect (Fig. 6, top right), relevant significant effects were present for Obs1 vs. Auto and Obs2 vs. Auto at T0 (p < .05 and p < .001) but not for Obs1 vs. Obs2 at T0 (p > .05). Furthermore there were significant effects for Obs1 vs. Auto and Obs2 vs. Auto at T2 (p < .001 and p < .05) but not for Obs1 vs. Obs2 at T2 (p > .05). There were no significant effects at T1.

Fig. 6.
Top row: mean bone volume (mm3) over time for the pathologic (Path) and the healthy (Heal) bones, Bone × Time interaction (left), and mean bone volume over time for the two human observers (Obs1, Obs2) and the automated method (Auto), Observer × Time interaction (right). The results are based on including all mice. Error bars indicate 95% confidence intervals. Middle and bottom rows: mean bone volume (mm3) and standard deviation of the healthy (Heal) and pathologic (Path) bones for six different mice (a–f) over time, averaging the measurements of the automated method and the two human observers.

The results of the comparison of the difference in bone volume between healthy and pathologic bone for six different mice are given in Fig. 6 (middle and bottom rows).

The results of the femur segmentation and subsequent volume measurements are shown in Fig. 7. The average volume difference between the right and the left femur was 0.89 ± 0.64% when measured manually and 0.83 ± 0.53% when measured automatically. To see whether there is a significant difference between the human observer and the automated method, a statistical analysis similar to that presented in the “Quantitative Analysis of Measurement Results” section was performed, this time including one human observer instead of two. Mauchly’s test indicated no violation of the sphericity assumption (p > .05). The results show that the main effect Method is significant, F (1, 2) = 92.894, p < .05, and the mean difference between the automated and the manual method is −2.15 ± 0.75%. This means that the automated method results in lower measured volumes than the manual method.

Fig. 7.
Result of the automated (Auto) and manual (Obs1) volume measurement for the right (ri) and left (le) femur for three different mice (a–c) over time.

A comparison of the development of the bone thickness over time for a healthy and a pathologic bone is given in Fig. 8 by means of bone surface visualizations, where color indicates the bone thickness.

Fig. 8.
Comparison of the bone thickness development over time for a healthy and a pathologic bone. Bone surface representations are shown. The colors indicate the bone thickness at each location on the bone. The bone marrow was partially flushed out of the bone during the intra-osseous inoculation used to induce bone metastases. This partial bone marrow ablation leads to a local increase in bone volume preceding cancer-induced osteolysis [6]. The arrow indicates this local increase in bone thickness around the site of early osteolysis. Note that the measurements at the distal end of the femur and the proximal end of the tibia are not meaningful: a substantial amount of trabecular bone is present at these locations, whereas bone thickness measurements are only meaningful for cortical bone.

Discussion

In this article, we described a fully automated approach to analyze skeletal changes in rodent whole-body MicroCT scans. The automated approach is capable of (1) aligning scans of the same animal taken at different time points; (2) automatically segmenting a subvolume (VOI) in these scans; (3) measuring the bone volume; (4) measuring cortical thickness; and (5) visualizing the thickness by assigning thickness-dependent colors. In addition, the user can visually check the segmentation performance using 3D bone surface representations and can generate normalized sections of identical sectioning planes in longitudinal scans for side-by-side comparison.

Conventional analysis of radiographs involves identifying osteolytic lesions manually. The procedure of manually drawing a region of interest is prone to observer bias and small changes in thickness or multiple lesions projected on top of each other are easily overlooked [6]. Manual analysis of MicroCT data is a better alternative, but is very labor intensive [6].

An automated method for MicroCT analysis has several advantages over manual analysis. The risk of non-objectivity and interobserver variability is greatly reduced by minimizing the active manual input of the researcher. Only an automated approach can be purely objective and handle every dataset in exactly the same manner. Additionally, an automated analysis method is much faster than any manual procedure. Thus, automating the analysis allows a larger number of scans to be evaluated than a human observer could handle.

Researchers want to know exactly how quantified data is generated and tend to dislike automated “black-box” approaches. To enable the researcher to check every step along the way, the automated method generates visualizations of the segmented volume. These visualizations can be evaluated after the analysis is complete. The automatic segmentation can be overruled manually or some datasets can be excluded from further analysis. Moreover, the cortical thickness maps enable the researcher to directly pinpoint where structural changes of the cortical bone occurred. This way, the cortical thickness maps help identify areas of interest in the original scan data and in other modalities such as histological sections. The assessment of trabecular bone is not possible with the proposed method because the relatively low resolution of the in vivo data (36.5 × 36.5 × 36.5 μm3) renders measuring the trabecular thickness accurately very difficult [23].

We validated the presented automated method by comparing it to the “best available” method, namely manual bone segmentation and bone volume measurements. To this end, we acquired datasets of 15 mice (n = 15) with induced bone metastases in the tibia at three points in time. The volume measurement results show that there is an excellent correlation between the human observers and the automated method: r(Obs1, Obs2) = 0.9996, r(Auto, Obs2) = 0.9939, and r(Auto, Obs1) = 0.9937. The Bland–Altman plots (Fig. 5, bottom row) based on all data indicate excellent agreement between the two human observers (interobserver variability) as well as between the observers and the automated method. There is no obvious relation between the difference and the mean. Residual disagreement can therefore be explained by the bias and the deviation, which are very low in all cases, namely 0.59 ± 0.64%, 0.26 ± 2.53%, and −0.33 ± 2.61%, respectively. The residual errors are mainly the result of two factors that may influence the measurement outcome: the registration accuracy (and, as a consequence, the segmentation accuracy) and the chosen threshold to separate bone from the background. The registration accuracy has the largest influence on the result; improving the accuracy would therefore require a modification of the registration method. Special attention should be paid to the robustness of potential methods with respect to bone resorption. The thresholding procedure also influences the measured volume because the two values are inversely related, i.e., if the threshold value increases, the volume decreases and vice versa. We chose a global threshold since the resolution of the in vivo data does not allow reliable segmentation of the trabecular bone [23], but methods including local thresholds may be more accurate if data resolution increases.

Ideally, the automated measurements are identical to the manual measurements. The ANOVA revealed no significant difference between observers (Method, p = .10). This means that the automated method performs as well as the two human observers. However, the low p value indicates that significant interaction effects may be present. The performance of the automated method appears to depend to some extent on the time point, since the automated method is significantly different from the human observers at T0 and T2. Visual inspection of Fig. 6 (top right) suggests overestimation of the volume at T0 and underestimation of the volume at T2. There is no significant difference at T1. This is supported by the Bland–Altman plots (Fig. 5, bottom row), in which the mean difference in measurement is close to zero at T1. However, these differences are borderline and probably due to the very small variation between the human observers.

The bone volumes of pathologic bones were significantly decreased compared to the healthy bones at T2 (Fig. 6, top left). There are no significant differences at T0 and T1. There are two explanations for the absence of a volume decrease at the earlier time points. Firstly, the bone marrow is partially flushed out of the bone during the intraosseous injection of tumor cells. This partial bone marrow ablation has profound anabolic effects on local bone turnover. Bone formation induced by bone marrow ablation reaches a maximum 1 week after the intervention. After this initial week, the bone volume normalizes gradually as the bone recovers from the procedure, a process that can take weeks [18, 19]. Secondly, incipient osteolytic lesions around the tumor create weak areas in the bone. The mechanical stress on other, healthy parts of the bone will increase due to these weak areas. Both the anabolic effects of the partial bone marrow ablation and those of the increased mechanical stress result in a local increase of bone volume alongside osteolytic lesions. Combined, these anabolic and osteolytic processes influence the volume measurements, as can be seen in Fig. 6 (middle and bottom rows, a–d and f). The cortical thickness maps provide an excellent tool to see exactly where the volume changes occur in relation to the osteolytic lesion site (Fig. 8).

The presented segmentation method is not restricted to the tibia, but can be applied to any bone of the skeleton in whole-body MicroCT scans, as long as it is contained in the MOBY mouse atlas [9, 11, 20]. We are currently implementing the volume measurements of every segmented skeletal element using the same principle. We segmented the femur as a preliminary proof of concept. Several conclusions can be drawn from the results in Fig. 7. The volumes of the right and the left femur are very similar for the manual and the automated measurement, meaning that measuring the femur is highly reproducible. The automated method, however, underestimates the volume compared with the manual method. This underestimation is to be expected since the femur included in the MOBY mouse atlas does not include the femoral head and neck. Therefore, the segmentation result “cuts” the femoral neck approximately in the middle, and the amount of underestimated volume thus corresponds to the volume of the femoral head and part of the femoral neck. Note that this is a systematic error and only leads to inaccurate results if the femoral head and neck are of particular interest within a study. The same type of measurement error may occur for other bones as well, since most of the bones in the MOBY atlas are simplified versions of the real bone shape. However, as is the case for the femur, this should not lead to problems because the error is systematic. In cases where higher segmentation accuracy is required in a particular part of the bone that is simplified, another animal model with more detail could be employed. One should, however, bear in mind that using simplified bone shapes has the advantage that the influence of, e.g., differences in strain or animal size can be minimized by leaving out the fine details.

The increased radiation dose of MicroCT compared with radiographs has always been a major concern limiting its use in cancer research. This is not a problem anymore as modern MicroCT scanners can perform whole-body scans in less than a minute [21]. The delivered radiation dose during these scans is well below a dose that would affect tumor growth, even during longitudinal follow-up studies [7, 8, 21].

All datasets used in this article have been generated with a standard scanning protocol using the Skyscan 1076 MicroCT. However, the described methods can be performed on any other whole-body MicroCT dataset acquired on a different machine and with a different protocol. Other scans might require an adjustment of threshold values and the initial scan resolution will always be a limiting factor during further analysis.
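The dependence of the measured volume on the global threshold and on the scan resolution can be made concrete with a minimal sketch. It assumes an already segmented bone region stored as a nested list of grey values with isotropic voxels, and it counts voxels at or above the threshold as bone. The function name, the default voxel size (taken from the in vivo resolution mentioned in the discussion), and the data layout are illustrative assumptions, not the implementation used in this study.

```python
def bone_volume_mm3(voxels, threshold, voxel_size_um=36.5):
    """Volume of all voxels with a grey value >= threshold, in mm^3.

    voxels        -- 3D nested list [slice][row][column] of grey values
                     (hypothetical layout, for illustration only)
    threshold     -- global grey-value threshold separating bone from background
    voxel_size_um -- isotropic voxel edge length in micrometres
    """
    voxel_volume_mm3 = (voxel_size_um / 1000.0) ** 3
    bone_voxels = sum(
        1
        for plane in voxels
        for row in plane
        for value in row
        if value >= threshold
    )
    return bone_voxels * voxel_volume_mm3
```

Raising `threshold` can only remove voxels from the count, so the returned volume never increases with the threshold, which is the inverse relation noted in the discussion. A scan acquired with a different protocol would need its own `threshold` and `voxel_size_um` values.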

Finally, we want to stress that the described method is general and can be applied to other species as well. The only prerequisite is that an anatomical skeleton atlas is available for the animal of interest.

Conclusion

We propose a new MicroCT analysis paradigm that combines previously published methods for animal posture correction and normalized visualization of follow-up data with the quantification and visualizations discussed in this paper. Together, this results in a fast and automated workflow in which the user can easily compare MicroCT scans at the whole-body level, zoom in to the level of a single bone or bone segment of choice, and obtain qualitative and quantitative data for that segment. The animals can be scanned in any posture. Normalized and interactive side-by-side visualizations of the exact same section of skeletal elements at different time points can be generated from longitudinal scans in which one animal is scanned multiple times. The detailed side-by-side visualizations greatly help the researcher to identify changes in the skeleton. The researcher can then identify and zoom in on the bone or bone segment of interest and automatically generate quantitative volumetric data alongside visualizations of the segmented volume and of the cortical thickness of that specific skeletal element. This new workflow greatly reduces analysis time, aids the handling of complicated scan data, and improves the overall qualitative and quantitative assessment of MicroCT scans. The method was validated by quantification of osteolytic effects over time in the tibia but can easily be adapted to other bones of the skeleton. In addition, the approach can be used for other species, provided that a skeleton atlas exists for that animal.