Introduction

The development and validation of forensic anthropological methods rely on the availability of contemporary, population-specific skeletal collections of considerable size [15]. Examples hereof are skeletal collections such as the Bass, Terry, Hamman Todd, Pretoria Bone and Raymond Dart collection [610]. Each of these collections suffers from its unique blend of selection bias. For instance, the Bass collection holds predominantly white males between the ages of 35 and 85 years [11], while the Pretoria Bone Collection holds a majority of black males between the ages of 30 and 80 years [6]. Additionally, in most skeletal collections, females are underrepresented and the collection’s demographics are not reflective of the larger population [12, 13]. Despite these biases, these skeletal collections serve as modern collections from which biological profile methods are developed.

The vast majority of European countries lack modern skeletal collections [1423]. Many European institutions have a rich collection of archaeological specimens, but secular changes disqualify archaeological material and cemetery populations, in general, as reference populations for forensic purposes [24]. Furthermore, the development of modern representative skeletal collections is hindered by legal, ethical and practical considerations [25].

A possible solution to this problem is provided by the vast amount of data that is routinely generated in the hospital setting by medical imaging techniques, such as computed tomography (CT) scans. Such patient-derived CT scans could form the basis of a virtual skeletal database. Patient-derived, anonymized CT images are available in abundance, and a large, representative skeletal collection could thus be developed in a relatively short period of time. In addition, a virtual skeletal database does not require time-consuming and labour-intensive skeletal processing techniques (e.g. maceration) or physical storage space.

The success of a virtual skeletal database depends on the precise modelling of virtual skeletal elements from the CT images and the accuracy with which virtually modelled skeletal elements represent their dry bone counterparts. Previous studies have investigated the accuracy of virtual bones, using CT images of cadavers or dry bone elements, with promising results [2629]. However, the CT scans in these studies are typically conducted under ideal conditions. For example, protocols were not limited by exposure level (mAs) restrictions and, in the case of dry bone elements, without surrounding soft tissue. Both exposure level (mAs) and the biological composition of the scanned subject influence the quality and noise level of the CT image and therefore theoretically could affect the modelled virtual bone [30]. Conversely, patient-derived CT scans are conducted under varying imaging conditions, such as variations in the acquisition parameters (i.e. exposure level, slice thickness, increment level, reconstruction filter) and in the type/brand of scanner used. These variations may influence the geometric variability of the virtual skeletal elements. Also intra- and inter-observer variation in the segmentation process, which provides the virtual models, may add geometric variability to the virtual model.

Knowledge on the extent to which these sources of variability affect the geometric variability (i.e. precision) of the virtual model serves as a basis for the inclusion and exclusion criteria of CT scans for setting up a virtual skeletal database. Also, this knowledge may facilitate an increase in research using 3D models from patient-derived CT scans for the development of forensic anthropological methods [3135].

Therefore, this study aims to investigate the effects of image segmentation (intra- and inter-observer variability) and varying imaging conditions, such as scanner type with the associated standard patient scanning protocol, slice thickness and exposure level on the geometric variability of 3D virtual models of the human pelvis. The pelvis was specifically selected because of its multifaceted morphology, its relevance in forensic anthropological sex estimation techniques and for its low signal-to-noise ratio in CT scans, which adds complexity to the virtual modelling process.

Materials and methods

The current study consists of several imaging experiments, all using the same embalmed human cadaver (63-year-old male) from the body donation program in the Department of Medical Biology of the Academic Medical Centre (AMC), University of Amsterdam, the Netherlands. The cadaver was scanned multiple times on two comparable CT scanners, a Philips Brilliance 64 (Philips Medical Systems, Best, the Netherlands) and a Siemens Sensation 64 (Siemens Healthineers, Erlangen, Germany), using variations of standard patient imaging protocols. To include variability in the scanning process, including the random effect of Poisson noise, quintuplicate CT scans of the same pelvis were made, without repositioning the pelvis between subsequent scans. In each experiment, only one source of variability was altered at a time, enabling the interpretation of each parameter’s influence compared to the other varying imaging conditions. While the imaging settings of the CT scanner remain constant except for the one variable being evaluated, there are some imaging parameters, such as scanner type, that are always incorporated into the overall variability. Image segmentation of the five CT images yielded five virtually modelled pelves of the same physical pelvis, for each source of variability. A total of five sources of variability were studied, namely intra-observer variability, inter-observer variability, scanner type, slice thickness and exposure level.

Each source of variability was evaluated by comparing two quintuplicate sets of virtual bone models, derived from the CT images.

Creating a virtual pelvis model from CT images

The pelvic bone was segmented out of each CT image using in-house segmentation software [36]. Image segmentation is a semi-automatic procedure that results in a polygon mesh, consisting of thousands of points representing the virtual bone model. Segmentation starts with a threshold-connected region-growing algorithm [37] in which the user can interactively adapt the threshold until an optimal number of bone voxels are selected that gives no or minimal leakage to neighbouring structures. Structures that are missed in this procedure can then be added manually using an on-screen brush. A binary closing algorithm [38] helps to fill residual holes and to close the outline. Next, a Laplacian level-set growth algorithm [37] is used to advance voxels towards the edges of the bone. A distance map is finally used to extract a polygon mesh at the zero-level using a marching cubes algorithm [39].

Sources of variability

The following two scanners with their standard clinical scanning protocols, and variations thereof, were used to study the effects of the various sources of variability:

  1. 1.

    Scanner type: Philips Brilliance 64 (Philips Medical Systems, Best, The Netherlands).

    Standard scanning protocol: 120 kV, 150 mAs, slice thickness 0.9 mm, increment 0.45 mm, reconstruction kernel D.

  2. 2.

    Scanner type: Siemens Sensation 64 (Siemens Healthineers, Erlangen, Germany).

    Standard scanning protocol: 120 kV, 200 mAs, slice thickness 1 mm, increment 1 mm, reconstruction kernel B60f.

The sources of variability, associated experiments and the scanning protocols are listed in Table 1.

Table 1 Sources of variability, the associated experiments, and the scanning protocols used

Geometric variability due to image segmentation

Intra-observer variability

One observer segmented the same set of the quintuplicate CT images twice (Round 1 and Round 2). The CT images were acquired by the Philips Brilliance 64 using its standard patient protocol. A 2-month gap between each segmentation session was used to try eliminate recognition.

Inter-observer variability

Two observers each segmented the same quintuplicate CT images acquired by the Philips Brilliance 64 using its standard patient protocol.

Geometric variability due to varying imaging conditions

Scanner type with their associated standard patient scanning protocol

One observer segmented two sets of quintuplicate CT images that were acquired by the Siemens Sensation 64 and the Philips Brilliance 64, using their standard patient imaging protocols. Note that the differences in the standard patient imaging protocols include exposure level (mAs), slice thickness, increment level and reconstruction filter.

Slice thickness

One observer segmented two sets of quintuplicate CT images that were acquired by the Philips Brilliance 64: one set with standard slice thickness (0.9 mm) (standard for bone scans) and one set with an increased slice thickness (3.0 mm) (standard for abdominal scans).

Exposure level (mAs)

For each scanner type, one observer segmented two sets of quintuplicate CT images acquired with full exposure levels (100%) and with halved (50%) exposure levels.

Quantifying and visualizing geometric variability

The variability in the process of virtual modelling, due to image segmentation and imaging conditions, results in small differences between the points of each set of polygon meshes that make up the virtual model. To quantify the geometric variability between these polygon meshes in each quintuplicate set of models, the standard deviation (SD) for each single-point was calculated.

To calculate the single-point SD value, one of the five polygon meshes served as a reference, while the nearest-neighbour distance to a corresponding point in each of the other four polygon meshes was determined. These four distances were used to calculate the single-point SD value. Since the selection of the reference pelvis model may influence the single-point SD value, this procedure was repeated with every polygon mesh acting as a reference, which yielded five single-point SD values per mesh point location. The five single-point SD values per mesh point were averaged to obtain balanced single-point SD values that were independent of the polygon mesh used as a reference. These balanced single-point SD values were used for further statistical analysis and to visualize regions of high and low variation using colour mapping.

It should be noted that all segmentations result in different polygon meshes, with different numbers of points. This means that there are no real ‘corresponding’ points available to determine SD values. Selecting the nearest-neighbouring points in this study as an alternative to find a distance measure may reduce the variability and may therefore underestimate the SD value per point, especially if the polygon surface is influenced by noise. Since the level-set segmentation growth algorithm features image filtering, and therefore smoothing of each neighbouring polygon surface, this effect is considered to be too small to affect the outcomes of the study.

Statistical analysis

The distributions of SD values obtained from the virtual models, for each source of variability, were compared using a Kolmogorov–Smirnov test. A Kolmogorov–Smirnov test quantifies the difference of the SD distributions by determining the largest distance between the cumulative distributions of the two evaluated sources of variability. The test results in a D-value, which represents the single, largest geometric variability between the cumulative SD distributions of two sets of pelves, and at which SD level this occurs. The D-value ranges between 0 and 1, with 1 indicating a maximal difference and 0 indicating no difference. For the associated p value, a value of <0.05 is used to indicate a significant difference between the distributions.

Geometric variability of the virtual model

Traditional anthropology considers a threshold of 2 mm an acceptable measurement error for linear measurements [26]. Virtual models cannot exceed this level of error in order to be considered suitable replacements or substitutes for skeletal remains. In order to achieve this, the point-to-point distance variation of two polygon mesh points should have a 95% confidence interval (CI) of 2 mm or less. Specifically, the error propagation for difference in distances of two measurements indicates that each single-point should not have an SD value exceeding 0.7 mm.

While 2 mm is the upper limit for point-to-point distance variation for forensic anthropological purposes, other values are important for interpretation. Specifically, a balanced single-point SD value of 0.35, 0.175 and 0.07 mm results in a point-to-point distance variation with a 95% CI of 1, 0.5 and 0.25 mm, respectively.

The precision of the virtual model, per source of variability, was determined by categorizing the fractions of balanced single-point SD values of each polygon mesh. Virtual models with larger regions of high single-point SD values (>0.7 mm) are considered to be less precise.

Practical significance for forensic anthropological purposes

Statistical significance is required to establish differences. However, these differences may occur in locations of limited importance for forensic anthropological purposes. We illustrate the practical significance of our results for forensic anthropological purposes by colour maps. These visualize the modelling variation by colour mapping the balanced single-point SD values of each polygon mesh. When creating the colour map, the first reference mesh was selected for visualization and a balanced SD value was assigned to each mesh point. The computation of the SD values was calculated per source of variability; therefore, there are two colour maps, one per the two sources of variability being compared. The colour thresholds are 0.07, 0.175, 0.35, 0.7 and >0.7 mm.

Results

Statistical analysis

The cumulative distributions of the single-point SD values obtained from the Kolmogorov–Smirnov test are visualized in Figs. 1, 2 and 3. From a statistical point of view, the cumulative distribution per source of variability differs significantly (all p values <0.001). However, the figures demonstrate that the largest differences (D-values) occur at single-point SD values that correspond to a point-to-point distance variation well below the 2 mm (CI = 95%) threshold. The largest D-value (0.284), associated with the scanner type with their standard patient scanning protocols, occurs at an SD value that corresponds to a point-to-point distance variation of less than 0.5 mm (CI = 95%). The smallest D-value (0.013) is associated with the difference in exposure levels (100 vs 50%) for the Siemens Sensation 64 and occurs at an SD value that corresponds to a point-to-point distance variation of less than 0.25 mm (CI = 95%).

Fig. 1
figure 1

Difference in the cumulative distribution of single-point SD values, due to intra- and inter-observer variability. Visible in red is the location of the largest distance between the two cumulative distributions. The distances D = 0.107 and D = 0.067 correspond with a point-to-point distance variation of less than 0.25 mm (CI = 95%)

Fig. 2
figure 2

Difference in the cumulative distribution of single-point SD values, due to scanner type with their associated standard patient scanning protocol, and slice thickness variability. Visible in red is the location of the largest distance between the two cumulative distributions. The distances D = 0.2084 and D = 0.096 correspond with a point-to-point distance variation of less than 1 and 0.25 mm (CI = 95%), respectively

Fig. 3
figure 3

Difference in the cumulative distribution of single-point SD values, due to exposure level variability. Visible in red is the location of the largest distance between the two cumulative distributions. The distances D = 0.137 and D = 0.013 correspond with a point-to-point distance variation of less than 0.25 mm (CI = 95%)

Geometric variability of the virtual model

For all sources of variability, more than 97% of the single-point SD values lie below the threshold of 0.7 mm and thus result in point-to-point distance variation of 2 mm (CI = 95%) or less (Figs. 1, 2 and 3, Table 2). Additionally, more than 91% of the single-point SD values lie below the 0.35 mm threshold, resulting in point-to-point distance variations of less than 1 mm (CI = 95%), which is well below the accepted error level of 2 mm in traditional forensic anthropology.

Table 2 Fractions of SD values (mm) that fall below each single-point SD threshold, per source of variability

On average, half of the polygon mesh points have a point-to-point distance variation of 0.25 mm (CI = 95%) or less. This is with exception of CT images generated with the Siemens Sensation 64 scanner and its standard patient scanning protocol, where this only holds true for 30% of the mesh points. As expected, as the point-to-point distance variation decreases, the fraction of polygon mesh points within that increment also decreases.

Practical significance for forensic anthropological purposes

Figures 4, 5 and 6 and supplemental material 112 display the distribution of the single-point SD values, per source of variability, on a reference pelvis. Each colour map displays a similar pattern. Joint surfaces (i.e. the sacro-iliac joint and the pubic symphysis), the posterior surface of the sacrum, the tip of the coccyx and areas typically associated with osteophytes have single-point SD values larger than 0.7 mm. Moreover, colour maps created from CT images acquired by the Siemens Sensation 64 scanner show a slightly larger region with single-point SD values larger than 0.7 mm. These areas are specifically in the acetabulum and the anterior surface of the sacrum, as visualized in Fig. 6.

Fig. 4
figure 4

Colour maps showing geometric variability due to intra-observer variability and inter-observer variability. Single-point SD values of 0.07, 0.175, and 0.35 mm result with a point-to-point distance variations of 0.25, 0.5, and 1 mm (CI = 95%), respectively. The maps were obtained by segmenting quintuplicate CT scans of the pelvis from different scans and by quantifying the variability in point positions along the pelvic surface

Fig. 5
figure 5

Colour maps showing geometric variability due to slice thickness. Single-point SD values of 0.07, 0.175, and 0.35 mm result with a point-to-point distance variations of 0.25, 0.5, and 1 mm (CI = 95%), respectively. These maps were obtained by segmenting quintuplicate CT scans of the pelvis from different scans (one observer) and by quantifying the variability in point positions along the pelvic surface

Fig. 6
figure 6

Colour maps showing geometric variability due to scanner type with their associated standard patient scanning protocol and exposure levels. Single-point SD values of 0.07, 0.175, and 0.35 mm result with a point-to-point distance variations of 0.25, 0.5, and 1 mm (CI = 95%), respectively. These maps were obtained by segmenting quintuplicate CT scans of the pelvis from different scans (one observer) and by quantifying the variability in point positions along the pelvic surface

Discussion

Our results show that despite image segmentation and varying imaging conditions, all virtual models were sufficiently precise in most surface regions of the pelvis. At almost all (>97%) locations across the pelvis, the point-to-point distance variation is less than 2 mm (CI = 95%). Additionally, in more than 91% of the locations, the point-to-point distance variation is less than 1 mm (CI = 95%). This indicates that the geometric variability of the virtual pelvis as a result of segmentation and imaging conditions rarely exceeds the generally accepted point-to-point threshold of 2 mm [26]. Importantly, human error in segmentation is likely playing more of a role in imprecision than the varying imaging conditions.

Virtual models created from CT images with full exposure levels or minimal slice thickness produce virtual bone models that are more precise, as seen by the higher fraction of lower single-point SD values. Lower exposure levels result in higher noise levels and lower image quality, which influences the precision of the image segmentation process and ultimately yields larger variability in specific regions after virtual bone modelling. However, the difference between these virtual models and those produced with halved exposure levels or increased slice thickness are below the above-mentioned 2 mm threshold and are therefore considered negligible.

The variability incorporated in this study is believed to be representative of a worst-case scenario, both for biological and radiographic aspects. For example, pelvic scans are hampered by image noise due to a relatively large amount of attenuation by surrounding soft tissue. Additionally, the cadaver used in this study was of an older individual and subsequently probably had a lower bone density and more osteophytes than a young- or middle-aged individual. Considering all of the above, the authors agree that images generated of other skeletal elements as part of standard clinical CT scanning will likely yield virtual bone models with a higher precision than that presented in this study.

The use of only one pelvis in this study could be perceived as a limitation; however, by only incorporating one pelvis into the research design, additional sources of error, which may result from variation in body composition, were excluded. This allowed for unobscured analysis of the geometrical precision of the virtual bone model, which was the aim of this study.

Surface modelling rather than volume rendering techniques were applied in this study. This technique allows for the conversion of volume data into polygon mesh points that accurately represent the anatomical surface of an object [40]. By quantifying the error in the polygon mesh points, a foundation was laid for researchers to incorporate different and innovative techniques/methodologies to continually enhance the field of forensic anthropology. For example, the lack of measurement error in the polygon mesh points permits the possibility to conduct shape-fitting analyses and automate measurements on patient data. Shape fitting is considered less sensitive to small modelling variations (intra-/inter-observer, noise), which ultimately increases the distinctive power of detecting morphological features [41].

Application to forensic anthropology

The findings from this research add to the current shift in forensic anthropology towards using virtual skeletal databases and virtual methods. ‘Virtual anthropology’ might facilitate an increased understanding and appreciation of the range of human variation than possible with traditional skeletal collections and traditional methodologies, such as classic osteometric parameters.

This study does not specifically and directly test the classic osteometric landmarks, namely the anterior superior- and posterior superior-iliac spine, the superior rim on the pubic symphysis and the inferior margin of the ischium [42]. However, by studying the variability over the entire pelvis on a point-by-point basis, the variability at those classic landmarks was inherently tested. Our study proves that at anatomical areas related to the classic osteometric landmarks, the variability on a point-by-point basis is such that it would not result in an error in linear distance of more than 2 mm between landmarks. This is because at each point, the variability is 0.7 mm or less on a 2 standard deviation (SD) level.

Virtual models therefore enable us to appreciate more complex patterns of variation, as we are not limited to the conventionally used point-to-point distances between anatomical landmarks, more commonly referred to as inter-distance landmarks (ILDs). The use of non-standard methodological approaches enables us to explore the range and pattern of human variation from a different angle and ultimately may lead to the development of innovative osteometric techniques. The benefit of this approach is illustrated by recent work which showed that integrating non-standard measurements provides more information for complex population structures where there is a high rate of immigration, migration and relaxed border controls [43]. Additionally, virtual models derived from clinical CT data enable us to develop new forensic anthropological techniques that are not based on biased skeletal databases.

Conclusion

Virtual bone models segmented from CT images with full exposure levels or minimal slice thickness produce geometries that are more precise. However, the effects of image segmentation and varying imaging conditions have no practical effect on the use of 3D virtual models, of the human pelvis, from a forensic anthropological point of view. Therefore, virtually modelled pelves from segmented patient-derived CT scans are a sufficiently precise source for forensic anthropological methods and for creating a modern virtual skeletal database.