Introduction

Determining the identity of unidentified remains is crucial in forensic investigations, especially when analyzing severely mutilated and skeletonized remains [1]. Population affinity estimation is one of the four basic components of a biological profile, together with the estimation of sex, age, and stature [2]. Although estimating population affinity is challenging [3], it can narrow the list of missing persons against whom skeletal remains are compared using dental records, medical records, and other data [4].

Previous research has documented skeletal morphological differences between and within global populations. With respect to ancestral variation specifically, the skeletal region of primary consideration is the skull, especially the midface [5,6,7,8,9,10,11]. The pelvis, however, also exhibits distinctive, heterogeneous characteristics and is one of the most diagnostic skeletal elements for anthropological profiling, particularly in the estimation of age and sex [12,13,14,15,16,17,18]. The forensic utility of the pelvis also relates to its postmortem survivability: the large surrounding musculature affords some protection to the os coxae and proximal femora, so morphometric data from these bones can commonly be collected by forensic anthropologists [19].

Pelvic differences can be assessed by qualitative (morphoscopic) and/or quantitative (morphometric) methods [20,21,22]. The former involves the visual assessment of structural differences but can be limited by trait definitions that are often ambiguous and by unclear scoring procedures [23]. Additionally, morphoscopic methods are subjective and require appropriate observer experience [21]. Conversely, although quantitative morphometric analysis of the pelvis tends to be more time-consuming and complex than qualitative methods, it is less subjective and is generally associated with lower inter- and intra-observer error [20, 21]. However, there appears to be a paucity of research that has specifically evaluated morphometric differences in the pelvis between individuals from different populations [13,14,15]. Although Asia comprises individuals from many countries and ethnic groups who present obvious phenotypic variability in body size and shape [24], there have been no reports comparing pelvic measurements between Asians, including Japanese, and other populations.

Computed tomography (CT) can capture detailed images of bony structures without removal of the soft tissue, thus offering a time-saving alternative to physical forensic examination while protecting remains from further invasive manipulation [24, 25]. Data obtained from CT images are also more representative of the contemporary population than documented skeletal collections derived from body donation programs, largely because digital data from modern individuals are less likely than historical skeletal collections to be affected by secular change [26]. Moreover, multiple studies have demonstrated that CT images provide appropriate levels of reproducibility and accuracy, and that medical imaging is an appropriate proxy for physical remains in formulating forensic standards for the biological profiling of unknown skeletal remains [27].

The present study quantifies morphological differences between the pelves of contemporary Japanese and Western Australian individuals and then evaluates the feasibility of population affinity classification by applying machine learning approaches to morphometric data obtained from multidetector CT (MDCT) images.

Materials and methods

Materials

Japanese population

The sample comprises postmortem CT (PMCT) scans of 207 adult decedents (aged over 18 years) of known age and sex (103 females, mean age = 40.76 ± 11.55 years; 104 males, mean age = 38.46 ± 8.68 years) from the Department of Forensic Medicine at the University of Tokyo, acquired between August 2017 and May 2022. The estimated postmortem interval for all subjects was < 14 days. Exclusion criteria were pelvic fractures, pubic symphysis diastasis, sacroiliac joint dislocation, burn injuries, bone implants, and acquired or congenital abnormalities. The study protocol was approved by the Ethics Committee of the University of Tokyo (2121264NI).

Western Australian population

The sample comprises MDCT scans of 158 adult individuals (78 females, mean age = 45.05 ± 18.17 years; 80 males, mean age = 41.23 ± 20.46 years) who presented at one of the major Western Australian hospitals (Perth region) for clinical evaluation between April 2009 and July 2012. In accordance with the National Statement on Ethical Conduct in Human Research (National Statement), the scans were anonymized, retaining only sex and age information. Although information on the ethnicity of each individual was not recorded in the patient data, the entire sample was taken as representative of a “typical” Western Australian population [28]. Exclusion criteria were the same as those for the Japanese population. Research ethics approval was granted by the Human Research Ethics Committee of the University of Western Australia (2020/ET000038).

Methods

For the Japanese subjects, PMCT scanning was performed with a 16-row detector CT system (Eclos; Fujifilm Healthcare Corporation, Tokyo, Japan). The scanning protocol was as follows: collimation of 1.25 mm, reconstruction interval of 1.25 mm, tube voltage of 120 kV, and tube current of 200 mA. For the Western Australian subjects, pelvic imaging was performed using a 64-slice CT scanner (Brilliance; Philips Healthcare, NSW, Australia) with an average slice thickness of 0.94 mm, tube voltage of 100–140 kV, and automatic tube current modulation. The images were reconstructed at a uniform thickness (average 0.94 mm).

Image data processing and three-dimensional (3D) volume rendering were performed on a workstation (OsiriX MD version 11.0.2; Pixmeo SARL, Geneva, Switzerland). A soft-tissue kernel was used for CT image reconstruction. Following previous studies [25, 29, 30], 19 pelvic landmarks (Table 1) were acquired for each individual. Using MorphDB (an in-house database application) and Excel (Microsoft Office 2019, Microsoft, Redmond, WA, USA), 11 measurements, including two angles, were calculated from the coordinates of the landmarks acquired on the 3D images (Table 2 and Fig. 1).

Table 1 Definitions of the landmarks of the pelvis
Table 2 Definitions of the pelvic measurements (see Table 1 for landmark definition)
Fig. 1

Three-dimensional computed tomography images showing pelvic measurements (see Table 2 for definition): a transverse pelvic inlet (TPI), anterior breadth of the sacrum (ABS), and anterior height of the sacrum (AHS); b subpubic angle (SPA), breadth of pelvic outlet (BPO), anterior upper spinal breadth of pelvis (AUSB), and height between anterior superior iliac spine and anteroinferior margin of ischial tuberosity (HAIT); c transverse pelvic outlet (TPO) and midpelvic breadth (MB); and d angle of greater sciatic notch (AGN) and height between anterior superior iliac spine and ischial spine (HAIS)
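Because every measurement is derived from landmark coordinates, the underlying geometry can be sketched briefly. The Python example below is a minimal illustration, not the MorphDB/Excel workflow used in the study: it assumes that a linear measurement is the Euclidean distance between two landmarks and that an angle is measured at a vertex landmark between rays toward two other landmarks. The function names and coordinates are hypothetical; which landmarks define each measurement is specified in Tables 1 and 2.

```python
import math
from typing import Sequence

Point3D = Sequence[float]  # (x, y, z) landmark coordinates, e.g., in mm


def distance(p: Point3D, q: Point3D) -> float:
    """Linear measurement: Euclidean distance between two 3D landmarks."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def angle_at(vertex: Point3D, p: Point3D, q: Point3D) -> float:
    """Angular measurement in degrees at `vertex`, between rays toward p and q."""
    u = [a - b for a, b in zip(p, vertex)]
    v = [a - b for a, b in zip(q, vertex)]
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / norms))
    return math.degrees(math.acos(cos_theta))


# Hypothetical landmarks (mm), not data from this study.
left_point = (-50.0, 50.0, 0.0)
right_point = (50.0, 50.0, 0.0)
vertex_point = (0.0, 0.0, 0.0)
print(distance(left_point, right_point))                          # 100.0
print(round(angle_at(vertex_point, left_point, right_point), 6))  # 90.0
```

Working directly from coordinates means the same landmark set can yield any number of derived distances and angles without re-annotating the images.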

A subset of six subjects (three females and three males) was randomly selected to assess intra-observer (ST) and inter-observer (AN) error. All 19 pelvic landmarks were acquired for each of the six subjects, and this process was repeated six times at intervals of at least two days. The order of landmark acquisition was changed each time to reduce recall between repetitions. The relative technical error of measurement (rTEM, %) and the coefficient of reliability (R) were then calculated. Following established anthropological studies [31,32,33], an rTEM < 5% was considered acceptable; R values > 0.75 were considered sufficiently precise [34, 35].
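The precision statistics named above can be sketched as follows. This minimal Python illustration assumes the common multi-repetition formulation of TEM (squared deviations of each subject's repeated values from that subject's mean, pooled over N subjects and K repetitions), rTEM = 100 × TEM / grand mean, and R = 1 − TEM²/SD². Here SD² is assumed to be the sample variance of all recorded values; studies differ in how this total variance is estimated. Function names and example values are hypothetical.

```python
import math
from statistics import mean, variance
from typing import List

# measurements[i][k] is the k-th repeated value of one measurement for subject i.
Repeats = List[List[float]]


def tem(measurements: Repeats) -> float:
    """Technical error of measurement for N subjects measured K times each.

    TEM = sqrt( sum_i sum_k (M_ik - mean_i)^2 / (N * (K - 1)) ). Per subject this
    equals sum_k M_ik^2 - (sum_k M_ik)^2 / K, but cannot go negative by rounding.
    """
    n = len(measurements)
    k = len(measurements[0])
    ss = 0.0
    for reps in measurements:
        subject_mean = mean(reps)
        ss += sum((m - subject_mean) ** 2 for m in reps)
    return math.sqrt(ss / (n * (k - 1)))


def relative_tem(measurements: Repeats) -> float:
    """rTEM (%) = 100 * TEM / grand mean of all recorded values."""
    grand_mean = mean(v for reps in measurements for v in reps)
    return 100.0 * tem(measurements) / grand_mean


def reliability(measurements: Repeats) -> float:
    """Coefficient of reliability R = 1 - TEM^2 / SD^2.

    Assumption: SD^2 is the sample variance of all recorded values.
    """
    all_values = [v for reps in measurements for v in reps]
    return 1.0 - tem(measurements) ** 2 / variance(all_values)


# Hypothetical repeated measurements (mm): three subjects, three repetitions each.
example = [[120.1, 120.4, 119.9], [131.0, 130.6, 130.8], [125.5, 125.2, 125.9]]
print(f"rTEM = {relative_tem(example):.2f}%  R = {reliability(example):.3f}")
```

Computing the sum of squares as deviations from each subject's mean is algebraically identical to the textbook expression but numerically safer when repeated values are nearly identical.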

Descriptive statistics (range, mean, standard deviation, and median) were calculated for each measurement in both sexes. The Brunner–Munzel test was used to determine whether significant differences existed between the two populations; a p value < 0.05 was considered statistically significant. Two machine learning methods were employed for population affinity classification: (i) random forest modeling (RFM), an ensemble technique that grows many traditional classification trees with nonparametric algorithms on bootstrap samples (bagging) and assigns each case to the response class receiving the majority of tree votes [36,37,38], and (ii) support vector machine (SVM), which uses the observations lying at the boundary between the two groups in multivariate space (the support vectors) to generate a classification rule that maximizes the margin between the groups [39, 40].
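The Brunner–Munzel test compares two groups through their rank structure without assuming equal variances. The following pure-Python sketch implements the two-sided version with the usual t approximation (Welch–Satterthwaite-type degrees of freedom); the p value is obtained from the regularized incomplete beta function, evaluated with a standard continued-fraction routine. It is an illustration only and does not reproduce the software used in the study; in practice an established implementation such as `scipy.stats.brunnermunzel` would normally be preferred. Function names and example values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def _beta_cf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        for aa in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),            # even term
                   -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):  # odd term
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc_reg(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_front) * _beta_cf(a, b, x) / a
    return 1.0 - math.exp(log_front) * _beta_cf(b, a, 1.0 - x) / b


def brunner_munzel(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Two-sided Brunner-Munzel test; returns (statistic, df, p_value)."""
    nx, ny = len(x), len(y)
    combined = midranks(list(x) + list(y))
    rcx, rcy = combined[:nx], combined[nx:]
    rx, ry = midranks(x), midranks(y)
    mcx, mcy = sum(rcx) / nx, sum(rcy) / ny
    # Combined rank minus within-group rank, centered ((n + 1) / 2 is the mean within-group rank).
    sx = sum((c - w - mcx + (nx + 1) / 2) ** 2 for c, w in zip(rcx, rx)) / (nx - 1)
    sy = sum((c - w - mcy + (ny + 1) / 2) ** 2 for c, w in zip(rcy, ry)) / (ny - 1)
    pooled = nx * sx + ny * sy
    if pooled == 0.0:
        raise ValueError("zero rank variance (e.g., complete separation); statistic undefined")
    stat = nx * ny * (mcy - mcx) / ((nx + ny) * math.sqrt(pooled))
    df = pooled ** 2 / ((nx * sx) ** 2 / (nx - 1) + (ny * sy) ** 2 / (ny - 1))
    # Two-sided p value: P(|T_df| > |stat|) = I_{df/(df + stat^2)}(df/2, 1/2).
    p_value = betainc_reg(df / 2.0, 0.5, df / (df + stat * stat))
    return stat, df, p_value


group_a = [110.2, 112.5, 108.9, 115.0, 111.3]  # hypothetical values (mm)
group_b = [116.4, 118.0, 113.9, 119.5, 117.2]
w, dof, p = brunner_munzel(group_a, group_b)
print(f"W = {w:.3f}, df = {dof:.1f}, p = {p:.4f}")
```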

The utility of the machine learning models was examined in two scenarios: (i) two-way models classifying population affinity, fitted separately for each sex (sex-specific) and for both sexes pooled (sex-mixed), and (ii) a four-way model classifying population affinity and sex simultaneously. For RFM, random forest feature importance was also calculated. All machine learning analyses were performed in R 4.2.3 (R Foundation for Statistical Computing, Vienna, Austria) using the “randomForest” and “e1071” packages [41, 42].
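The classification scenarios differ only in which individuals are included and how the response label is defined. The sketch below, built around a hypothetical `Individual` record and `build_scenarios` helper, assembles (features, labels) pairs for the two sex-specific two-way models, the sex-mixed two-way model, and the four-way model. It only prepares the data; the model fitting itself was performed in R in this study and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Dataset = Tuple[List[List[float]], List[str]]  # (feature rows, response labels)


@dataclass
class Individual:
    population: str            # e.g., "Japanese" or "Western Australian"
    sex: str                   # "F" or "M"
    measurements: List[float]  # the 11 pelvic measurements, in a fixed order


def build_scenarios(sample: List[Individual]) -> Dict[str, Dataset]:
    """Assemble (features, labels) for each classification scenario."""
    scenarios: Dict[str, Dataset] = {}
    # (i) Two-way, sex-specific: one population-affinity model per sex.
    for sex in ("F", "M"):
        subset = [ind for ind in sample if ind.sex == sex]
        scenarios[f"two-way {sex}"] = (
            [ind.measurements for ind in subset],
            [ind.population for ind in subset],
        )
    # (i) Two-way, sex-mixed: both sexes pooled, population affinity as the label.
    scenarios["two-way sex-mixed"] = (
        [ind.measurements for ind in sample],
        [ind.population for ind in sample],
    )
    # (ii) Four-way: population affinity and sex combined into a single label.
    scenarios["four-way"] = (
        [ind.measurements for ind in sample],
        [f"{ind.population} {ind.sex}" for ind in sample],
    )
    return scenarios


# Hypothetical toy sample with placeholder measurement values.
toy = [
    Individual("Japanese", "F", [1.0] * 11),
    Individual("Japanese", "M", [2.0] * 11),
    Individual("Western Australian", "F", [3.0] * 11),
    Individual("Western Australian", "M", [4.0] * 11),
]
for name, (features, labels) in build_scenarios(toy).items():
    print(name, sorted(set(labels)))
```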

Results

The rTEM and R values ranged from 0.19% to 1.81% and from 0.980 to 0.999, respectively (Table 3). The mean, range, and standard deviation of the 11 measurements for each sex are presented in Tables 4 and 5. In the female sample, four measurements (transverse pelvic inlet (TPI), anterior breadth of the sacrum (ABS), height between anterior superior iliac spine and ischial spine (HAIS), and height between anterior superior iliac spine and anteroinferior margin of ischial tuberosity (HAIT)) were significantly smaller in the Japanese than in the Western Australian individuals, whereas the angle of the greater sciatic notch (AGN) and breadth of the pelvic outlet (BPO) were significantly larger (Table 4). In the male sample, four measurements (ABS, AGN, HAIS, and HAIT) were significantly smaller in the Japanese individuals, whereas transverse pelvic outlet (TPO), midpelvic breadth (MB), and BPO were significantly larger than in the Western Australian individuals (Table 5). No significant differences between populations were observed for the remaining measurements.

Table 3 Relative technical error of measurements (rTEM) and coefficient of reliability (R)
Table 4 Descriptive statistics of 11 pelvic measurements for the female sample
Table 5 Descriptive statistics of 11 pelvic measurements for the male sample

The results of the machine learning models are presented in Tables 6 and 7. For RFM, the accuracy of the two-way models was approximately 80%; for SVM, the accuracy of the sex-specific and sex-mixed two-way models exceeded 90% and 85%, respectively. The sex-specific models had higher correct classification rates than the sex-mixed models, except for the Japanese male sample. The four-way model achieved an overall classification accuracy of 76.71% with RFM and 87.67% with SVM. All correct classification rates were higher for the Japanese than for the Western Australian sample.

Table 6 Classification matrix showing the classification of groups according to population affinity (two-way models)
Table 7 Classification matrix showing the classification of groups according to population affinity and sex (four-way models)
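Correct classification rates such as those in the two classification matrices follow directly from the counts. As a minimal sketch, assuming rows index the true group and columns the predicted group, the per-group rate is the diagonal count divided by the row total, and the overall accuracy is the sum of the diagonal divided by the total count. The function name and the counts in the example are hypothetical and not taken from this study.

```python
from typing import Dict, List


def classification_rates(matrix: List[List[int]], labels: List[str]) -> Dict[str, float]:
    """Per-group and overall correct classification rates (%) from a classification matrix.

    Assumption: matrix[i][j] counts individuals of true group i predicted as group j.
    """
    rates: Dict[str, float] = {}
    for i, label in enumerate(labels):
        rates[label] = 100.0 * matrix[i][i] / sum(matrix[i])
    correct = sum(matrix[i][i] for i in range(len(labels)))
    total = sum(sum(row) for row in matrix)
    rates["overall"] = 100.0 * correct / total
    return rates


# Hypothetical two-way matrix (counts invented for illustration).
example = [[45, 5],   # true group A: 45 correct, 5 misclassified as B
           [10, 40]]  # true group B: 10 misclassified as A, 40 correct
print(classification_rates(example, ["A", "B"]))
# {'A': 90.0, 'B': 80.0, 'overall': 85.0}
```

The same function applies unchanged to a four-way matrix; overall accuracy then weights each group by its sample size, so it can differ from the average of the per-group rates.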

The random forest feature importance showed that BPO and TPO contributed most to correct classification (i.e., expressed the greatest population variance) in the female and male samples, respectively (Fig. 2). In contrast, the subpubic angle (SPA) had the highest mean decrease Gini in the four-way model, followed by AGN.

Fig. 2

Random forest feature importance (mean decrease Gini) for the response variable: a the two-way female model, b the two-way male model, c the two-way sex-mixed model, and d the four-way sex and population affinity model
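Mean decrease Gini reflects how much the splits made on a variable reduce node impurity across the forest. The sketch below computes the Gini impurity of a node, the impurity decrease produced by a single binary split, and a forest-level importance that sums node-size-weighted decreases per variable and divides by the number of trees. This weighting is assumed to mirror the convention of R's randomForest package; other implementations (e.g., scikit-learn) normalize importances differently. The split-record format, function names, and labels are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_decrease(parent: Sequence[str], left: Sequence[str], right: Sequence[str]) -> float:
    """Impurity decrease of one binary split, with children weighted by their size."""
    n = len(parent)
    children = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - children


def mean_decrease_gini(splits: Iterable[Tuple[str, int, float]], n_trees: int) -> Dict[str, float]:
    """Sum node-size-weighted decreases per variable over all trees, then average per tree.

    Each split record is (variable, node_size, decrease), a hypothetical format.
    """
    totals: Dict[str, float] = defaultdict(float)
    for variable, node_size, decrease in splits:
        totals[variable] += node_size * decrease
    return {variable: total / n_trees for variable, total in totals.items()}


# Hypothetical node with two population labels.
parent = ["JPN", "JPN", "WA", "WA"]
print(split_decrease(parent, ["JPN", "JPN"], ["WA", "WA"]))  # 0.5 (perfect split)
print(split_decrease(parent, ["JPN", "WA"], ["JPN", "WA"]))  # 0.0 (no improvement)
```

A variable that repeatedly produces clean splits near the root of many trees, where nodes are large, accumulates the highest score.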

Discussion

In this study, intra- and inter-observer errors were small enough to be considered negligible, indicating that the acquisition of pelvic landmarks and measurements from 3D CT images is precise and reproducible. Several measurements differed significantly between the two populations: the superior pelvic cavity and the vertical dimensions of the pelvis were larger in the Western Australian sample, whereas the inferior pelvic cavity was larger in the Japanese sample. Patriquin et al. [13] reported that the pelves of South African whites were generally larger than those of South African blacks. Furthermore, Handa et al. [14] reported that white females had a wider pelvic inlet and outlet and a shallower antero-posterior diameter than African-American females. Taken together, these results suggest that pelvic measurements can differ considerably among populations.

The present study demonstrated that the accuracy of assigning Japanese and Western Australian individuals to their respective populations of origin was > 75% with RFM and > 85% with SVM. Therefore, although phenotypic population differences are known to be most evident in the skull [13], pelvic measurements have clear forensic utility for classifying Japanese and Western Australian individuals when a complete skull is absent. Furthermore, correct classification rates were higher for Japanese than for Western Australian individuals. Franklin and Flavel [43] noted that from Southeast Asia. Correct classification rates for the Western Australian sample would therefore be expected to be lower, owing to the heterogeneity of that population. Conversely, the higher classification accuracy for Japanese individuals suggests that the Japanese population is relatively homogeneous.

Patriquin et al. [13] distinguished black and white South African left os coxae using discriminant function analysis, achieving average accuracies of 85% for females and 88% for males. İşcan [12] reported classification accuracies of 83% in males and 88% in females using three pelvic measurements in African American and Caucasian American samples. However, TPO and BPO, both of which were highly diagnostic of population affinity in the two-way models of the present study, were not included in that previous research; further research is required to determine whether TPO and BPO are also useful in other populations. Torimitsu et al. [29] demonstrated that MB can classify sex with an accuracy of > 80% in a Japanese population; however, the present results indicate that MB is less important for population affinity estimation. In contrast, SPA and AGN were the most important variables in the four-way model, which distinguishes population affinity and sex simultaneously. Similarly, Small et al. [15] quantified the SPA in a South African population and found statistically significant differences not only between males and females but also between blacks and whites. These results may be attributable mainly to the sexual dimorphism of the SPA [24, 25, 44,45,46]. According to Torimitsu et al. [29], SPA and AGN contributed most to sex classification in a Japanese population (accuracy of 98.1%). Additionally, Franklin et al. [30] reported that the SPA and AGN classified sex in Western Australian individuals with cross-validated accuracies of 93.2% and 85.2%, respectively. However, when a discriminant function based on SPA and AGN derived from Japanese individuals was applied to Western Australian individuals, sex classification accuracy was as low as 53.5% [43].

These latter results suggest that pelvic morphology may differ considerably depending on population of origin; the pelvis therefore remains an important element for sex estimation and, based on increasing empirical evidence, appears informative for population affinity estimation as well. However, some populations are not yet well described in the literature on morphometric population affinity estimation. Regardless, it is essential to consider that population differences in pelvic morphology arise from genetics, nutritional status, and environmental factors, including climate [47,48,49,50,51,52,53,54,55]. Thus, a more comprehensive database including diverse populations is necessary to develop identification standards for forensic and physical anthropological applications. Furthermore, although the skull has been the focus of most population affinity studies in forensic anthropology, other bones (e.g., femur and tibia) may also provide useful information [56]. Therefore, further studies investigating the feasibility of population affinity estimation from other bones using CT images would be a useful addition to the extant literature that informs professional practice.

Most studies of population affinity estimation have analyzed data obtained from physical specimens [13, 57, 58]. To the best of our knowledge, this study is the first to investigate the feasibility of population affinity estimation using 3D CT images of the pelvis. These images can reproduce complex curved features, and the digital format facilitates quantitative computational geometric analysis and the archiving of case-related data [24, 59, 60]. Researchers have also estimated other biological attributes, such as sex, age, and stature, using pelvic CT images [29, 30, 61,62,63]. Sharing CT data among institutions in different countries would facilitate the collection of multi-population data and enable a deeper understanding of the diversity of pelvic morphology and of morphometric differences related to population affinity.

It is important to note that the present study has several limitations. First, the data were collected at two facilities using 16-row and 64-row detector CT systems with different image reconstruction conditions. Although these differences are unlikely to have significantly affected the measurements, it would be preferable to use CT images from the same scanner acquired under consistent conditions. Second, this study combined PMCT data with CT data from living patients. Although the shape and dimensions of human bones are not expected to change significantly after death, this study did not examine such differences. Third, the standard deviations of age in the Western Australian sample were larger than those in the Japanese sample. Lovell [64] reported that bone surfaces in older individuals are often highly irregular, and Kolesova et al. [65] reported that pelvic size varies with age. Thus, the inclusion of older individuals may have influenced the results of this study to some extent. Fourth, previous research has reported that the transverse diameters of the pelvic midplane and outlet can be influenced by hormone secretion at the end of pregnancy, which softens the pubic symphysis and allows movement of the pubic bones [66, 67]. However, information on whether female individuals had ever experienced pregnancy or childbirth was not available in this study. Fifth, many of the measurements obtained in this study required an intact pelvis, so correct classification may not be possible when only a fragmented pelvis is recovered. Finally, geometric morphometric analysis might detect further significant differences by separating differences due to size from those due to shape [7, 68, 69].

Conclusions

The present study demonstrated that pelvic measurements derived from 3D CT images can be useful for the population affinity classification of Japanese and Western Australian individuals, especially when the skull is unavailable in forensic and anthropological contexts. Further research using pelvic CT data from populations that have not yet been investigated is recommended. Moreover, further studies estimating population affinity from other skeletal measurements using CT images should be conducted.