Introduction

Establishing the identity of unidentified human remains is of fundamental importance in a forensic investigation, particularly in the analysis of dismembered, burned, or severely mutilated corpses or skeletal remains [1]. Although estimating ancestry is especially challenging [2], ancestry is an integral parameter not only to assist identification efforts directly but also as a required precursor to estimating sex, age at death, stature, and other attributes using population specific data [3].

It is generally accepted that the skull, especially the midface, is the most diagnostic region of the skeleton for estimating ancestry [4, 5]. There are two main methodological approaches typically applied in the anthropological assessment: morphoscopic (visual or non-metric) and morphometric. Procedures for estimating ancestry, whatever the statistical treatment, focus on non-metric or metric features, based on appreciable and/or significant cranial diversity between global populations [4]. Non-metric approaches lack objectivity and require more experience; metric methods are less affected by these limitations, largely because individual cranial measurements are clearly defined on the basis of established craniometric landmarks [3].

Ancestry estimation based on linear discriminant analysis (LDA) is one of the most commonly applied statistical approaches. Computer applications, such as FORDISC [6, 7] and CRANID [8, 9], simplify the use of LDA for ancestry estimation, and the associated output includes statistical quantification of accuracy (e.g., posterior and typicality probabilities) that is useful for interpretation and decision-making. In addition, a machine learning modeling technique for ancestry estimation on the basis of skeletal metric data has been proposed [10, 11]. However, it has been reported that American Southwest Hispanic skulls are often misclassified as Asian, in particular Japanese, when performing ancestry estimation using craniometric data [12]. Thus, it is important that crania from other global populations are examined and compared to those originating from Japan, to minimize the possibility of misclassifications.

Computed tomography (CT) clearly depicts bone structures [13, 14]. In addition, it is known that bone measurements in CT images can be acquired with the same level of accuracy as those from real bone specimens [15, 16]. Importantly, the requisite data for calculating predictive models for estimating biological attributes associated with a routine anthropological assessment can be effectively developed using data acquired in CT images [15, 17, 18]. However, to date, no study has examined the feasibility of ancestry estimation using CT scanning techniques.

The aim of the present study, therefore, is to explore morphological variances between crania from contemporary Japanese and Western Australian populations and thereafter assess the feasibility of ancestry classification on the basis of morphometric data acquired in multidetector CT (MDCT) images using machine learning statistical approaches.

Materials and methods

Materials

Japanese population

The sample comprises postmortem CT (PMCT) scans of 230 adult corpses of known age and sex (111 female, mean age 48.96 ± 18.08 years; 119 male, mean age 46.80 ± 18.39 years) at the Department of Forensic Medicine at the University of Tokyo between July 2017 and May 2022. The estimated postmortem interval for all subjects was <14 days. The exclusion criteria were fractures of the skull, lethal head trauma, burn injuries, and acquired or congenital abnormalities. The study protocol was approved by the ethics committee of our university (2121264NI).

Western Australian population

The sample comprises MDCT scans of 225 adult individuals (112 female patients, mean age = 40.47 ± 12.99 years; 113 male patients, mean age = 37.97 ± 12.67 years) at one of the major Western Australian hospitals for clinical cranial evaluation between September 2010 and May 2011. In accordance with the National Statement on Ethical Conduct in Human Research (National Statement), the scans were anonymized, with only sex and age data retained. Although specific information on the ethnicity of each individual was not maintained in the patient data, the entire sample was taken as representative of a “typical” Western Australian population [19]. Individuals with obvious congenital or acquired cranial pathology were excluded if it affected their normal morphology and/or ability to accurately locate necessary cranial landmarks. Research ethics approval was granted by the human research ethics committee of our university (2020/ET000038).

Methods

For Japanese subjects, PMCT scanning was performed with a 16-row detector CT system (Eclos; Fujifilm Healthcare Corporation, Tokyo, Japan). The scanning protocol was as follows: collimation of 0.625 mm, reconstruction interval of 0.625 mm, tube voltage of 120 kV, and tube current of 200 mA.

For Western Australian subjects, cranial imaging was performed using a 64-slice CT scanner (Brilliance; Philips Healthcare, NSW, Australia) with an average slice thickness of 0.90 mm, tube voltage of 120–140 kV, and automatic tube current modulation (235–423 mA). The images were reconstructed to the same thickness.

Image data processing and three-dimensional (3D) volume rendering were performed on a workstation (OsiriX MD version 11.0.2; Pixmeo SARL, Geneva, Switzerland). A soft tissue kernel was used for the acquisition of the CT images. In accordance with previous research [19,20,21,22,23,24,25], 35 cranial landmarks (Table 1) were acquired for each individual. Thereafter, 18 measurements (Table 2; Fig. 1) were calculated based on the coordinates of the landmarks obtained in 3D images using MorphDB (an in-house developed database application) and the Excel software (Microsoft Office 2019, Microsoft, Redmond, Washington, USA).
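As a minimal sketch of how a linear measurement can be derived from landmark coordinates, the example below computes the straight-line distance between two 3D points. The coordinate values are invented for illustration, and treating MCL as the glabella-opisthocranion distance is an assumption based on the conventional craniometric definition rather than on Table 2; it does not reproduce the MorphDB or Excel workflow.

```python
import math

# Hypothetical landmark coordinates (x, y, z) in mm; illustrative values only.
landmarks = {
    "glabella": (0.0, 95.0, 20.0),
    "opisthocranion": (0.0, -85.0, 20.0),
}


def inter_landmark_distance(a, b):
    """Return the Euclidean distance between two 3D points."""
    return math.dist(a, b)


# Assumed definition: MCL as the glabella-opisthocranion distance.
mcl = inter_landmark_distance(landmarks["glabella"], landmarks["opisthocranion"])
print(f"MCL = {mcl:.1f} mm")
```

The same function applies to any measurement defined by two landmarks; breadths and heights differ only in which pair of landmarks is passed in.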

Table 1 Definitions of the landmarks
Table 2 Definitions of the measurements
Fig. 1 Three-dimensional computed tomography images showing cranial measurements (see Table 2 for definition): a maximum cranial length (MCL) and left mastoid height (LMH); b basion-nasion length (BNL); c frontal breadth (FRB), biorbital breadth (BOB), left orbit height (LOH), left orbit breadth (LOB), nasal height (NH), and nasal breadth (NB); d bimaxillary breadth (MXB), maxillo-alveolar breadth (MAB), bizygomatic breadth (ZYB), biauricular breadth (BAE), foramen magnum length (FML), and foramen magnum breadth (FMB). Right mastoid height (RMH), right orbit height (ROH), and right orbit breadth (ROB) are not shown because they mirror the corresponding left-side measurements

A subset of six subjects (three females and three males) was randomly selected; the original author re-collected the subset data to assess intra-observer error, and another co-author collected the subset data to assess inter-observer error. All 35 cranial landmarks were acquired on each of the six subjects, and this process was repeated a total of six times, with a minimum interval of two days between repetitions. In an effort to mitigate recall between repetitions, landmark acquisition order was varied each time. The relative technical error of measurement (rTEM, %) and coefficient of reliability (R) were then calculated. The acceptable rTEM range as outlined by established anthropological research [26,27,28] was < 5%; an R value > 0.75 was considered sufficiently precise [21, 29].
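The precision statistics named above can be sketched as follows. This is an illustration only: it uses the common two-session form of the technical error of measurement, TEM = sqrt(sum(d^2) / 2n), whereas the study used six repetitions, and it takes the pooled sample variance as the total variance in R = 1 - TEM^2 / SD^2. Both choices, and the measurement values, are assumptions.

```python
import math
import statistics


def tem_two_sessions(first, second):
    """Technical error of measurement for two repeated sessions."""
    squared_diffs = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(squared_diffs / (2 * len(first)))


def relative_tem(first, second):
    """TEM expressed as a percentage of the grand mean."""
    grand_mean = statistics.mean(list(first) + list(second))
    return 100.0 * tem_two_sessions(first, second) / grand_mean


def reliability(first, second):
    """Coefficient of reliability: share of variance not due to measurement error."""
    total_variance = statistics.variance(list(first) + list(second))
    return 1.0 - tem_two_sessions(first, second) ** 2 / total_variance


# Hypothetical repeated measurements (mm) of one variable on six subjects.
session_1 = [180.0, 175.0, 182.0, 170.0, 178.0, 185.0]
session_2 = [181.0, 175.0, 181.0, 171.0, 178.0, 184.0]

print(f"rTEM = {relative_tem(session_1, session_2):.2f}%")
print(f"R = {reliability(session_1, session_2):.3f}")
```

With these invented values, both statistics fall inside the acceptance thresholds quoted in the text (rTEM < 5%, R > 0.75).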

Descriptive statistics including mean, standard deviation, and range were calculated to provide an overview of the sample. The Kruskal-Wallis test was used to compare the measurements of the four groups (Japanese and Western Australian female and male); a p value of <0.05 was considered statistically significant. After the Kruskal-Wallis test, a series of post hoc Mann-Whitney U tests with Bonferroni correction was used for between-group comparisons. Two machine learning methods (random forest modeling, RFM; support vector machine, SVM) were used to classify ancestry. RFM belongs to a class of machine learning techniques that consist of traditional classification trees created using a nonparametric algorithm that incorporates majority voting and bagging to assign cases to response classes [30,31,32]. Bagging is a machine learning ensemble meta-algorithm that generates multiple new training sets by sampling the original data with replacement, which reduces variance and the potential for overfitting and improves model stability and classification accuracy [33]. Bagging also facilitates an estimate of out-of-bag error, which provides an unbiased estimate of the generalization ability of the random forest that is comparable to K-fold cross-validation [34].
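The bagging step can be illustrated with a short sketch. It draws one bootstrap sample (sampling with replacement), lists the cases left out of that sample (the out-of-bag cases used for the error estimate), and shows majority voting across trees. The function names and the seed are illustrative assumptions; this is not the "randomForest" implementation.

```python
import random
from collections import Counter


def bootstrap_indices(n_cases, rng):
    """Draw n_cases indices with replacement (one bagged training set)."""
    return [rng.randrange(n_cases) for _ in range(n_cases)]


def out_of_bag(n_cases, in_bag):
    """Indices never drawn into the bootstrap sample."""
    drawn = set(in_bag)
    return [i for i in range(n_cases) if i not in drawn]


def majority_vote(votes):
    """Return the class predicted by the most trees."""
    return Counter(votes).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed for repeatability (arbitrary choice)
n_cases = 455  # 230 Japanese + 225 Western Australian individuals
in_bag = bootstrap_indices(n_cases, rng)
oob = out_of_bag(n_cases, in_bag)

# On average roughly 1/e (about 37%) of cases are out-of-bag for each tree.
print(f"out-of-bag fraction: {len(oob) / n_cases:.2f}")
print(majority_vote(["Japanese", "Western Australian", "Japanese"]))
```

Because each tree never sees its own out-of-bag cases, predicting those cases gives an error estimate without setting aside a separate test set.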

SVMs generate classification rules by maximizing the margin between two groups using data located at the edges of the multivariate space (the intersection of two groups). This method identifies support vectors to define a classifier that maximizes classification accuracy; because the classifier is determined by these support vectors, SVMs are relatively robust to small sample sizes and outlier values [35]. The number of support vectors is directly related to the predictability of the model, with a higher number of support vectors indicating less separable data [36].
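To make the margin and support vector concepts concrete, the sketch below evaluates an already fitted linear, hard-margin classifier on a toy two-dimensional data set. The weight vector, intercept, and points are hypothetical, and no optimization is performed; the "e1071" package fits these quantities from data and supports non-linear kernels, which this example does not cover.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the predicted group."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def margin_width(w):
    """Distance between the two margin boundaries, 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def support_vectors(w, b, points, tol=1e-9):
    """Points lying exactly on a margin boundary (|w.x + b| == 1)."""
    return [x for x in points if abs(abs(decision_value(w, b, x)) - 1.0) <= tol]


# Toy data: two points per group, separated along the first axis.
points = [(-3.0, 1.0), (-1.0, 0.0), (1.0, 0.0), (4.0, -1.0)]
w, b = (1.0, 0.0), 0.0  # hypothetical fitted hyperplane x0 = 0

print(margin_width(w))
print(support_vectors(w, b, points))
```

Only the two points nearest the boundary define the classifier here; moving the outer points further away would leave the hyperplane unchanged.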

The utility of machine learning models was examined in three scenarios: (i) a two-way model distinguished by ancestry (without considering sex), (ii) a four-way model distinguished by ancestry and sex simultaneously, and (iii) two-way models distinguished by sex-specific (female and male) population. The random forest feature importance was calculated during the analysis. All machine learning performances were analyzed using R 4.2.3 (R Foundation for Statistical Computing, Vienna, Austria) with the “randomForest” and “e1071” packages [37, 38].
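The three scenarios differ only in how the response classes are defined. The sketch below shows one way to derive the labels for each scenario from the same records; the record layout and function names are hypothetical, and the analysis itself was run in R.

```python
# Hypothetical records: (ancestry, sex). Real inputs would also hold the 18 measurements.
records = [
    ("Japanese", "female"),
    ("Japanese", "male"),
    ("Western Australian", "female"),
    ("Western Australian", "male"),
    ("Japanese", "female"),
]


def two_way_labels(rows):
    """Scenario (i): ancestry only, sexes pooled."""
    return [ancestry for ancestry, _sex in rows]


def four_way_labels(rows):
    """Scenario (ii): ancestry and sex classified simultaneously."""
    return [f"{ancestry} {sex}" for ancestry, sex in rows]


def sex_specific_labels(rows, sex):
    """Scenario (iii): ancestry within a single sex."""
    return [ancestry for ancestry, row_sex in rows if row_sex == sex]


print(sorted(set(two_way_labels(records))))
print(sorted(set(four_way_labels(records))))
print(sex_specific_labels(records, "female"))
```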

Results

As shown in Table 3, the rTEMs and the R values ranged from 0.41 to 2.66% and from 0.785 to 0.993, respectively. The mean, standard deviation, and ranges of the 18 measurements are shown in Table 4. Among Japanese individuals, all of the mean measurement values in male subjects were larger than the corresponding mean measurements for female subjects. Among the Western Australian individuals, mean male values were greater than mean female values for all measurements except FRB. Within the same sex, the mean values of some measurements (e.g., MCL, BNL, and FRB) were larger in Western Australian than in Japanese individuals. Conversely, the mean values of ZYB, LMH, RMH, and NH were slightly larger in Japanese individuals. The Kruskal-Wallis test showed significant differences in all of the measurements between the four groups (p < 0.001). The results of the post hoc tests comparing the measurements of each pair of groups are given in Online Resource 1.

Table 3 Relative technical error of measurements (rTEM) and coefficient of reliability (R)
Table 4 Descriptive statistics of 18 cranial measurements

Results of machine learning models are summarized in Tables 5–8. As shown in Table 5, the accuracy of the two-way unisex model was 93.2% for RFM and 97.1% for SVM. Accuracy was higher in the Japanese than in the Western Australian sample. The four-way model demonstrated an overall classification accuracy of 84.0% for RFM and 93.0% for SVM (Table 6). Female individuals were more likely to be correctly classified according to sex. The sex-specific ancestry analyses also revealed that the correct classification rates were higher in the female (95.1% for RFM and 100% for SVM) than in the male samples (91.4% for RFM and 97.4% for SVM; Tables 7 and 8).
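For readers unfamiliar with classification matrices, the sketch below shows how overall and group-specific correct classification rates are obtained from one. The counts are invented for illustration and are not taken from Tables 5–8.

```python
def overall_accuracy(matrix):
    """Correctly classified cases (diagonal) divided by all cases."""
    correct = sum(matrix[group][group] for group in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total


def per_group_accuracy(matrix):
    """Correct classification rate for each actual group."""
    return {group: row[group] / sum(row.values()) for group, row in matrix.items()}


# Hypothetical matrix: outer key = actual group, inner key = predicted group.
matrix = {
    "Japanese": {"Japanese": 45, "Western Australian": 5},
    "Western Australian": {"Japanese": 10, "Western Australian": 40},
}

print(f"overall: {overall_accuracy(matrix):.1%}")
for group, rate in per_group_accuracy(matrix).items():
    print(f"{group}: {rate:.1%}")
```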

Table 5 Classification matrix showing classification of groups according to ancestry
Table 6 Classification matrix showing classification of groups according to ancestry and sex
Table 7 Classification matrix showing classification of groups according to sex-specific ancestry (female)
Table 8 Classification matrix showing classification of groups according to sex-specific ancestry (male)

Random forest feature importance demonstrated that MCL, ZYB, MXB, and BAE ranked in the top five in all analyses, indicating that they are the strongest weighted measurements (express the greatest population variance) relative to achieving correct classifications (Fig. 2; Online Resource 2).

Fig. 2 Random forest feature importance (mean decrease Gini) for the response variable. a The two-way unisex model, b the four-way sex and ancestry model, c the two-way female model, and d the two-way male model
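The importance metric in Fig. 2 accumulates the reduction in Gini impurity produced by each split on a variable, averaged over the trees of the forest. A minimal sketch of the impurity decrease for a single split is given below; the node contents are hypothetical, and the accumulation over trees performed by "randomForest" is not shown.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted


# Hypothetical split of eight cases on one measurement threshold.
left = ["Japanese"] * 3 + ["Western Australian"]
right = ["Japanese"] + ["Western Australian"] * 3
parent = left + right

print(gini(parent))
print(gini_decrease(parent, left, right))
```

A measurement that repeatedly yields large decreases across trees, as reported here for MCL, ZYB, MXB, and BAE, receives a high importance score.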

Discussion

In the present study, the intra- and inter-observer errors were small and likely to be negligible. Considering these results, cranial landmark acquisition using 3D CT images in this study is highly reproducible.

Cranial size and shape are known to express significant populational variability [39,40,41]. Previous research has reported that the skulls of Australian individuals are on average longer, taller, and with narrower frontal bones than those of Japanese individuals [19, 42, 43]. The results of this study also showed that the mean values of MCL and BNL were larger in Western Australian subjects, whereas the mean values of LMH and RMH were larger in Japanese subjects. However, the mean values of FRB were larger for Western Australian individuals, which did not accord with previous findings.

The results of this study revealed that the correct classification rates of the Japanese and Western Australian individuals were greater than 90% when sex was not considered, and above 80% when sex was classified simultaneously. This indicates that cranial measurements derived from CT images are useful for the classification of Japanese and Western Australian individuals. Franklin and Flavel [44] reported that Australia has become a multicultural country, with a dynamic population demographic that includes considerable migration from southeast Asia, with intra-population variation also evident between the States and Territories. Irrespective of this variation, the results of this study suggest that Japanese and Western Australian populations have different skull shapes.

In the present study, the mean age of the Japanese individuals was higher than that of the Western Australian subjects. Previous research has noted an increase in the size of some cranial regions in middle-aged to elderly individuals; it has accordingly been suggested that large differences between age distributions may skew results [45]. Conversely, Albert et al. [46] reported modest increases in craniofacial dimensions (1.1–1.6 mm) in the elderly, with facial height presenting the largest change relative to antemortem tooth loss. Therefore, although the effects of age-related craniofacial remodeling should be recognized, age may not be expected to be a major contributor to the misclassification rate observed in this study.

Hefner et al. [11] achieved 89.6% accuracy based on applying RFM to 110 skulls representing modern American White (n = 72), African American (n = 38), and Southwestern Hispanic (n = 39) individuals; the important craniometric variables in the RFM included MCL and PBL. Navega et al. [10] used AncesTrees, which is a statistical procedure using RFM comprising 23 craniometric variables from 1734 individuals, representative of six major ancestral groups (European, African, Austro-Melanesian, Polynesian, Native American, and East Asian). The program was tested in 128 adult crania (32 individuals of African ancestry and 96 of European ancestry); 75% of the African and 79.2% of the European individuals were correctly identified. The model involving only African and European ancestral groups was more accurate (93.8%). Navega et al. [10] also reported that ZYB and BAE are the important variables in the RFM for ancestry and sex estimation. Similarly, our study demonstrated that MCL, ZYB, and BAE were the important factors (Fig. 2). Furthermore, there were significant differences in these variables between each pair of groups, except for ZYB and BAE between the Japanese female and Western Australian male groups, indicating that these measurements are useful in the classification of ancestry in multiple global populations.

Hefner and Ousley [47] also reported that RFM demonstrated an overall classification rate of 85.5% for ancestry in a sample of 543 Americans (African American, Hispanic and White). The most significant advantage of RFM is that it transforms a low-bias and high-variance model into a low-bias and low-variance model by training multiple decision trees simultaneously; the low variance is the most valuable feature for anthropological application [10]. Although LDA is also a valuable method to perform ancestry estimation from metrical data, it can usually be outperformed by the latest machine learning classification algorithms [11, 48,49,50].

Spiros and Hefner [35] and Hefner and Ousley [47] reported that the SVM model provided higher classification accuracy than the RFM for American individuals. Nikita and Nikitas [51] also reported that the SVM is more effective than RFM for skeletal ancestry and sex assessment. In this study, SVM yielded higher correct classification rates than RFM, probably due to the relatively small amount of data. Further studies considering other machine learning methods are necessary in the future.

In this study, when only female samples were considered, the correct classification rates according to ancestry were over 95%. Therefore, it is hypothesized that if an unidentified skull can be presumed to be female, it may be possible to estimate ancestry more accurately. However, other studies on sex-specific ancestry estimation using the skull are scarce and further research is required.

The majority of previous craniometric research specific to the estimation of ancestry has involved the analysis of data acquired in physical specimens [10, 52]. The data in the present study are, to the best of our knowledge, amongst the first to assess the feasibility of ancestry estimation using 3D CT images of the skull. Non-invasive imaging techniques can maintain and visualize the arrangement of spatial structures and their potential relationships [53]. Previous research has considered the reliability and accuracy of estimating other biological attributes, such as sex, age, and stature in CT images [19, 54,55,56,57]. Sharing CT data among facilities in various countries should facilitate collection of global and contemporary multi-populational data and thus afford a deeper understanding of craniometric diversity relative to ancestral origin.

Regarding skeletal measurements for ancestry estimation, it should be recognized that some populations are poorly described in the published literature. Therefore, more comprehensive databases of missing persons are required to enhance identification efforts. In addition, it is crucial to consider that cranial features and measurements are phenotypic characteristics that are partially determined by heritability and influenced by the environment [58], and as noted above, are changing through time and especially with increased admixture in contemporary populations.

The literature clearly indicates that the majority of forensic anthropology ancestry studies focused broadly on the skull, despite bones such as the femur and tibia also potentially providing useful information [3]. Thus, further research addressing other skeletal measurements based on CT imaging is needed to assess the feasibility of ancestry estimation.

This study has several limitations. First, data were collected from two different facilities using 16- and 64-row detector CT systems, with different conditions for the reconstructed images. Although these issues were not expected to significantly affect the measurements, it would be more appropriate to use images from the same CT system acquired under the same conditions. Second, PMCT data and CT data from living patients were used in this study. Although it is unlikely that cranial shape or measurements change significantly between ante- and post-mortem states, the difference was not investigated in the present study. Third, only linear measurements were analyzed; geometric morphometric analysis may detect other significant differences by detailing differences due to cranial size and shape [59, 60].

Conclusions

This study demonstrated that cranial measurements derived from 3D CT images are useful for the accurate statistical classification of Japanese and Western Australian individuals. This is the first study to investigate the feasibility of ancestry estimation using cranial measurements derived from 3D CT images. Further CT data involving other populations should be collected to enable research of more diverse populations across the globe. In addition, further research addressing other skeletal measurements based on CT imaging to estimate ancestry is required.