Abstract
Sex prediction from bone measurements that display sexual dimorphism is one of the most important aspects of forensic anthropology. Some bones like the skull and pelvis display distinct morphological traits that are based on shape. These morphological traits, which are sexually dimorphic across different population groups, have been shown to provide an acceptably high degree of accuracy in the prediction of sex. A sample of 100 patella of Mixed Ancestry South Africans (MASA) was collected from the Dart collection. Six parameters were used in this study: maximum height (maxh), maximum breadth (maxb), maximum thickness (maxt), the height of articular facet (haf), lateral articular facet breadth (lafb), and medial articular facet breadth (mafb). Stepwise and direct discriminant function analyses were performed for measurements that exhibited significant differences between male and female mean measurements, and the “leave-one-out” approach was used for validation. Moreover, we used eight classical machine learning techniques along with feature ranking techniques to identify the best feature combinations for sex prediction. A stacking machine learning technique was trained and validated to classify the sex of the subject. Here, the top three performing ML classifiers were used as base learners, and the predictions of these models were used as inputs to different machine learning classifiers acting as meta learners to make the final decision. The measurements of the patella of South Africans are sexually dimorphic, and this observation is consistent with previous studies on the patella in different countries. The range of average accuracies obtained for pooled multivariate discriminant function equations is 81.9–84.2%, while the stacking ML technique provides 90.8% accuracy, which compares well with those presented for previous studies in other parts of the world.
In conclusion, the models proposed in this study from measurements of the patella of different population groups in South Africa are useful and present with reasonably high average accuracies.
Introduction
Prediction of sex from recovered or discovered bones in human identification is an important first step taken by forensic anthropologists to reduce the number of possible matches by 50% [1]. This process, in conjunction with the estimation of age, stature and population affinity, is essential in the establishment of the identity of an individual from skeletons. Some bones like the skull and pelvis display distinct morphological traits that are based on shape. These morphological traits, which are sexually dimorphic across different population groups, have been shown to provide an acceptably high degree of accuracy in the estimation of sex [2]. While most earlier researchers, and some more recent ones [3], have focused on describing the observed morphological traits, a subjective method that requires many years of experience, attention has recently shifted to quantifying the differences in shape that are observed on bones. These quantifications can be performed objectively using various morphometric techniques including, but not limited to, geometric morphometrics [4,5,6,7,8,9,10,11].
In the absence of the pelvis and the skull, which display obvious morphological differences between males and females, size differences that are present in most bones of the postcranial skeleton can also be used for sex prediction. This metrical approach can also be used on incomplete or fragmentary remains. Standard, easily reproducible measurements of different bones of the skeleton have been analyzed in different population groups using various statistical methods including, but not limited to, logistic regression and discriminant function analysis. It is well established that the resulting equations are population-specific; to obtain acceptably high average classification accuracies, their application should be limited to the population groups for which they were formulated. This has led to the generation of population-specific discriminant function and logistic regression equations for measurements of the skull [12,13,14,15,16,17], bones of the vertebral column [18, 19], pelvis [20,21,22], long bones of the upper [23,24,25,26,27,28,29] and lower extremities [30,31,32,33,34,35,36], and hand and foot bones [37,38,39,40] in different population groups of the world with acceptably high classification rates.
Similar population-specific local standards have also been established in South Africa for the prediction of sex from dimensions of the skull [37, 38] and postcranial bones [39,40,41,42,43,44,45,46]. Most of these equations have been derived from data collected from samples of bones of South Africans of European and African descent, which are housed mainly in the Raymond Dart collection of human skeletons [47], Pretoria bone collection [48], and UCT osteological collection [49]. Recently, successful attempts have also been made to formulate population-specific equations for Mixed-Ancestry South Africans or colored [43, 50,51,52]. While discriminant function analysis has been widely used in sex estimation using various bones of the human skeleton in South Africa, no previous attempts have utilized other novel techniques such as machine learning algorithms for that purpose.
Over the past decade, machine learning (ML) algorithms have become increasingly integrated into clinical predictive modeling, e.g., in prognostic models using health data [53,54,55]. Recent reviews have also highlighted the high interest in ML approaches for clinical guidance, as well as the necessity for more prognostic studies [56]. While interest in ML in health care has risen significantly, only a few studies have evaluated whether it can outperform conventional statistical models (CSMs) in terms of predictive performance. ML rapidly examines continuously expanding datasets and enables the identification of patterns and trends that may not be directly visible to clinicians [57]. Other advantages of ML are that it is flexible and nonparametric, requires no data model for the probability distribution of the outcome variable, requires no pre-specification of covariates, and can process a large number of input variables simultaneously [58, 59]. In a clinical context of predicting mortality from gastrointestinal bleeding, a systematic review demonstrated higher c-indices and predictive capacity of ML than clinical risk scores [60]. Another study aimed at predicting bleeding risk following percutaneous coronary intervention found that ML characterized bleeding risk better than a standard discriminant analysis model [61]. Likewise, a comparison of ML and CSMs using the TOPCAT trial dataset showed that ML methods presented higher c-indices than CSMs for predicting readmission (0.76 vs 0.73) and mortality (0.72 vs 0.66) [62].
Osteometric variations between population groups have necessitated population-specific standards for human identification. In addition, each of these groups expresses sexual dimorphism to a different degree. Some authors have observed major flaws in the development and application of population-specific standards for the prediction of sex [63] and stature [64, 65], leading to the proposal and recommendation of generic equations. This study thus aims to (1) formulate generic models for sex prediction using measurements of the patella of South Africans of African descent (SAAD) and European descent (SAED), as well as Mixed Ancestry South Africans (MASA), and (2) compare the classification rates obtained from the generic models using linear discriminant analysis with those obtained from machine learning algorithms.
Materials and methods
The Human Research Ethics Committee (Medical) of the University of the Witwatersrand, Johannesburg, South Africa granted an ethical clearance waiver (Ethics Waiver Number: W-CJ-140604–1) before the commencement of this study. Data were collected from a sample of patella of Mixed Ancestry South Africans (MASA). Additional data analyzed in the current study were obtained from previously published studies on sexual dimorphism of measurements of patella of South Africans of European descent (SAED) [46] and South Africans of African descent (SAAD) [66]. The sample distribution for the data used in the current study is as follows: SAAD (50 males and 50 females), SAED (50 males and 50 females), and MASA (30 males and 30 females). The birth dates range from 1999 to 2017. The source of data was the Raymond A. Dart collection of human skeletons, considered one of the largest collections of human skeletons in the world [47]. It is located in the School of Anatomical Sciences of the University of the Witwatersrand, Johannesburg, South Africa. The patella belonged to individuals whose age at death ranged between 25 and 79 years and whose birth years were between 1928 and 1991. Patella with any pathological features like osteophytic lipping, lesions, or any other obvious deformities were excluded from this study. Six parameters were measured on each patella. These are maximum height (maxh), maximum breadth (maxb), maximum thickness (maxt), the height of articular facet (haf), lateral articular facet breadth (lafb), and medial articular facet breadth (mafb). These measurements have been described in previous studies [46] and are illustrated in Fig. 1. Lin’s [67] concordance correlation coefficient of reproducibility was used for the assessment of intraobserver error. This method assesses the agreement between the test and retest measurements and is considered a measure of the precision of the measuring technique.
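As an illustration of the reproducibility check described above, Lin's concordance correlation coefficient for paired test and retest measurements can be sketched as follows. This is a minimal sketch and not the software used in the study; the function name `lin_ccc` and the example values are hypothetical, and the 1/n (population) form of the variances is an assumption.

```python
from statistics import fmean


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient for paired test/retest values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    n = len(x)
    mx, my = fmean(x), fmean(y)
    # Assumption: population (1/n) variances and covariance.
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # Unlike Pearson's r, the squared mean shift in the denominator
    # penalises a systematic offset between the two measurement sessions.
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Hypothetical test/retest values of maximum height (mm); not data from the study.
first = [41.2, 39.8, 44.1, 37.5, 42.0]
second = [41.0, 40.1, 44.3, 37.4, 41.8]
ccc = lin_ccc(first, second)
```

A coefficient close to 1 indicates that the retest values reproduce the first measurements in both scale and location.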
Statistical analysis
Statistical analysis was performed using the Stata/MP 13.0 software. The SPSS version 23 software program was used for the linear discriminant analysis of data. Sex differences were described using numbers and percentages. The number of missing data, mean, standard deviations, median, and quartiles (Q1, Q3) for combined data for all measurements from SAED, SAAD, and MASA were calculated separately for each sex. In univariate analysis, rank-sum tests were performed for all variables. A statistically significant difference was defined as a P value < 0.05.
Discriminant function analysis
Stepwise and direct discriminant function analyses were performed for measurements that exhibited significant sex differences. The “leave-one-out” classification procedure was then used to evaluate the validity of the functions. In this procedure, each case in the sample is classified using the function that is generated without it. Then, generic stepwise and direct discriminant functions with acceptably high average accuracies were selected. Each of the generic functions selected was used to predict sex for each case in samples of SAED, SAAD, and MASA. The average accuracies in correct sex prediction for each of the functions were calculated for each population group separately.
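The leave-one-out procedure described above can be sketched for a single measurement, where a two-group discriminant function with equal priors reduces to a sectioning point midway between the male and female means. This is a simplified illustration, not the SPSS procedure used in the study; the function names and the measurements are hypothetical, and the rule assumes that males have the larger mean.

```python
from statistics import fmean


def sectioning_point(values, sexes):
    """Midpoint between the male and female means of one measurement."""
    males = [v for v, s in zip(values, sexes) if s == "M"]
    females = [v for v, s in zip(values, sexes) if s == "F"]
    return (fmean(males) + fmean(females)) / 2


def leave_one_out_accuracy(values, sexes):
    """Classify each case with a sectioning point computed without that case."""
    correct = 0
    for i, (value, sex) in enumerate(zip(values, sexes)):
        train_values = values[:i] + values[i + 1:]
        train_sexes = sexes[:i] + sexes[i + 1:]
        cut = sectioning_point(train_values, train_sexes)
        # Assumption: values above the sectioning point are classified as male.
        predicted = "M" if value > cut else "F"
        correct += predicted == sex
    return correct / len(values)


# Hypothetical maximum heights (mm); not data from the study.
values = [44.0, 42.5, 45.1, 41.0, 43.2, 38.0, 39.5, 37.2, 41.5, 36.8]
sexes = ["M"] * 5 + ["F"] * 5
loo_accuracy = leave_one_out_accuracy(values, sexes)
```

Because every case is classified by a function that never saw it, the leave-one-out accuracy is usually equal to or slightly lower than the accuracy obtained when the same cases are used for both fitting and classification.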
Machine learning-based analysis
The Pearson correlation among the six patella measurements in the dataset was evaluated. Figure 2 shows the correlation heatmap; no pair of measurements was highly correlated. The maximum correlation, 0.81, was found between maxb and lafb. Because the threshold for removing highly correlated features was set at r > 0.85, none of the features was removed for the next phase of the investigation.
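The correlation screen described above can be sketched as follows. This is a minimal illustration with hypothetical helper names and made-up measurements; only the threshold of 0.85 comes from the text.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def highly_correlated_pairs(columns, threshold=0.85):
    """Return (name_a, name_b, r) for every feature pair with r above the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson_r(a, b)
        if r > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical measurements (mm); not values from the study.
columns = {
    "maxh": [41.2, 39.8, 44.1, 37.5, 42.0, 40.3],
    "maxb": [43.0, 40.1, 44.9, 39.2, 42.2, 43.5],
    "maxt": [19.5, 20.3, 20.1, 18.2, 21.0, 19.0],
}
flagged = highly_correlated_pairs(columns)
```

If a pair were flagged, one of its two features would be a candidate for removal; in the study no pair exceeded the threshold, so all six measurements were retained.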
Data normalization
The accuracy of machine learning models depends on the quality of the input data for achieving generalized performance. This involves data normalization, which entails scaling or transforming the data so that each feature contributes equally during the training process. The performance enhancement obtained from such normalization has been verified by many studies [68]. In this study, Z-score normalization was utilized due to its sensitivity to outliers. The formula for Z-score normalization is given in Eq. (1):
\[{v}^{\prime}=\frac{v-{\mu }_{v}}{{\sigma }_{v}}\quad (1)\]
where \({v}^{\prime}\), \(v\), \({\mu }_{v}\), and \({\sigma }_{v}\) denote the new value, original value, mean, and standard deviation of the variable values in the training samples, respectively. This method transforms the data to have a mean of 0 and a standard deviation of 1.
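The normalization described above can be sketched as follows, with the mean and standard deviation estimated from the training samples only and then reused for unseen samples. The function names and values are hypothetical, and the use of the population standard deviation is an assumption.

```python
from statistics import fmean, pstdev


def fit_zscore(train_values):
    """Estimate the mean and standard deviation from training samples only."""
    mu = fmean(train_values)
    sigma = pstdev(train_values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("constant feature: standard deviation is zero")
    return mu, sigma


def apply_zscore(values, mu, sigma):
    """Scale values with previously fitted training statistics."""
    return [(v - mu) / sigma for v in values]


# Hypothetical measurements (mm); not data from the study.
train = [38.0, 40.0, 42.0, 44.0]
test = [41.0, 47.0]
mu, sigma = fit_zscore(train)
train_z = apply_zscore(train, mu, sigma)
test_z = apply_zscore(test, mu, sigma)  # test data reuse the training statistics
```

Fitting the statistics on the training fold alone keeps information from the test fold out of the model during cross-validation.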
Top-ranked features identification
The feature selection technique automatically selects those features which are most significant for output prediction. This method thus helps in reducing overfitting and training time as well as improving accuracy. Several different feature selection techniques, e.g., univariate selection, recursive feature elimination (RFE), principal component analysis (PCA), bagged decision trees like random forest and extra trees, and boosted trees like Extreme Gradient Boosting (XGBoost) etc. have been used in the literature. However, the present study investigated and compared three feature selection techniques: (1) XGBoost [69], (2) Extra tree [70], and (3) Random Forest [71, 72] to determine the best feature combinations for sex prediction using different ML classifiers.
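Ranking features by tree-based importance, as described above, can be sketched with scikit-learn. This is an illustrative sketch on synthetic stand-in data, not the study's pipeline: the data, the hyperparameters, and the helper `rank_features` are assumptions, and XGBoost is left out here only to avoid an additional dependency.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

FEATURES = ["maxh", "maxb", "maxt", "haf", "lafb", "mafb"]

# Synthetic stand-in for the 260 patellae; these are not the study data.
X, y = make_classification(
    n_samples=260, n_features=6, n_informative=3, n_redundant=1, random_state=0
)


def rank_features(model, X, y, names):
    """Fit a tree ensemble and return feature names sorted by importance."""
    model.fit(X, y)
    pairs = sorted(
        zip(names, model.feature_importances_), key=lambda p: p[1], reverse=True
    )
    return [name for name, _ in pairs]


rf_rank = rank_features(
    RandomForestClassifier(n_estimators=200, random_state=0), X, y, FEATURES
)
et_rank = rank_features(
    ExtraTreesClassifier(n_estimators=200, random_state=0), X, y, FEATURES
)
top3 = rf_rank[:3]  # candidate "Top-3" feature combination
```

The Top-1 to Top-6 feature subsets are then the successive prefixes of such a ranking, each of which can be passed to the classifiers being compared.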
Model development
The present study used and compared different machine learning classifiers, namely Gradient boosting [69], XGBoost [73], Extra tree [73], K-nearest neighbour (KNN) [73], Adaboost [73], Random Forest [73], linear discriminant analysis (LDA) [71, 72], and Logistic regression [74], using the best feature combination identified by the feature selection techniques for sex prediction. We then investigated a stacking approach in which a combination of base learners and meta learners was used to classify the sex of the subject. Here, the top three performing ML classifiers were used as base learners, and the predictions of these models were used as inputs to different machine learning classifiers acting as meta learners to make the final decision. Eight different machine learning classifiers were investigated as meta learners in the stacking approach to find the best performing classifier.
Consider a single dataset A, which consists of feature vectors (\({{\varvec{x}}}_{{\varvec{i}}}\)) and their classification probability scores \({{\varvec{y}}}_{{\varvec{i}}}\). First, a set of base-level classifiers \({{\varvec{M}}}_{1},\dots ,{{\varvec{M}}}_{{\varvec{p}} }\) is generated, and their outputs are used to train the meta-level classifier, as illustrated in Fig. 3.
Five-fold cross-validation was used to create a training set for the meta-level classifier. Among these folds, base-level classifiers were trained on four folds, leaving one fold for validation. Each base-level classifier predicts a probability distribution over the possible class values. Thus, for input x, a probability distribution is created from the predictions of each classifier M in the base-level classifier set:
\[{{\varvec{P}}}^{{\varvec{M}}}\left({\varvec{x}}\right)=\left({{\varvec{P}}}^{{\varvec{M}}}\left({{\varvec{c}}}_{1}|{\varvec{x}}\right),{{\varvec{P}}}^{{\varvec{M}}}\left({{\varvec{c}}}_{2}|{\varvec{x}}\right),\dots ,{{\varvec{P}}}^{{\varvec{M}}}\left({{\varvec{c}}}_{{\varvec{n}}}|{\varvec{x}}\right)\right)\quad (2)\]
where \({({\varvec{c}}}_{1},{{\varvec{c}}}_{2},\dots ,{{\varvec{c}}}_{{\varvec{n}}})\) is the set of possible class values and \({{\varvec{P}}}^{{\varvec{M}}}\left({{\varvec{c}}}_{{\varvec{i}}}|{\varvec{x}}\right)\) denotes the probability that example x belongs to class \({{\varvec{c}}}_{{\varvec{i}}}\) as estimated (and predicted) by classifier M in Eq. 2. Each base-level classifier \({{\varvec{M}}}_{{\varvec{j}}}\) predicts the class \({{\varvec{c}}}_{{\varvec{i}}}\) with the highest class probability \({{\varvec{P}}}^{{{\varvec{M}}}_{{\varvec{j}}}}\left({{\varvec{c}}}_{{\varvec{i}}}|{\varvec{x}}\right)\). The attributes of the meta-level classifier \({{\varvec{M}}}_{{\varvec{f}}}\) are thus the probabilities predicted for each possible class by each of the base-level classifiers, i.e., \({{\varvec{P}}}^{{{\varvec{M}}}_{{\varvec{j}}}}\left({{\varvec{c}}}_{{\varvec{i}}}|{\varvec{x}}\right)\) for i = 1,…, n, and j = 1,…, p. The pseudo-code for the stacking approach is shown in Algorithm 1.
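The stacking scheme described above, in which base learners produce class probabilities on held-out folds and a meta learner is trained on those probabilities, can be sketched with scikit-learn's `StackingClassifier`. This is an illustration on synthetic stand-in data rather than the authors' implementation: the data, the hyperparameters, and the choice of two base learners with a Gradient Boosting meta learner are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for three patella measurements; not the study data.
X, y = make_classification(
    n_samples=260, n_features=3, n_informative=3, n_redundant=0, random_state=0
)

base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
]
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=GradientBoostingClassifier(random_state=0),
    stack_method="predict_proba",  # meta learner sees class probabilities
    cv=5,  # out-of-fold base predictions form the meta-level training set
)
# Z-score scaling is fitted inside each training fold via the pipeline.
model = make_pipeline(StandardScaler(), stack)
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
mean_accuracy = scores.mean()
```

Training the meta learner on out-of-fold probabilities, rather than on predictions for samples the base learners have already seen, limits the optimistic bias that would otherwise leak into the final decision.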
Performance metrics
Different classification models were compared using the top-ranked features from the testing data to calculate the performance metrics in classifying the male and female classes. The best performing classifier was evaluated for different combinations of features as input to the model by calculating the receiver operating characteristic (ROC) area under the curve (AUC) and performance metrics such as accuracy, precision, sensitivity, specificity, and F1-score, as shown in Eqs. (3–7). The different classification algorithms, and the different feature combinations for the best performing algorithm, were validated using fivefold cross-validation, in which training and testing were done on 80% and 20% of the data, respectively; this process was repeated five times to test the entire dataset. Weighted averages with 95% confidence intervals were calculated for sensitivity, specificity, precision, F1-score, and overall accuracy from the confusion matrix that accumulates the results of all test (unseen) folds of the fivefold cross-validation. The correct estimation of a male subject is a true positive (TP), and the correct estimation of a female subject is a true negative (TN). The incorrect estimation of a male subject as female is a false negative (FN), and the incorrect estimation of a female subject as male is a false positive (FP).
\[\text{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}\quad (3)\]
\[\text{Precision}=\frac{TP}{TP+FP}\quad (4)\]
\[\text{Sensitivity}=\frac{TP}{TP+FN}\quad (5)\]
\[\text{Specificity}=\frac{TN}{TN+FP}\quad (6)\]
\[\text{F1-score}=\frac{2\times TP}{2\times TP+FN+FP}\quad (7)\]
Results
The values of Lin’s concordance correlation coefficient of reproducibility range between 0.974 and 0.998 (Table 1). These values fell within the recommended range from 0.90 to 0.99, which indicates that all patella measurements are easily reproducible and that the data analyzed in this study are not significantly affected by measurement error. For clarity, the analyses on discriminant function and machine learning are presented separately. In the first section, results from descriptive statistics and from univariate and multivariate discriminant function analysis are presented, while in the second section, best feature selection, validation of the ML models, and the stacking technique are reported.
Discriminant function analysis
The descriptive statistics of all measurements for pooled data are shown in Table 2. Males showed significantly higher (p ≤ 0.05) mean values than females for all measurements. All patella measurements were subjected to stepwise and direct discriminant function analyses. The unstandardized coefficients, constants, average accuracies, cross-validation in correct sex classification, and the sectioning points for individual measurements are shown in Table 3. The best performing variable, maxh, presented with an acceptably high average accuracy of 82% (Table 3), while the other variables presented with lower average accuracies, which ranged between 69% for lafb and 79% for maxb (Table 3).
Table 4 shows the stepwise and direct discriminant function analysis using various combinations of measurements. In the stepwise analysis, four measurements were selected, namely, maxh, maxb, maxt, and haf. The discriminant function equation derived from these measurements provided an average accuracy of 84.2% as shown in Table 4. The other functions in Table 4 were formulated using direct discriminant function analysis of patella measurements. The average accuracies in correct sex classification ranged between 81.9 (Function D5, Table 4) and 83.5% (Function D1, Table 4). The results of the cross-validation using the leave-one-out classification showed that the average accuracy in correct sex classification for most of the presented functions remained unchanged (Table 4). Functions D2 and D4 showed a minimal and insignificant drop in classification rate of 0.8% thereby confirming the validity of the derived functions from the pooled data.
Machine learning analysis
Best feature combination for sex prediction
In this study, three feature ranking algorithms were used to identify top-ranked features among all features. These top-ranked features were investigated with 8 different classifiers which were performed with Top-1 to Top-6 features to identify the best performing classification model and best feature combination simultaneously for sex prediction. It was observed that RF and ET feature selection techniques produced the same feature ranking while the XGBoost feature selection algorithm produced different rankings as shown in Fig. 4.
In this study, the Top-3 features (maxh, maxb, and maxt) from the random forest (RF) feature selection algorithm, combined with the random forest machine learning (ML) classifier, outperformed the other classifiers. Table 5 shows the overall accuracies and weighted average performance for the other metrics (precision, sensitivity, specificity, and F1-score) with a 95% confidence interval to identify the best feature combinations using the Top-1 to Top-6 features for fivefold cross-validation using the best classifier (AdaBoost classifier for XGBoost feature selection and RF classifier for the RF and ET feature selection algorithms).
It is clearly seen that the Top-3 features (maxh, maxb, and maxt) from RF and ET feature selection techniques produced the best performance of overall accuracy, and weighted precision, sensitivity, specificity, and F1-score of 89.61%, 89.67%, 89.62%, 89.62%, and 89.61%, respectively, using RF classifier for sex prediction. It was noticed that six features were required in the case of the XGBoost feature selection technique to produce the best performance of overall accuracy, weighted precision, sensitivity, specificity, and F1-score of 89.23%, 89.31%, 89.23%, 89.24%, and 89.22%, respectively, using AdaBoost classifier, whereas similar performance was produced by only three features from RF and ET feature selection techniques with RF classifier.
Development and validation of different ML and stacking models
We investigated the best combination of three features (maxh, maxb, and maxt), selected the best ML classifiers among the eight classifiers as base models, and trained different ML classifiers as meta-learners. We selected the top two models, RF and ET, for which the overall accuracy and weighted precision, sensitivity, specificity, and F1-score were 89.23%, 88.64%, 90.00%, 88.46%, and 89.31% (RF) and 85.34%, 85.27%, 85.03%, 85.45%, and 85.14% (ET), respectively (Table 6). The stacking approach was trained with the RF and ET classifiers as base learners; with the Gradient Boosting classifier as meta learner, it outperformed the other meta learner classifiers, with an overall accuracy and weighted precision, sensitivity, specificity, and F1-score of 90.77%, 89.55%, 92.3%, 89.23%, and 90.9%, respectively.
Figure 5A shows the confusion matrix of the best performing ML classifier (RF classifier), and Fig. 5B shows the confusion matrix of the best performing stacking model (with the Gradient Boosting classifier as meta learner). Even with the best performing RF classifier, 13 out of 130 male subjects were misclassified as female and 15 out of 130 female subjects were misclassified as male, whereas with the stacking model 120 out of 130 male subjects were correctly classified as male and 116 out of 130 female subjects were correctly classified as female. Thus, the stacking model outperformed the other ML classifiers.
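As a consistency check, the confusion-matrix counts reported above can be converted into the performance metrics with male taken as the positive class, as defined in the Performance metrics section. The function name is hypothetical; the counts are those stated in the text, with the misclassified counts for the stacking model obtained by subtraction from 130.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, precision, sensitivity, specificity, and F1 from raw counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# RF classifier: 13 of 130 males and 15 of 130 females misclassified.
rf = classification_metrics(tp=117, fn=13, tn=115, fp=15)

# Stacking model: 120 of 130 males and 116 of 130 females correctly classified.
stacking = classification_metrics(tp=120, fn=10, tn=116, fp=14)

for name, result in (("RF", rf), ("Stacking", stacking)):
    print(name, {k: round(100 * v, 2) for k, v in result.items()})
```

With these counts the overall accuracy is 232/260 ≈ 89.23% for the RF classifier and 236/260 ≈ 90.77% for the stacking model, in line with the values quoted for Table 6.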
Figure 6 shows the receiver operating characteristic (ROC) curves and the corresponding area under the curve (AUC, also known as AUROC) for sex identification using different ML classifiers. The AUC is one of the most important evaluation metrics for checking the performance of any classification model. The stacking model outperformed the other ML classifiers, with an AUC of 92.65% (Fig. 6).
Discussion
Sex prediction from measurements of bones that display sexual dimorphism is an important aspect of forensic anthropology. Population-specific standards which are generally considered to provide the best estimation of sex have been published for the skull and postcranial elements in different parts of the world [30, 40, 44, 50, 75, 76]. These became necessary because of the observed variation in the display of sexual dimorphism between different population groups [26]. Consequently, the application of standards for population groups is not encouraged for other groups. One disadvantage of using population-specific standards is having prior knowledge of the population group of the skeleton under forensic analysis [63, 64].
In the present study, patella measurements of South Africans were shown to be sexually dimorphic, which is consistent with the results of previous studies on patella of Italians [77], Americans [78], Iranians [79], Spaniards [80], African Americans [81], Japanese [82], Turks [83], and Swiss [84]. The range of average accuracies obtained for pooled multivariate discriminant function equations (DFEs) and the stacking ML technique in the current study (81.9–90.8%, Tables 4 and 6) compares well with those presented for previous studies in other parts of the world (Table 7). It is interesting to note that the highest average accuracies for all studies that utilized skeletal collections in the acquisition of data are not more than approximately 85% [46, 66, 80, 81, 84]. Other studies, in which data were collected from radiological modalities or at autopsy, presented higher average accuracies in correct sex classification. This is an indication that the source of data and how these data are collected may influence the outcome of the analysis.
In addition, the average accuracies for the pooled data for the patella from the current study using discriminant function analysis (81.9–84.2%) are similar to those observed for SAED (75–85%: [46]) and SAAD (78–85%: [66]). The highest drop in average accuracies in the current study (0.8%) is lower than those from population-specific DFEs for SAED and SAAD which were 2.5% and 3.3%, respectively. This observation of a lower drop of average accuracies for DFEs obtained from pooled data compared to population-specific DFEs agrees with Bidmos and Mazengenya [85] in which the highest drop in average accuracies for pooled data DFEs was 0.9%. This observation indicates a better validity of pooled DFEs compared to population-specific DFEs. Another previously documented advantage of the application of DFEs from pooled data is that they can be applied to an unknown skeleton without the prior knowledge of the population group [64].
The same performance trend is observed in the current study using the ML algorithm compared to the conventional statistical model. The standards generated for sex classification produced higher average accuracies (Table 4) compared to those generated using discriminant function analysis (Table 3). Compared to the average accuracies for the pooled data for the patella from the current study using discriminant function analysis (81.9–84.2%), the stacking machine learning approach provides an overall accuracy of 90.77%. This clearly indicates that with the application of the machine learning paradigm a better classification of sex from the patella measurement is possible.
From the aforementioned, linear and volumetric measurements of the patella are useful in human identification and have produced acceptably high average accuracies in correct sex classification. However, human identification from skeletal remains can be demanding especially in a country like South Africa with diverse population groups. Consequently, the application of population-specific DFEs in the human identification process will require the prior assignment of the population group which might be difficult if not impossible in cases where complete skeletons are not available or in the absence of bones that display obvious population-specific traits. Another confounding problem is the difficulty in the assignment of population groups to individuals who fall within the boundaries of other population groups [52]. This has led some researchers [63, 64] to propose the idea of a generation of generic standards for the estimation of sex and stature, especially for population groups that have similarities. In both studies, the authors argue for the generation and use of generic equations for sex assignment [63] and stature estimation [64] citing the lack of adequate data and bone collections from which data could be collected for the derivation of population-specific standards in some countries.
The pelvic bone is considered one of the most sexually dimorphic bones in the body based on its design for parturition in females. Measurements of this bone have been used in the generation of population-specific DFEs in different parts of the world [2]. Steyn and Patriquin [63] assessed the reliability of population-specific DFEs compared to those from pooled data from diverse population groups. They reported a comparable performance of population-specific and generic DFEs with regard to classification rates and concluded that population-specific equations are not superior to generic equations with regard to sex prediction using dimensions of the pelvic bone. Macaluso Jr [86] evaluated the reliability of generic equations that were published by Steyn and Patriquin [63] on a French sample and reported that the average accuracies of the pooled data remained unchanged when applied to a French sample. In addition, there was no significant difference between the average accuracies obtained from the use of population-specific equations and generic equations [86]. This observation provided further proof of the usefulness and applicability of generic equations for sex prediction using pelvic measurements to other related population groups, where the application of ML can significantly help.
Attempts have also been made to apply the notion of non-superiority of population-specific equations over generic equations using measurements of the vertebrae. Hora and Sládek [87] observed that anteroposterior and mediolateral body diameters were found to be universally applicable in sex prediction while other measurements of the studied vertebrae showed population specificity in the assignment of sex. In a similar study, Bidmos and Mazengenya [85] investigated the utility of pooled data in the generation of generic equations for sex prediction. They evaluated the accuracies of population-specific equations formulated from measurements of long upper limb bones of South Africans and noted that the average accuracies of generic equations are acceptably high (81 to 87%). In addition, the cross-validated accuracies remained largely unchanged thereby confirming the usefulness of these equations in cases where it becomes difficult to establish the population affinity of the skeletal remain under forensic investigation.
Recently, Indra et al. [84] assessed the validity of population-specific DFEs formulated for patella measurements of the contemporary Spanish population group on a Swiss sample. The average accuracies obtained by Indra et al. [84] ranged from 63 to 84% for the patella, which was similar to those presented in an earlier study by Peckmann et al. [80]. The results of the current study, in which the average accuracies obtained for generic equations are comparable to those presented for population-specific equations for South Africans of European [46] and African descent [66], are in agreement with the observation made in previous studies [63, 84,85,86,87]. This, therefore, shows the utility of generic equations when the patella is available for forensic analysis in South Africa.
The range of average accuracies for generic equations formulated in the current study (81.9–90.8%) is similar to those obtained for population-specific equations derived for South Africans of European (67.5–85%) and African descent (78.3–85%). This is in agreement with the observation that was made by Indra et al. [84].
Conclusions
Prediction of sex from recovered or discovered bones in human identification is a very important step for forensic anthropologists, along with the estimation of age, stature, and population affinity. In this study, we have used a dataset of 100 people collected from a sample of patella of Mixed Ancestry South Africans (MASA). Six parameters (maxh, maxb, maxt, haf, lafb, and mafb) were used. Two types of investigation were carried out to compare the performance of conventional statistical analysis with that of classical machine learning techniques in the estimation of sex. Different discriminant function analyses were performed for measurements that exhibited significant differences between male and female mean measurements. In addition, several ML algorithms were trained, validated, and tested to identify the best feature combination for predicting sex from the patella measurements. The range of average accuracies obtained for pooled multivariate DFEs is 81.9–84.2%, while the stacking ML technique provides 90.8% accuracy, which compares well with those presented in previous studies. In conclusion, findings from the current study show that generic models formulated from measurements of the patella of different population groups in South Africa are useful and present with reasonably high average accuracies. Consequently, they are useful in the prediction of sex in cases where the population affinity is either difficult or impossible to ascertain. Their applicability to populations of Southern Africa will require validation studies in individual populations from different countries in the region.
Data availability
Data available on request.
References
Loth SR, İşcan MY (2000) Morphological age estimation. In: Siegel JA, Saukko PJ, Knupfer GC (eds) Encyclopaedia of forensic sciences. Academic Press, London, p 1600
İşcan MY, Steyn M (2013) The human skeleton in forensic medicine, 3rd edn. Charles C Thomas Publisher, p 493. https://doi.org/10.1002/ajpa.22754
Đurić M, Rakočević Z, Đonić D (2004) The reliability of sex determination of skeletons from forensic context in the Balkans. Forensic Sci Int 147:159–164. https://doi.org/10.1016/j.forsciint.2004.09.111
Rogers T, Saunders S (1994) Accuracy of sex determination using morphological traits of the human pelvis. J Forensic Sci 39:13683J. https://doi.org/10.1520/JFS13683J
Kimmerle EH, Ross A, Slice D (2008) Sexual dimorphism in America: geometric morphometric analysis of the craniofacial region. J Forensic Sci 53:54–57. https://doi.org/10.1111/j.1556-4029.2007.00627.x
Bigoni L, Velemínská J, Brůžek J (2010) Three-dimensional geometric morphometric analysis of cranio-facial sexual dimorphism in a Central European sample of known sex. Homo 61:16–32. https://doi.org/10.1016/J.JCHB.2009.09.004
Franklin D, Cardini A, Flavel A, Kuliukas A (2012) The application of traditional and geometric morphometric analyses for forensic quantification of sexual dimorphism: preliminary investigations in a Western Australian population. Int J Legal Med 126:549–558. https://doi.org/10.1007/s00414-012-0684-8
Rusk KM, Ousley SD (2016) An evaluation of sex-and ancestry-specific variation in sacral size and shape using geometric morphometrics. Am J Phys Anthropol 159:646–654. https://doi.org/10.1002/ajpa.22926
Čechová M, Dupej J, Brůžek J et al (2019) Sex estimation using external morphology of the frontal bone and frontal sinuses in a contemporary Czech population. Int J Legal Med 133:1285–1294. https://doi.org/10.1007/s00414-019-02063-8
Bertsatos A, Chovalopoulou ME, Brůžek J, Bejdová Š (2020) Advanced procedures for skull sex estimation using sexually dimorphic morphometric features. Int J Legal Med 134:1927–1937. https://doi.org/10.1007/s00414-020-02334-9
del Bove A, Profico A, Riga A et al (2020) A geometric morphometric approach to the study of sexual dimorphism in the modern human frontal bone. Am J Phys Anthropol 173:643–654. https://doi.org/10.1002/ajpa.24154
Kajanoja P (1966) Sex determination of Finnish crania by discriminant function analysis. Am J Phys Anthropol 24:29–33. https://doi.org/10.1002/ajpa.1330240104
İşcan MY, Yoshino M, Kato S (1995) Sexual dimorphism in modern Japanese crania. Am J Hum Biol 7:459–464. https://doi.org/10.1002/AJHB.1310070407
Patil KR, Mody RN (2004) Determination of sex by discriminant function analysis and stature by regression analysis: a lateral cephalometric study. Forensic Sci Int 147:175–180. https://doi.org/10.1016/j.forsciint.2004.09.071
Spradley MK, Jantz RL (2011) Sex estimation in forensic anthropology: skull versus postcranial elements. J Forensic Sci 56:289–296. https://doi.org/10.1111/j.1556-4029.2010.01635.x
Ogawa Y, Imaizumi K, Miyasaka S, Yoshino M (2013) Discriminant functions for sex estimation of modern Japanese skulls. J Forensic Legal Med 20:234–238. https://doi.org/10.1016/j.jflm.2012.09.023
Marinescu M, Panaitescu V, Rosu M, Maru N, Punga A (2014) Sexual dimorphism of crania in a Romanian population: discriminant function analysis approach for sex estimation. Rom J Leg Med 22:21–26. https://doi.org/10.4323/rjlm.2014.21
Marino EA (1995) Sex estimation using the first cervical vertebra. Am J Phys Anthropol 97:127–133. https://doi.org/10.1002/AJPA.1330970205
Garoufi N, Bertsatos A, Chovalopoulou ME, Villa C (2020) Forensic sex estimation using the vertebrae: an evaluation on two European populations. Int J Legal Med 134:2307–2318. https://doi.org/10.1007/S00414-020-02430-W
Oikonomopoulou EK, Valakos E, Nikita E (2017) Population-specificity of sexual dimorphism in cranial and pelvic traits: evaluation of existing and proposal of new functions for sex assessment in a Greek assemblage. Int J Legal Med 131:1731–1738. https://doi.org/10.1007/s00414-017-1655-x
Knecht S, Nogueira L, Maël S et al (2021) Sex estimation from the greater sciatic notch: a comparison of classical statistical models and machine learning algorithms. Int J Legal Med 135:2603–2613. https://doi.org/10.1007/s00414-021-02700-1
Cao Y, Ma Y, Vieira DN et al (2021) A potential method for sex estimation of human skeletons using deep learning and three-dimensional surface scanning. Int J Legal Med 135:2409–2421. https://doi.org/10.1007/s00414-021-02675-z
Holman DJ, Bennett KA (1991) Determination of sex from arm bone measurements. Am J Phys Anthropol 84:421–426. https://doi.org/10.1002/ajpa.1330840406
İşcan MY, Loth SR, King CA et al (1998) Sexual dimorphism in the humerus: a comparative analysis of Chinese, Japanese and Thais. Forensic Sci Int 98:17–29. https://doi.org/10.1016/S0379-0738(98)00119-4
Mall G, Hubig M, Büttner A et al (2001) Sex determination and estimation of stature from the longbones of the arm. Forensic Sci Int 117:23–30. https://doi.org/10.1016/S0379-0738(00)00445-X
Sakaue K (2004) Sexual determination of long bones in recent Japanese. Anthropol Sci 112:75–81. https://doi.org/10.1537/ase.00067
Frutos LR (2004) Metric determination of sex from the humerus in a Guatemalan forensic sample. Forensic Sci Int 147:153–157. https://doi.org/10.1016/j.forsciint.2004.09.077
Kranioti EF, Michalodimitrakis M (2009) Sexual dimorphism of the humerus in contemporary Cretans–a population-specific study and a review of the literature. J Forensic Sci 54:996–1000. https://doi.org/10.1111/j.1556-4029.2009.01103.x
Celbis O, Agritmis H (2006) Estimation of stature and determination of sex from radial and ulnar bone lengths in a Turkish corpse sample. Forensic Sci Int 159:135–139. https://doi.org/10.1016/j.forsciint.2005.05.016
Black TKA (1978) A new method for assessing the sex of fragmentary skeletal remains: femoral shaft circumference. Am J Phys Anthropol 48:227–231. https://doi.org/10.1002/ajpa.1330480217
İşcan MY, Shihai D (1995) Sexual dimorphism in the Chinese femur. Forensic Sci Int 74:79–87. https://doi.org/10.1016/0379-0738(95)01691-B
Mall G, Graw M, Gehring KD, Hubig M (2000) Determination of sex from femora. Forensic Sci Int 113:315–321
King C, İşcan MY, Loth SR (1998) Metric and comparative analysis of sexual dimorphism in the Thai femur. J Forensic Sci 43:954–958
Jantz RL, Kimmerle EH, Baraybar JP (2008) Sexing and stature estimation criteria for Balkan populations. J Forensic Sci 53:601–605. https://doi.org/10.1111/j.1556-4029.2008.00716.x
Boldsen JL, Milner GR, Boldsen SK (2015) Sex estimation from modern American humeri and femora, accounting for sample variance structure. Am J Phys Anthropol 158:745–750. https://doi.org/10.1002/ajpa.22812
Moore MK, DiGangi EA, Niño Ruíz FP et al (2016) Metric sex estimation from the postcranial skeleton for the Colombian population. Forensic Sci Int 262:286.e1-286.e8. https://doi.org/10.1016/J.FORSCIINT.2016.02.018
Steyn M, İşcan MY (1998) Sexual dimorphism in the crania and mandibles of South African whites. Forensic Sci Int 98:9–16. https://doi.org/10.1016/S0379-0738(98)00120-0
Dayal MR, Spocter MA, Bidmos MA (2008) An assessment of sex using the skull of black South Africans by discriminant function analysis. HOMO- J Comp Hum Biol 59:209–221. https://doi.org/10.1016/j.jchb.2007.01.001
Vance VL, Steyn M, L’Abbé EN (2011) Nonmetric sex determination from the distal and posterior humerus in black and white South Africans. J Forensic Sci 56:710–714. https://doi.org/10.1111/j.1556-4029.2011.01724.x
Steyn M, İşcan MY (1997) Sex determination from the femur and tibia in South African whites. Forensic Sci Int 90:111–119. https://doi.org/10.1016/S0379-0738(97)00156-4
Asala SA, Bidmos MA, Dayal MR (2004) Discriminant function sexing of fragmentary femur of South African blacks. Forensic Sci Int 145:25–29. https://doi.org/10.1016/j.forsciint.2004.03.010
Barrier ILO, L’Abbé EN (2008) Sex determination from the radius and ulna in a modern South African sample. Forensic Sci Int 179:85.e1-85.e7. https://doi.org/10.1016/j.forsciint.2008.04.012
Krüger GC, L’Abbé EN, Stull KE (2017) Sex estimation from the long bones of modern South Africans. Int J Legal Med 131:275–285. https://doi.org/10.1007/s00414-016-1488-z
Bidmos MA, Asala SA (2003) Discriminant function sexing of the calcaneus of the South African whites. J Forensic Sci 48:1213–1218. https://doi.org/10.1520/JFS2003104
Bidmos MA, Asala SA (2004) Sexual dimorphism of the calcaneus of South African blacks. J Forensic Sci 49(3):446–450
Bidmos MA, Steinberg N, Kuykendall KL (2005) Patella measurements of South African whites as sex assessors. HOMO- J Comp Hum Biol 56:69–74. https://doi.org/10.1016/J.JCHB.2004.10.002
Dayal MR, Kegley AD, Štrkalj G et al (2009) The history and composition of the Raymond A. Dart collection of human skeletons at the University of the Witwatersrand, Johannesburg, South Africa. Am J Phys Anthropol 140:324–335. https://doi.org/10.1002/ajpa.21072
L’Abbé EN, Loots M, Meiring JH (2005) The Pretoria bone collection: a modern South African skeletal sample. Homo 56:197–205. https://doi.org/10.1016/J.JCHB.2004.10.004
Gibbon VE, Morris AG (2021) UCT Human skeletal repository: its stewardship, history, composition and educational use. Homo 72:139–147. https://doi.org/10.1127/HOMO/2021/1402
Krüger GC, L’Abbé EN, Stull KE, Kenyhercz MW (2015) Sexual dimorphism in cranial morphology among modern South Africans. Int J Legal Med 129(4):869–75. https://doi.org/10.1007/s00414-014-1111-0
Liebenberg L, Krüger GC, L’Abbé EN, Stull KE (2019) Postcraniometric sex and ancestry estimation in South Africa: a validation study. Int J Legal Med 133:289–296. https://doi.org/10.1007/s00414-018-1865-x
Mokoena P, Billings BK, Gibbon V et al (2019) Development of discriminant functions to estimate sex in upper limb bones for mixed ancestry South Africans. Sci Justice 59:660–666. https://doi.org/10.1016/j.scijus.2019.06.007
Austin PC, Lee DS, Steyerberg EW, Tu JV (2012) Regression trees for predicting mortality in patients with cardiovascular disease: what improvement is achieved by using ensemble-based methods? Biom J 54:657–673. https://doi.org/10.1002/bimj.201100251
Shouval R, Hadanny A, Shlomo N et al (2017) Machine learning for prediction of 30-day mortality after ST elevation myocardial infraction: an acute coronary syndrome Israeli Survey data mining study. Int J Cardiol 246:7–13. https://doi.org/10.1016/j.ijcard.2017.05.067
Pieszko K, Hiczkiewicz J, Budzianowski P, Budzianowski J, Rzeźniczak J, Pieszko K, Burchardt P (2019) Predicting long-term mortality after acute coronary syndrome using machine learning techniques and hematological markers. Dis Markers: 1–9. https://doi.org/10.1155/2019/9056402
Topol EJ (2019) High-performance medicine: the convergence of human and artificial intelligence. Nat Med 25:44–56. https://doi.org/10.1038/s41591-018-0300-7
Deo RC (2015) Machine learning in medicine. Circulation 132:1920–1930. https://doi.org/10.1161/CIRCULATIONAHA.115.001593
Harrell FE Jr (2019) Glossary of statistical terms. Vanderbilt University School of Medicine. https://hbiostat.org/doc/glossary.pdf. Accessed 1 May 2022
Esteva A, Kuprel B, Novoa RA et al (2017) Dermatologist-level classification of skin cancer with deep neural networks. Nature 542:115–118. https://doi.org/10.1038/nature21056
Shung D, Simonov M, Gentry M et al (2019) Machine learning to predict outcomes in patients with acute gastrointestinal bleeding: a systematic review. Dig Dis Sci 64:2078–2087. https://doi.org/10.1007/s10620-019-05645-z
Mortazavi BJ, Bucholz EM, Desai NR, Huang C, Curtis JP, Masoudi FA, Shaw RE, Negahban SN, Krumholz HM (2019) Comparison of machine learning methods with national cardiovascular data registry models for prediction of risk of bleeding after percutaneous coronary intervention. JAMA Netw Open 2(7):e196835. https://doi.org/10.1001/jamanetworkopen.2019.6835
Angraal S, Mortazavi BJ, Gupta A et al (2020) Machine learning prediction of mortality and hospitalization in heart failure with preserved ejection fraction. JACC Heart Fail 8:12–21. https://doi.org/10.1016/j.jchf.2019.06.013
Steyn M, Patriquin ML (2009) Osteometric sex determination from the pelvis—does population specificity matter? Forensic Sci Int 191:113.e1-113.e5. https://doi.org/10.1016/J.FORSCIINT.2009.07.009
Albanese J, Tuck A, Gomes J, Cardoso HF (2016) An alternative approach for estimating stature from long bones that is not population-or group-specific. Forensic Sci Int 259:59–68. https://doi.org/10.1016/j.forsciint.2015.12.011
Howley D, Howley P, Oxenham MF (2018) Estimation of sex and stature using anthropometry of the upper extremity in an Australian population. Forensic Sci Int 287:220.e1-220.e10. https://doi.org/10.1016/J.FORSCIINT.2018.03.017
Dayal MR, Bidmos MA (2005) Discriminating sex in South African blacks using patella dimensions. J Forensic Sci 50:1–4. https://doi.org/10.1520/JFS2004306
Lin LK (1989) A concordance correlation coefficient to evaluate reproducibility. Biometrics 45:255. https://doi.org/10.2307/2532051
Singh D, Singh B (2020) Investigating the impact of data normalization on classification performance. Appl Soft Comput 97:105524. https://doi.org/10.1016/j.asoc.2019.105524
Chen T, Guestrin C (2016) XGBoost: reliable large-scale tree boosting system. arXiv
Sharaff A, Gupta H (2019) Extra-tree classifier with metaheuristics approach for email classification. Adv Intell Syst 924:189–197. https://doi.org/10.1007/978-981-13-6861-5_17
Biau G, Scornet E (2016) A random forest guided tour. TEST 25:197–227. https://doi.org/10.1007/s11749-016-0481-7
Biau G, Scornet E (2016) Rejoinder on: a random forest guided tour. TEST 25:264–268. https://doi.org/10.1007/S11749-016-0488-0
Guo G, Wang H, Bell D et al (2003) KNN model-based approach in classification. In: Meersman R, Tari Z, Schmidt DC (eds) On the move to meaningful internet systems 2003: CoopIS, DOA, and ODBASE. OTM 2003. Lect Notes Comput Sci, vol 2888. Springer, Berlin, Heidelberg, pp 986–996. https://doi.org/10.1007/978-3-540-39964-3_62
Subasi A (2020) Practical machine learning for data analysis using python. Academic Press
DiBennardo R, Taylor JV (1979) Sex assessment of the femur: a test of a new method. Am J Phys Anthropol 50:635–637. https://doi.org/10.1002/AJPA.1330500415
Garcia S (2012) Is the circumference at the nutrient foramen of the tibia of value to sex determination on human osteological collections? testing a new method. Int J Osteoarchaeol 22:361–365. https://doi.org/10.1002/OA.1202
Introna F, di Vella G, Campobasso CP (1998) Sex determination by discriminant analysis of patella measurements. Forensic Sci Int 95:39–45. https://doi.org/10.1016/S0379-0738(98)00080-2
Mahfouz M, Badawi A, Merkl B et al (2007) Patella sex determination by 3D statistical shape models and nonlinear classifiers. Forensic Sci Int 173:161–170. https://doi.org/10.1016/j.forsciint.2007.02.024
Akhlaghi M, Sheikhazadi A, Naghsh A, Dorvashi G (2010) Identification of sex in Iranian population using patella dimensions. J Forensic Legal Med 17:150–155. https://doi.org/10.1016/J.JFLM.2009.11.005
Peckmann TR, Meek S, Dilkie N, Rozendaal A (2016) Determination of sex from the patella in a contemporary Spanish population. J Forensic Legal Med 44:84–91. https://doi.org/10.1016/j.jflm.2016.09.007
Peckmann TR, Fisher B (2018) Sex estimation from the patella in an African American population. J Forensic Legal Med 54:1–7. https://doi.org/10.1016/j.jflm.2017.12.002
Michiue T, Hishmat A, Oritani S et al (2018) Virtual computed tomography morphometry of the patella for estimation of sex using postmortem Japanese adult data in forensic identification. Forensic Sci Int 285:206-e1. https://doi.org/10.1016/j.forsciint.2017.11.029
Teke YH, Ünlütürk Ö, Günaydin E et al (2018) Determining gender by taking measurements from magnetic resonance images of the patella. J Forensic Legal Med 58:87–92. https://doi.org/10.1016/j.jflm.2018.05.002
Indra L, Vach W, Desideri J et al (2021) Testing the validity of population-specific sex estimation equations: an evaluation based on talus and patella measurements. Sci Justice 61:555–563. https://doi.org/10.1016/j.scijus.2021.06.011
Bidmos MA, Mazengenya P (2021) Accuracies of discriminant function equations for sex estimation using long bones of upper extremities. Int J Legal Med 135:1095–102. https://doi.org/10.1007/s00414-020-02458-y
Macaluso PJ (2010) The efficacy of sternal measurements for sex estimation in South African blacks. Forensic Sci Int 202:111.e1-111.e7. https://doi.org/10.1016/j.forsciint.2010.07.019
Hora M, Sládek V (2018) Population specificity of sex estimation from vertebrae. Forensic Sci Int 291:279.e1-279.e12. https://doi.org/10.1016/j.forsciint.2018.08.015
Acknowledgements
The authors sincerely thank those who donated their bodies to science so that anatomical research could be performed. Results from such research can potentially increase mankind's overall knowledge, which can then improve forensic science. Therefore, these donors and their families deserve our highest gratitude. We are grateful to the School of Anatomical Sciences of the University of the Witwatersrand for giving access to the Raymond A. Dart Collection.
Funding
Open Access funding provided by the Qatar National Library.
Ethics declarations
Ethics approval
An ethical clearance waiver was obtained (number W-CJ-140604–1).
Informed consent
Not applicable.
Conflict of interest
The authors declare no competing interests.
Research involving human participants and/or animals
Not applicable.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Bidmos, M.A., Olateju, O.I., Latiff, S. et al. Machine learning and discriminant function analysis in the formulation of generic models for sex prediction using patella measurements. Int J Legal Med 137, 471–485 (2023). https://doi.org/10.1007/s00414-022-02899-7
DOI: https://doi.org/10.1007/s00414-022-02899-7
Keywords
- Forensic anthropology
- Sex prediction
- Patella
- Discriminant function analyses
- Machine learning