7.1 Introduction

The objective of this chapter is to analyze the reliability of different craniofacial superimposition (CFS) methodologies and the corresponding technical approaches to the CFS identification technique. Moreover, we aim to examine the subjectivity and discriminative power of the different criteria (detailed in Sect. 7.2) for assessing the skull-face correspondence, whether proposed in the literature or by any of the MEPROCS partners.

This novel study is expected to provide important insights into: (1) which characteristics of each method included in this study are the most convenient; (2) which criteria are the most and the least discriminative; and (3) which criteria depend more on the expert and which are more independent, that is, less subjective. The latter two points could indicate how many and which criteria are needed to reach a reliable conclusion. Those criteria found to be more discriminative could later be included in a recommended standard for CFS.

7.2 Study on the Performance of Different Craniofacial Superimposition Approaches

7.2.1 Experimental Study

Each participant was requested to tackle each of the provided cases using the typical protocol that they would follow at their institutions. The participants were requested to fill in an identification form describing the protocol/methodology employed (i.e., software, equipment, orientation process, landmarks, assessment criteria). For each case, a final identification decision (either positive or negative) was to be reported along with the rationale supporting the decision and at least one image illustrating the overlay/superimposition outcome.

The dataset used in this reliability test consisted of two sets, divided by sex, of seven CFS case studies each. These 14 CFS cases involve a total of 60 SFO problems, as given in Table 7.1. The dataset was collected at the University of Tennessee after obtaining informed consent from the responsible party for the deceased, and was provided to the MEPROCS project under a data-sharing protocol established through the University of Dundee.

Table 7.1 Summary of the characteristics of the datasets employed for the study

The dataset generally consisted of a set of ante-mortem photos, photos of the skull (with scales), and a set of 3D models of the skull acquired by laser scanning technology (Fastscan Polhemus Scorpion scanner). Physical 1:1 replicas of the skull 3D models were provided to those participants performing video superimposition. Each set of case studies had the following structure. Cases 1–4 mimic a scenario with one skull and three possible candidates, where only one ante-mortem photo of each candidate is available. Case 5 simulates a more complex scenario, including four skulls and four possible candidates, with only one available ante-mortem photo of each candidate. Cases 6 and 7 simulate a scenario with one skull and only one possible candidate, with several photos of the candidate available for analysis (see Table 7.1 for a detailed explanation of the case studies).

The performance of each participant was measured by computing true-positive, false-positive, true-negative, and false-negative rates, as well as overall accuracy. All indicators were calculated for each sex and for all case studies pooled together. Experience and familiarity with craniofacial identification techniques were also taken into account, and the level of experience of the participants was classified according to the following scheme:

  • No previous experience and no CFS-related training.

  • No previous experience but CFS-related training.

  • Short previous research experience and CFS-related training.

  • Moderate previous experience with CFS real cases and CFS-related training.

  • Broad experience with CFS real cases.

The study was carried out by 26 participants from the following institutions: University of Granada (Spain), University of Dundee (Scotland), Legal Medicine and Forensic Sciences Institute (Peru), North Carolina State University (USA), Complutense University of Madrid (Spain), University of Melbourne (Australia), Azienda Ospadaliera-Universitaria di Trieste (Italy), Russian Academy of Sciences (Russia), Portuguese Judiciary Police (Portugal), Moscow Region State Bureau of Forensic Examination (Russia), Spanish Civil Guard (Spain), Turkish Council of Forensic Medicine (Turkey), National Research Institute of Police Science (Japan), University of Milan (Italy), South African Police Service (South Africa), University of Vilnius (Lithuania), and University Sains Malaysia (Malaysia). Table 7.2 lists all the participants in the study (numbered from 1 to 26) with their corresponding level of experience. Since not all the participants completed the whole study, the dataset(s) tackled by each of them are indicated as well.

Table 7.2 Participants of the study, their experience related to CFS, and datasets tackled

Tables 7.3, 7.4, 7.5, and 7.6 summarize the methodologies employed by the participants grouped by the technological approach followed. They were classified following the taxonomy given in The Scientific Working Group for Forensic Anthropology (2012b), that is, computer-aided semi-automatic 3D-2D superimposition (Table 7.3), computer-aided manual 3D-2D superimposition (Table 7.4), computer-aided manual video superimposition (Table 7.5), and computer-aided manual photo superimposition (Table 7.6). The first column also indicates both the type of dataset used and the global performance. The datasets are either male, female, or both. The global performance of the participant methodology refers to the percentage of correct decisions. Significant details of each of them are briefly explained according to software and equipment employed, how the SFO process is tackled, and the kind of skull-face relationship assessment made (decision making).

Table 7.3 Summarization of computer-aided semiautomatic 3D-2D superimposition (CAs3DS) approaches that participated in the study
Table 7.4 Summarization of computer-aided manual 3D-2D superimposition (CAm3DS) approaches that participated in the study
Table 7.5 Summarization of computer-aided manual video superimposition (CAmVS) approaches that participated in the study
Table 7.6 Summarization of computer-aided manual photo superimposition (CAmPS) approaches that participated in the study

7.2.2 Results

A total of 1152 CFS problems were tackled within this study. While Tables 7.3, 7.4, 7.5, and 7.6 reported the global performance (correct decisions) of each participant methodology, the following three tables report the results obtained by each participant for the two datasets separately (Tables 7.7 and 7.8, male and female, respectively) and for both together (Table 7.9). Detailed performance indicators, namely true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN), are given in each case.

Table 7.7 Performance of CFS methodologies on male dataset
Table 7.8 Performance of CFS methodologies on female dataset
Table 7.9 Performance of CFS methodologies on male and female dataset

Considering only the global performance, participants P2, P3, and P4 achieved the highest rates, all above 90.00% and reaching 94.29% in the case of P4 (Tables 7.3 and 7.4). They share a similar SFO approach (computer-aided 3D-2D), although P2 made use of semiautomatic software. During the decision-making stage, they all employed the morphological criteria in Wilkinson (2004), and two of them also analyzed the criteria introduced in Austin-Smith and Maples (1994) and Yoshino (2012). However, only P2 tackled both the female and male datasets. Similar performances were obtained by P26 (88.30%) and P18 (87.69%). While the former followed a video superimposition approach, the latter employed computer-aided 3D-2D software. Again, a morphological approach was the key aspect guiding their skull-face relationship assessment. From these rates, the participants' performance decreases almost linearly down to the worst results, obtained by P23, who based both the SFO and the decision making on a landmark comparison.

Looking more closely at individual performance, it is clear that higher rates of true negatives than of true positives were achieved. Focusing on those participants who carried out the study on both the male and female datasets (Table 7.9), we observe that four of them (P2, P14, P18, and P26) achieved true-negative rates equal to or higher than 90.00%. However, the same four participants achieved true-positive rates of 80.00%, 50.00%, 62.50%, and 66.67%, respectively. Averaged over the two datasets, the mean true-positive rate is 52.63%, while the mean true-negative rate is 84.20%. Consequently, the false-positive rate is significantly lower than the false-negative rate. It is important to remark that the number of negative cases (50) is five times the number of positive cases (10).
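Since the false-negative rate is the complement of the true-positive rate, and the false-positive rate the complement of the true-negative rate, the mean rates above correspond to a false-negative rate of 47.37% and a false-positive rate of 15.80%. The following minimal sketch shows how such indicators can be derived from the four counts. The class name, method names, and the example counts are illustrative assumptions; the counts are hypothetical and are not taken from Tables 7.7, 7.8, or 7.9.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts of identification decisions (illustrative names)."""
    tp: int  # positive case, positive decision
    fp: int  # negative case, positive decision
    tn: int  # negative case, negative decision
    fn: int  # positive case, negative decision

    def tpr(self) -> float:
        return self.tp / (self.tp + self.fn)

    def tnr(self) -> float:
        return self.tn / (self.tn + self.fp)

    def fpr(self) -> float:
        return 1.0 - self.tnr()

    def fnr(self) -> float:
        return 1.0 - self.tpr()

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.tn + self.fn
        return (self.tp + self.tn) / total


# Hypothetical participant facing 10 positive and 50 negative cases.
example = Confusion(tp=5, fp=8, tn=42, fn=5)
print(example.tpr(), example.tnr(), example.accuracy())
```

With five times more negative than positive cases, the overall accuracy of this hypothetical participant stays close to the true-negative rate even though only half of the positive cases are identified, which is why the individual rates are worth reading alongside the global performance.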

Table 7.10 reports the performance indicators of the different participant methodologies grouped by the level of experience of the participant. There are no significant differences related to the level of experience, and no correlation is observed between performance and the level of experience of the practitioners. The participants with the highest rate of correct decisions are those in category 3 (84.68%), whereas the lowest rates correspond, surprisingly, to categories 4 and 2 (75.00% and 74.32%, respectively).

Table 7.10 Overall accuracy of CFS grouped by level of experience of the participant

Finally, Table 7.11 depicts the overall accuracy according to the technological approach used by each participant. Overall, the approaches followed by Participants 2 and 3 (CAm3DS approach) and by Participants 24, 25, and 26 (CAmVS approach) are the most accurate (88.49% and 84.56%, respectively). These technological approaches represent the past and the future of the CFS technological development.

Table 7.11 Overall accuracy of CFS grouped by technological approach employed

7.2.2.1 Set of Criteria for Assessing the Skull-Face Overlay Relationship

With all the data generated, some of the most representative experts in craniofacial identification joined in a discussion intended to identify and agree on the most important issues that have to be considered to properly employ the CFS technique. Tables 7.12, 7.13, 7.14, and 7.15 depict the identification of a set of common criteria for assessing the skull-face correspondence.

Table 7.12 Marking lines used to analyze anatomical consistency
Table 7.13 Landmarks used to evaluate soft tissue thickness
Table 7.14 Consistency of the bony and facial outlines/morphological curves
Table 7.15 Positional relationship analyzed to assess anatomical consistency

7.3 Study on the Criteria Assessing Skull-Face Correspondence in Craniofacial Superimposition

The MEPROCS consortium designed the current study, which aims to analyze the subjectivity and discriminative power of the different criteria (defined in the previous section of this chapter) for assessing the skull-face correspondence either proposed in the literature or by any of the MEPROCS partners.

7.3.1 Experimental Study

The dataset used in this study consisted of 18 different CFS problems, some of them comprising more than one image of the same subject, for a total of 24 SFOs.

Skull 3D models were obtained from patients whose heads had been scanned using cone beam computed tomography (CBCT).

The skull 3D models employed suffer from two different problems. Firstly, only part of the skull is available, from the jaw to the upper orbits, without the parietal, occipital, and part of the temporal areas. Secondly, the 3D model is, to a greater or lesser extent, noisy and may not be accurately represented. Both problems have the same origin: the use of CBCTs instead of CTs. High-resolution CTs together with photographs of the patient/volunteer were not accessible. However, one benefit of the CBCT data is that the volunteer was upright rather than supine, as is common in CT scanning.

Frontal and lateral photographs were taken of the same patients to create a set of positive cases, while other people with similar facial geometry were photographed in order to compose a set of negative cases. Nine of eighteen cases were positives and the other nine were negatives. Twelve of the photographs were lateral and twelve were frontal, half of them belonging to positive cases and the other half to negative cases.

The participants were provided the same 24 SFOs as a single image with four different layers: facial photograph with and without landmarks and skull projection with and without landmarks.

For the sake of an objective analysis, it was important to focus the attention of the participants on the criteria for analyzing the skull and the face relationship only. This study should include both positive and negative SFOs. The procedure to obtain each type of SFO was different.

For positive cases, optimal SFOs were achieved using the following procedure. The DICOM images resulting from the CBCT machine were automatically processed to obtain the corresponding 3D face and 3D skull models. After positioning homologous points in both the 3D face model and the photograph, the former was automatically projected onto the latter so as to obtain an ideal match. Then, the parameters originating from that match between the 3D face model and the photograph were applied to the 3D skull model, resulting in an objective and accurate SFO. The latter superimposition is considered a ground-truth SFO.
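The text does not state which projection model or fitting algorithm was used. As one possible illustration of the idea, the sketch below estimates a 3x4 projection matrix from homologous 3D face landmarks and 2D photo landmarks with the direct linear transform (DLT), and then applies the same matrix to skull points. All names and the synthetic data are assumptions introduced for the example, and coordinates are assumed to be scaled to comparable magnitudes for numerical stability.

```python
import numpy as np


def fit_projection(points_3d: np.ndarray, points_2d: np.ndarray) -> np.ndarray:
    """Estimate a 3x4 projection matrix from at least six 3D-2D pairs (DLT)."""
    n = points_3d.shape[0]
    homog = np.hstack([points_3d, np.ones((n, 1))])  # n x 4 homogeneous points
    zeros = np.zeros(4)
    rows = []
    for X, (u, v) in zip(homog, points_2d):
        rows.append(np.concatenate([X, zeros, -u * X]))
        rows.append(np.concatenate([zeros, X, -v * X]))
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    return vt[-1].reshape(3, 4)


def project(P: np.ndarray, points_3d: np.ndarray) -> np.ndarray:
    """Project 3D points to 2D image coordinates with matrix P."""
    n = points_3d.shape[0]
    homog = np.hstack([points_3d, np.ones((n, 1))])
    image = homog @ P.T
    return image[:, :2] / image[:, 2:3]


# Synthetic stand-ins for the face landmarks, the photo, and the skull model.
rng = np.random.default_rng(0)
P_true = np.array([[1.2, 0.0, 0.0, 0.05],
                   [0.0, 1.2, 0.0, -0.02],
                   [0.0, 0.0, 1.0, 4.0]])
face_landmarks = rng.uniform(-1.0, 1.0, size=(8, 3))
photo_landmarks = project(P_true, face_landmarks)

# Fit on the face, then reuse the same parameters for the skull.
P_est = fit_projection(face_landmarks, photo_landmarks)
skull_points = rng.uniform(-1.0, 1.0, size=(5, 3))
skull_overlay = project(P_est, skull_points)
```

The key design point mirrored here is that the skull is never fitted to the photograph directly: the parameters come only from the face-to-photo match, so the resulting overlay does not depend on any judgment about the skull-face relationship.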

Figure 7.1 shows an overview of the whole ground-truth data creation process.

Fig. 7.1 Overview of the ground-truth data creation process

For negative cases, the SFOs were performed using the Face2Skull™ software. An expert was asked to obtain the best possible SFO and to judge the skull-face relationship without being informed of the actual negative relationship, in order to avoid biasing the SFO process.

For the criteria assessment study, 37 forensic experts were asked to indicate which specific criteria they would use for evaluating the 24 skull-face relationships. The criteria are organized in four groups according to the family of criteria: lines (group 1), landmarks-soft tissue (group 2), outlines (group 3), and positional relationship (group 4). Group 1 is composed of 28 criteria, group 2 has 27 criteria, group 3 is a set of 19 criteria, and group 4 is made up of 21 criteria.

Forensic experts were asked to evaluate the skull-face correspondence following a systematic approach. For each SFO, the degree of consistency of all the criteria previously selected was indicated using the following values: 0: not evaluable, 1: not match, 2: poor match, 3: doubtful match, 4: good match and 5: perfect match.

In order to avoid personal interpretations, MEPROCS partners assigned in advance (before giving the instructions to the participants) the value 0 to those criteria that they considered could not be visually checked due to the noisy nature of the image, the absence of the bony part, or the pose of the photograph. This was done for each single SFO case.

Finally, for each SFO case (and also for each CFS case when it involves more than one SFO), participants were asked to indicate the final identification decision according to the following scale: −3: strong support of not being the same person, −2: moderate support of not being the same person, −1: limited support of not being the same person, 0: not determined, +1: limited support of being the same person, +2: moderate support of being the same person, and +3: strong support of being the same person. Therefore, the dataset is composed of:

  • The forensic expert.

  • The specific SFO case and its state (positive or negative).

  • The photograph of the SFO case (frontal or lateral pose).

  • The criteria used by the expert to evaluate the corresponding SFO case.

  • The family of the criteria.

  • The degree of consistency of these criteria given by the expert (from 0: not evaluable to 5: perfect match).

  • The decision of the expert for the corresponding case (from −3: strong support of not being the same person to +3: strong support of being the same person).
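One way to picture a single entry of this dataset is as a small validated record. The sketch below is only an illustration: the class, the field names, and the example values are hypothetical and do not reproduce the format actually used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriterionEvaluation:
    """One expert's evaluation of one criterion on one SFO (hypothetical layout)."""
    expert: str        # anonymized expert identifier
    sfo_case: str      # SFO case identifier
    is_positive: bool  # ground-truth status of the case
    view: str          # "frontal" or "lateral"
    criterion: str     # criterion identifier
    family: int        # 1: lines, 2: landmarks-soft tissue, 3: outlines, 4: positional
    consistency: int   # 0 (not evaluable) to 5 (perfect match)
    decision: int      # -3 (strong support against) to +3 (strong support for)

    def __post_init__(self) -> None:
        if self.view not in ("frontal", "lateral"):
            raise ValueError("view must be 'frontal' or 'lateral'")
        if self.family not in (1, 2, 3, 4):
            raise ValueError("family must be between 1 and 4")
        if not 0 <= self.consistency <= 5:
            raise ValueError("consistency must be between 0 and 5")
        if not -3 <= self.decision <= 3:
            raise ValueError("decision must be between -3 and +3")


# Hypothetical record; the identifiers and values are made up for illustration.
record = CriterionEvaluation(
    expert="expert-01", sfo_case="case-A", is_positive=True, view="frontal",
    criterion="criterion-X", family=2, consistency=4, decision=2,
)
```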

7.3.1.1 Data Analysis

We have developed three studies with the following characteristics:

  1. According to the data employed:

    (a) With all the data.

    (b) Filtering (removing) the experts with a proficiency lower than or equal to 0.5.

    (c) Filtering (removing) the scenarios (SFO cases) with the highest standard deviation (fourth quartile).

  2. According to the view of the photographs: frontal versus lateral poses.

  3. According to the family of criteria: lines, landmarks-soft tissue, outlines, and positional relationship.
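The two filters in the first study can be sketched as follows. The function names and example values are hypothetical, and the quartile convention (Python's default "exclusive" method) is an assumption, since the text does not specify how the fourth quartile was computed.

```python
from statistics import quantiles
from typing import Dict, List


def keep_experts(proficiency: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Keep experts whose proficiency is strictly above the threshold."""
    return [expert for expert, p in proficiency.items() if p > threshold]


def keep_cases(case_std: Dict[str, float]) -> List[str]:
    """Drop the SFO cases whose standard deviation lies in the fourth quartile."""
    q3 = quantiles(case_std.values(), n=4)[2]  # third quartile cut point
    return [case for case, s in case_std.items() if s <= q3]


# Hypothetical values for illustration only.
experts = keep_experts({"A": 0.75, "B": 0.5, "C": 0.4})
cases = keep_cases({"c1": 0.5, "c2": 0.6, "c3": 0.7, "c4": 0.8,
                    "c5": 0.9, "c6": 1.0, "c7": 1.1, "c8": 2.0})
```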

The statistical analysis developed relied on several concepts that are introduced below together with an example:

  • Cases with decision (CD): the cases in which the expert’s decision is different from 0 (not undetermined)

  • Expert proficiency: the proportion of cases with decision in which the expert evaluated the status of the case correctly

$$ \mathrm{EP}=\frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{CD}}, $$
(7.1)

where TP is the number of positive cases with a positive decision, TN is the number of negative cases with a negative decision, and CD is the number of cases with decision.
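A minimal sketch of Eq. (7.1) follows. It assumes that a decision greater than zero counts as a positive decision and a decision lower than zero as a negative one; the function name and the example data are hypothetical.

```python
from typing import Iterable, Tuple


def expert_proficiency(cases: Iterable[Tuple[bool, int]]) -> Tuple[float, int]:
    """Return (EP, CD) from (is_positive, decision) pairs, decision in -3..+3."""
    tp = tn = cd = 0
    for is_positive, decision in cases:
        if decision == 0:
            continue  # undetermined cases do not count as cases with decision
        cd += 1
        if is_positive and decision > 0:
            tp += 1
        elif not is_positive and decision < 0:
            tn += 1
    if cd == 0:
        raise ValueError("the expert gave no decision different from 0")
    return (tp + tn) / cd, cd


# Hypothetical expert: six cases, one of them left undetermined.
answers = [(True, 2), (True, -1), (False, -3), (False, 0), (False, 1), (True, 3)]
ep, cd = expert_proficiency(answers)
```

In this example, five cases have a decision, two positives and one negative are evaluated correctly, and the proficiency is therefore 3/5. Note that undetermined answers are not penalized, so an expert who decides on few cases can still reach a high EP, which is why CD is reported together with it.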

7.3.1.2 Correlation Between Two Variables

Before computing which are the most relevant criteria, we have calculated the correlation between the status of the identification case and the decision of the forensic expert (correlation-based expert proficiency). Furthermore, we have also estimated the correlation between the value of a criterion and the status of a case (criterion correlation with ground truth). That correlation assesses the tendency of a criterion to have higher values on positive cases and lower values on negative ones.

Correlation is a statistical technique that can show whether and how strongly pairs of variables are related. The Pearson correlation coefficient is the most widely used. It measures the strength of the linear relationship between normally distributed variables. When the variables are not normally distributed or the relationship between the variables is not linear, it may be more appropriate to use the Spearman rank correlation method. Spearman’s coefficient, like any correlation calculation, is appropriate for both continuous and discrete variables, including ordinal variables (Wilkinson 2006). Due to the nature of our dataset, we have applied the Spearman rank correlation method.

The Spearman correlation coefficient is defined as the Pearson correlation coefficient ρ between the ranked variables. For a sample of size n, the n raw scores $X_i, Y_i$ are converted to ranks $x_i, y_i$, and ρ is computed from

$$ \rho =1-\frac{6\sum {d}_i^2}{n\left({n}^2-1\right)}, $$
(7.2)

where $d_i = x_i - y_i$ is the difference between ranks. Identical values (rank ties or value duplicates) are assigned a rank equal to the average of their positions in the ascending order of the values.
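Equation (7.2) is exact only when there are no tied ranks; when ties are present, the coefficient is obtained as the Pearson correlation of the average ranks. A minimal sketch of both computations is given below. The function names are illustrative and not taken from the study, and the inputs are assumed not to be constant.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values in ascending order; ties receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman coefficient as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_no_ties(x: Sequence[float], y: Sequence[float]) -> float:
    """Shortcut of Eq. (7.2); valid only when neither variable has ties."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
rho = spearman(x, y)
```

For these two tie-free sequences the sum of squared rank differences is 4, so both functions give 1 − 24/120 = 0.8. Because the decisions and criterion values in this study are ordinal scores with many repeated values, the rank-based form is the safer of the two.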

The sign of the Spearman correlation indicates the direction of association between X (the independent variable) and Y (the dependent variable). If Y tends to increase when X increases, the Spearman correlation coefficient is positive. If Y tends to decrease when X increases, the Spearman correlation coefficient is negative. A Spearman correlation of zero indicates that there is no tendency for Y to either increase or decrease when X increases. The Spearman correlation increases in magnitude as X and Y become closer to being perfect monotone functions of each other. When X and Y are perfectly monotonically related, the Spearman correlation coefficient becomes +1 (increasing relationship) or −1 (decreasing relationship).

We have performed the Spearman correlation with a statistical test in order to estimate the significance of the results. The considered level of statistical significance was 0.05.

For the correlation-based expert proficiency, the specific aim was to test the null hypothesis (H0) stating that the decision of the expert and the status of the case are not correlated. In the case of the criterion correlation with the ground truth, the goal was to test the null hypothesis (H0) stating that the value of the criterion and the status of the case are not correlated.

As complementary studies, we have added the following analyses:

  • Criterion weighted correlation with ground truth: same as the correlation with ground truth, except that the correlation coefficient associated with each expert is weighted according to his proficiency.

  • Criterion variability: it is computed as the mean standard deviation of the criterion evaluation over the same case. It aims to assess the subjectivity of a criterion.
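Both complementary measures can be sketched in a few lines. The text does not give the exact formulas, so the choices below are assumptions: a proficiency-weighted mean for the weighted correlation, the population standard deviation for the variability, and scores of 0 (not evaluable) removed beforehand. Function names and example values are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def weighted_correlation(rho_by_expert: Dict[str, float],
                         proficiency: Dict[str, float]) -> float:
    """Mean of the per-expert correlations, weighted by expert proficiency."""
    total = sum(proficiency[e] for e in rho_by_expert)
    return sum(rho_by_expert[e] * proficiency[e] for e in rho_by_expert) / total


def criterion_variability(scores_by_case: Dict[str, List[int]]) -> float:
    """Mean, over cases, of the standard deviation of the experts' scores."""
    return mean(pstdev(scores) for scores in scores_by_case.values()
                if len(scores) > 1)


# Hypothetical values for one criterion.
w = weighted_correlation({"A": 0.8, "B": 0.2}, {"A": 0.75, "B": 0.25})
v = criterion_variability({"case-1": [2, 4], "case-2": [3, 3, 3]})
```

Here the weighted correlation is (0.8 × 0.75 + 0.2 × 0.25) / 1.0 = 0.65, and the variability is the mean of the two per-case deviations, 1.0 and 0.0, that is, 0.5.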

7.3.1.3 Linear Regression

Correlation makes no a priori assumption as to whether one variable is dependent on the other(s) and is not concerned with the relationship between variables; instead, it gives an estimate as to the degree of association between the variables. In fact, correlation analysis tests for interdependence of the variables.

As regression attempts to describe the dependence of a variable on one (or more) explanatory variables, it implicitly assumes that there is a one-way causal effect from the explanatory variable(s) to the response variable, regardless of whether the path of effect is direct or indirect.

Therefore, we have complemented the preceding correlation analysis with a linear regression test.

Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory variable, and the other is considered to be a dependent variable. This functional relationship may then be formally stated as an equation, with associated statistical values that describe how well this equation fits the data.

A linear regression line has an equation of the form Y = a + bX, where X is the explanatory variable and Y is the dependent variable. The slope of the line is b, and a is the intercept (the value of Y when X = 0).

The most common method for fitting a regression line is the method of least-squares. This method calculates the best-fitting line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line (if a point lies on the fitted line exactly, then its vertical deviation is 0). Because the deviations are first squared, then summed, there are no cancellations between positive and negative values.

We have performed the linear regression test in order to estimate whether the value of a criterion depends on the status of the case. The goal is to test the null hypothesis (H0) stating that the value of a criterion has no influence on the status of the case.
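The least-squares fit described in this section has a closed form, sketched below. The example takes the status of the case as X (0 for negative, 1 for positive) and the criterion value as Y, which is only one possible reading of the test; the function name and the data are hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return the intercept a and slope b of the least-squares line Y = a + bX."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx      # slope
    a = my - b * mx    # intercept
    return a, b


# Hypothetical criterion scores on two negative (0) and two positive (1) cases.
status = [0, 0, 1, 1]
scores = [2, 3, 4, 5]
a, b = fit_line(status, scores)
```

With a binary explanatory variable, the intercept equals the mean score on negative cases (2.5) and the slope equals the difference between the mean scores on positive and negative cases (4.5 − 2.5 = 2.0), so testing whether the slope differs from zero amounts to testing whether the criterion scores differ between the two groups.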

7.3.2 Results

With the aim of providing a manageable basis for discussion, we have focused the analysis on only one scenario, 1(a), in which all the data (participants and CFS cases) are considered at the same time. Additionally, in some parts of the document, we also refer to the second scenario, in which the data were divided into two different sets according to the view of the photograph: frontal and lateral.

Table 7.16 reports, for each expert, the proficiency (the proportion of cases with decision in which the expert correctly evaluated the status of the case) and the number of cases with decision (those in which the expert's decision was different from zero). The average performance was poor. In the best case, the rate is just 75% (only two experts), and 37% of the participants (14 experts) did not exceed 50% of correct answers. The performance is thus worse than in previous studies that also involved the SFO stage (Yoshino et al. 1995; Gordon and Steyn 2012; Jayaprakash et al. 2001; Martin and Saller 1957). Possible explanations for these low performance rates are:

  • The absence of a complete cranium.

  • The quality of some 3D models, which in some cases present noisy parts and artifacts.

  • The materials given to the participants do not include the 3D skull models but just a projection on a 2D plane.

  • The isolation of the decision-making stage given an SFO.

Table 7.16 Cases with decision and simple expert proficiency

While the negative influence of the first three is quite evident, that of the fourth is not clear at all. This is the first time that a study has been carried out in which the SFOs are provided to the participants. The process of overlaying the skull on the face also involves a continuous comparison of the skull-face relationship, which is not performed within this study.

Table 7.17 shows the results of the Spearman test used to calculate the correlation between the status of the identification case and the decision of the forensic expert. The proficiency in Table 7.16 corresponds to the Spearman correlation coefficient. We observe that seven experts achieve a significant correlation between their decision and the status of the case: F20, F21, F22, F23, F31, F34, and F36 have a positive Spearman correlation with a p-value <0.05. For the rest of the experts, we cannot reject the null hypothesis of no correlation between the decision of the expert and the status of the case; that is, we cannot assert that the decision of the expert is correlated with the status of the case with a confidence level of 95%.

Table 7.17 Spearman tests, correlation-based expert proficiency, cases with p-values <0.05 in bold

Table 7.18 shows the number of times a criterion has been evaluated over the total number of evaluations (each participant for each SFO case), “Usage C.” It also shows the percentage of participants that employed a criterion at least once, “Usage P.” Both statistics are given for all the cases (“All”), only frontal cases (“Frontal”), and only lateral/oblique cases (“Lateral”). Those criteria that were employed in fewer than 10% of the cases were removed from the corresponding study. In particular, criteria G2.10, G2.26, G3.2, and G3.4 do not reach the required 10% of usage irrespective of the dataset considered (all, only frontal, or only lateral). Criteria that were employed by fewer than 30% of the participants were also excluded from the corresponding study. The reason for excluding these criteria is the lack of statistical significance of small data samples.
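The two usage statistics and the resulting filter can be sketched as follows. The function names, identifiers, and example data are hypothetical; only the 10% and 30% thresholds come from the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def criterion_usage(evaluations: Iterable[Tuple[str, str, str]],
                    n_experts: int, n_cases: int) -> Dict[str, Tuple[float, float]]:
    """Map each criterion to (usage over evaluations, usage over participants).

    `evaluations` holds one (expert, case, criterion) triple per scored criterion.
    """
    pairs = defaultdict(set)    # criterion -> {(expert, case)}
    experts = defaultdict(set)  # criterion -> {expert}
    for expert, case, criterion in evaluations:
        pairs[criterion].add((expert, case))
        experts[criterion].add(expert)
    total = n_experts * n_cases
    return {c: (len(pairs[c]) / total, len(experts[c]) / n_experts) for c in pairs}


def retained(usage: Dict[str, Tuple[float, float]],
             min_cases: float = 0.10, min_experts: float = 0.30) -> List[str]:
    """Keep criteria that reach both usage thresholds."""
    return sorted(c for c, (uc, up) in usage.items()
                  if uc >= min_cases and up >= min_experts)


# Hypothetical study with 4 experts and 5 cases (20 possible evaluations).
triples = [("E1", "c1", "X1"), ("E1", "c2", "X1"), ("E2", "c1", "X1"),
           ("E1", "c1", "X2")]
usage = criterion_usage(triples, n_experts=4, n_cases=5)
kept = retained(usage)
```

In this example, criterion X1 is used in 3 of 20 evaluations (15%) by 2 of 4 experts (50%) and is kept, while X2 is used in 1 of 20 evaluations (5%) by 1 of 4 experts (25%) and is discarded.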

Table 7.18 Criterion usage

Table 7.19 presents the results of the correlation between the value of a criterion and the status of the case with a p-value ≤0.05, that is, statistically significant results. That correlation assesses the tendency of a criterion to have higher values on positive cases and lower values on negative ones.

Table 7.19 Spearman test, correlation statistically significant between criterion and the status of the case

Eight experts obtained a correlation between one or more criteria and the status of the case with a confidence level of 95%. Hence, we can affirm that the use of some criteria is significantly correlated with the status of the case; that is, some criteria have higher values on positive cases and lower values in negative ones. In most cases, we obtain a positive correlation: when the degree of a criterion increases, the status of the case tends to a positive identification.

The regression analysis tests whether the value of a criterion and the status of the case are independent.

Table 7.20 depicts the results that have a p-value ≤0.05, that is, those cases in which the null hypothesis is rejected. Thus, it shows the criteria that have a significant influence on the status of the case.

Table 7.20 Linear regression test, influence statistically significant between criterion and the status of the case

Nine experts obtain a dependency between one or more criteria and the status of the case with a confidence level of 95%. Hence, we can affirm that the use of some criteria is significantly dependent on the status of the case; that is, some criteria have higher values on positive cases and lower values in negative ones.

It is important to note that we have achieved similar results in the correlation and regression tests (Tables 7.19 and 7.20). The criteria with the greatest influence on the status of the case are G2.14, G2.17, G2.18, and G3.12.

A boxplot of the experts' assessments across the scenarios is depicted in Fig. 7.2. This boxplot shows the significant variability among the experts' responses. In general, both negative and positive cases have similar performance rates, although a lower variability resulted in the evaluation of the positive ones. While there are only two negative cases (4-2 and 11-1) where most of the participants (≥75%) made a correct evaluation, there are four positive cases with a similarly successful evaluation (3-1, 7-1, 13-1, and 18-1). Looking at the median values (black horizontal line inside the boxes), there are three negative cases that were incorrectly assessed by most of the participants: SFO cases 4-1, 15-1, and 16-1. The median values of another three cases (8-1, 10-1, and 17-1) fall in the undetermined category (value 0). Similarly, there are three positive cases that were incorrectly evaluated by most of the participants: SFO cases 5-1, 5-2, and 12-1. For all these cases, 75% of the participants did not make the correct identification. No differences were observed between the evaluations of lateral and frontal views.

Fig. 7.2 Statistical representation of the expert’s assessment for each (negative and positive) SFO case. Expert decisions (between −3 and +3) on the y-axis and SFO cases on the x-axis. F and L, in brackets after the number of the case, indicate frontal and lateral view cases, respectively

The subjectivity was measured as the standard deviation of the evaluations (Table 7.21). The standard deviation was computed for each case, and the values were then averaged. The criteria take values within the interval [1, 5], and we can thus conclude that there is a significant dispersion in the evaluations by the different participants, with standard deviations ranging from 0.85 to 1.31.

Table 7.21 Criterion subjectivity

A primary goal of the current study is to provide forensic anthropologists with the means (objective data) to select a set of criteria, or to establish a ranked order of preference, favoring traits with the most discriminative power that are also easy to evaluate. Figure 7.3 visualizes the standard deviation (related to the ease of objective assessment) and the correlation (related to the discriminatory power of each criterion).

Three red lines delimit five groups of criteria, one of which is split into two, giving six groups in total. The top left corner contains the best criteria, those with the highest discriminatory power and the lowest variability (G3.1 and G4.14). Below this region are the criteria that can be considered easy to evaluate and that are also important in terms of discriminative power, with high correlation values (G3.10, G2.19, G1.8, and G2.22). The top right corner groups the criteria with almost the highest discriminative power but the highest variance (G3.19). The largest area, roughly in the center of Fig. 7.3, contains the majority of the criteria, which in general are not significantly different from one another (G4.12, G4.1, G2.9, G2.16, G4.7, G1.3, G1.5, and G2.21). Within this region, surrounded by a red circle, we have identified a fifth group composed of criteria with a good trade-off between subjectivity and discriminative power (G3.16, G1.7, G2.4, and G2.23). Finally, the bottom right corner groups the least useful criteria, which are highly subjective and do not discriminate between face and skull (G1.2, G2.1, G4.11, G1.6, G2.15, and G2.13).

Fig. 7.3 Scatter plot including all the criteria under study spatially distributed according to their subjectivity (x-axis) and discriminative power (y-axis)

Figure 7.4 shows how the criteria cluster when only the photographs with a frontal pose of the person’s face are considered.

Fig. 7.4 Criteria according to the frontal pose of the person’s face in the photograph

For the frontal cases, five groups can be differentiated. At the top of the scatter plot is the criterion with the highest discriminative power (G3.19). Below it lies the group of criteria with a good trade-off between subjectivity and discriminative power (G4.20, G4.5, G1.7, G2.18, G1.8, G4.2, G4.7, G4.1, G2.22, and G2.23). On the left side of the scatter plot is an easy-to-evaluate criterion (G3.10) with the least amount of variability, which is also an important criterion in terms of discriminative power, with a high correlation value. The center of the scatter plot contains the majority of the criteria, which show the least amount of difference among them. Finally, the bottom right region groups the least useful criteria, which have the highest subjectivity and cannot be used to discriminate between face and skull (G1.2, G2.1, G1.6, G4.18, G4.11, G2.3, G3.6, G1.1, G2.8, G3.18, G1.4, G2.6, and G4.10).

Figure 7.5 shows how the criteria cluster when only the photographs with a lateral pose of the person’s face are considered.

Fig. 7.5 Criteria according to the lateral pose of the person’s face in the photograph

For the lateral cases, although the criteria can be divided into eight separate groups, the two groups in the central part (between correlation values of 0 and 1.2) are considered part of the same group of criteria with almost no discriminatory power. The top left corner contains the best criteria (G3.1 and G4.17), which have the greatest discriminatory power and the least variability. The top right corner encloses the group with the greatest discriminatory power and the greatest variability (G4.14 and G3.19). Below this group is a group of criteria (G2.4, G3.11, G2.9, G1.1, G4.10, G2.12, and G2.19) that still show an important correlation with the identification decisions together with a significant variability. Similarly, criterion G3.18 holds important discriminatory power but has a significantly lower variability. As in the other two analyses, the central part of the scatter plot contains the majority of the criteria, which do not hold significant correlation values. Finally, the bottom right part of the scatter plot contains the criteria with the greatest subjectivity that cannot discriminate between face and skull (G4.15, G3.9, G2.13, G4.11, and G4.16). Note that G3.14 refers to the same anatomical correspondence criterion as G3.9 but analyzed on different image views.