Introduction

Facial soft tissue thicknesses (FSTTs) form the quantitative basis of craniofacial identification techniques by providing a metric guide to the depth of the soft tissue envelope that overlies the skull [1,2,3]. In craniofacial superimposition, mean FSTT markers are placed at specific craniometric landmarks to help determine if the skull is a plausible fit to a once-living person’s facial contours as recorded in a facial photograph [1]. If the skull is a good fit, the tissue depth markers should align to the skin surface in the superimposed facial photograph, with only negligible differences. The same applies in facial approximation; however, instead of using an antemortem reference photograph, mean FSTTs are used as a guide to how much soft tissue should be added to the skull to approximate an individual’s face [2, 3]. This applies no matter which facial approximation method is used, including so-called “Russian,” “American,” and “Combination” methods, as all methods, including Gerasimov’s techniques, use mean FSTTs [3,4,5,6,7].

While FSTT means have been criticized because they represent average values [8, 9], these means have always served the intent of providing a general indication of what an individual’s true FSTT might be, rather than exact individualized point estimates free of any error [2, 10]. In other words, their goal is central tendency description of a sample, not estimation of precise values for single individuals. When FSTT means are employed as general guides in the craniofacial identification context, they are used together with a tolerance to account for sampling errors and individual variation. Both the standard error of the mean and the standard deviation provide practitioners leeway to modify mean FSTTs within statistically realistic ranges. These adjustments are often undertaken according to the robustness (or relief) of the skull [2, 3, 11, 12]. As facial approximations are undertaken in the blind, this maneuver tends to inject a degree of subjectivity into the methods. In contrast to FSTT means, regression approaches attempt to tailor estimations more precisely to individuals, typically via craniometric dimensions [8, 9, 13, 14]. However, just like means, these estimates still retain errors that can sometimes be large. In particular, craniometric correlations are weak and have generally only been described for small samples, limiting their utility [9, 13, 14]. Consequently, arithmetic means continue to hold foundational value for craniofacial identification casework.

Over the past 140 years, >100 FSTT studies have been conducted on adults [15,16,17], with almost all FSTT studies following the same basic principles as established in 1883 by Welcker [18], whereby tissue thicknesses are measured from the skin surface to the most superficial aspect of the underlying bone at cephalometric landmarks [15]. Several measurement techniques have been used to acquire these FSTT data, including solid-core needle puncture [19], lateral ‘plain film’ cephalograms [20], ultrasound (A- and B-mode) [21], computed tomography (CT) [22], cone beam computed tomography (CBCT) [23], and magnetic resonance imaging (MRI) [24, 25]. Despite the abundance of FSTT studies, individual study sample sizes tend to be small (n ≤ 40). This often applies even when large overarching samples are employed (n > 100), as investigators commonly subdivide their samples into smaller subgroups, e.g., by sex, age, and/or ancestry [15, 26, 27].

The representativeness of small-sample FSTT studies is often problematic [28]; however, pooling multiple small-sample studies holds the potential to combat this limitation. One common hesitation about FSTT data pooling is that historically esteemed factors thought to be important for FSTTs (such as sex, ancestry, body position (supine/upright), and/or measurement method) are not separately retained. While categories are sometimes lost, pooled FSTTs trade off the often small differences for the benefits of larger sample sizes and increased representativeness under the Law of Large Numbers [16]. This is valuable in the FSTT context because there is no single recognized or agreed-upon gold standard method for FSTT measurement, such that investigators are currently using different methods that all produce slightly different results [29]. Consequently, statistical noise exists in the FSTT dataset. Pooling the data averages out ‘noise’ either side of underlying ground truth values to produce more accurate means than those from single samples. In other words, under the central limit theorem, data pooling holds the advantage that the distribution of the sample means will increasingly approximate the normal distribution as more studies are included, producing a grand mean that converges on the underlying ground truth value [30, 31]. Since FSTT means are exclusively used as general guides to the typical value of soft tissue over the skull for many individuals, the loss of small differences like those attributable to sex (typically in the order of < 1 mm) with an ‘all-in’ data pool is, for the most part, inconsequential.

An additional benefit of data pooling is that the raw data are not required for the procedure; that is, grand means can be produced in a weighted fashion from just the central tendency statistic and the sample size. These principles drove the first calculation of the tallied facial soft tissue depth tables (commonly referred to as the T-Tables) in 2008 [15, 26]. The T-Tables represent three pooled data tables by age: 0–11 years, 12–17 years, and adults (≥ 18 years). Every 5 years since their first production, newly published FSTT data have been added to the pooled data and the grand means updated (including weighted rolling means) [15,16,17, 26] (Table 1).
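The weighted pooling described above can be illustrated with a minimal sketch. It assumes that each study contributes only a mean and a sample size for a given landmark; the function name and the example values are hypothetical and are not taken from the T-Tables.

```python
def weighted_grand_mean(study_means, sample_sizes):
    """Pool study means into one grand mean, weighting each by its sample size."""
    if len(study_means) != len(sample_sizes):
        raise ValueError("each study mean needs a matching sample size")
    total_n = sum(sample_sizes)
    if total_n <= 0:
        raise ValueError("total sample size must be positive")
    # Sum of (mean x n) across studies, divided by the total n
    return sum(m * n for m, n in zip(study_means, sample_sizes)) / total_n


# Hypothetical study means (mm) and sample sizes for a single landmark
means = [5.0, 6.0, 5.5]
sizes = [40, 160, 200]
grand = weighted_grand_mean(means, sizes)
print(f"grand mean = {grand:.2f} mm from n = {sum(sizes)}")
```

Larger studies pull the grand mean toward their own value, which is why no raw data are needed for the update.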

Table 1 Four iterations of the adult T-Tables in the last 15 years (2008–23)

Five years on from the latest iteration of the T-Table [17], additional sample-specific data from 4730 adults have been published, and FSTT data from a further 2978 adults have been extracted from the literature (pre-2018). Collectively, these new data (n > 7700) exceed the starting sample size of the 2008 T-Table (n ≈ 7400) and represent 39% of all FSTT data published to date, making an update to the adult T-Table worthwhile (Table 1). This 2023 T-Table update also corresponds to the 140th anniversary of Welcker’s first FSTT study [18], making it very timely.

Methods

Literature searches were conducted for all publications concerning facial soft tissue depth measurements using Scopus, PubMed, and Google Scholar, as well as traditional methods (e.g., reference list searches of relevant articles) to capture all relevant literature. Primary attention was awarded to studies published between 2018 and 2022; however, in the interest of thoroughness, studies from any year that had been missed in previous T-Table versions were also considered. As few new subadult studies were published, only adult studies that reported means and sample size for ≥ 3 landmarks with clear landmark definitions were evaluated. This resulted in 53 new FSTT studies contributing to the 2023 T-Table update, including 33 FSTT studies published between 2018 and 2022 [9, 13, 124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154], and an additional 20 FSTT studies published pre-2018 [104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123] (Table 1). These new data include all six main FSTT measurement methods used, so far, for data collection (Table 2).

Table 2 Data collection methods for studies included in the 2023 T-Table

Despite long-standing standardized nomenclature for craniofacial anthropometry [155], inconsistencies in landmark identification, description, and nomenclature were common across studies [156]. Therefore, landmark clarifications and reclassifications were required for data pooling. For example, some studies used standard landmark names, but non-standard definition(s) (e.g., gnathion name with menton definition), and proximally located landmarks were sometimes confused with one another (e.g., subspinale and mid-philtrum). As also found elsewhere [15, 156], imprecise lay vocabulary was sometimes used in place of technical terminology (e.g., ‘chin’ for either pogonion, gnathion, or menton landmarks depending on the study). In some instances, there was non-standard use of landmark abbreviations (e.g., description of only one landmark from a cephalometric landmark pair), or entirely new formulations for pre-existing landmarks (see [156] for more specific details). Additionally, reclassifications were made for landmarks where the original terms were inappropriate. For example, irrespective of other labels, studies clearly measuring the FSTT directly inferior to the mental symphysis were classed as menton, and FSTT measurements at the deepest point (in profile view) below the anterior nasal spine were classified as subnasale.

After the newly sourced data were added to the 2018 T-Table, weighted grand means and standard deviations were calculated to produce the updated 2023 T-Table values. These statistics were compared to both the 2008 T-Table (first iteration) [15] and the 2018 T-Table (last iteration) [17] to identify any changes. The convergence of FSTT data on stable rolling mean values was investigated for sex-separated data by serially pooling study means in sequence of their publication date [16, 17]. As studies are pooled by their publication date, the weighted rolling mean data up to the end of 2017 are the same as in the 2018 T-Table [17]; the new data here add to the rolling means from 2018 to 2023.
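The serial pooling of study means by publication date can be sketched as follows. This is an illustrative outline only, assuming each study is summarized as a (year, mean, n) triple for one landmark and one sex; the function name and values are hypothetical.

```python
def rolling_weighted_means(studies):
    """Return (year, rolling mean, cumulative n) after each study is pooled.

    `studies` is an iterable of (publication_year, mean_mm, n) tuples.
    Studies are pooled in order of publication year.
    """
    results = []
    cumulative_n = 0
    weighted_sum = 0.0
    for year, mean_mm, n in sorted(studies, key=lambda s: s[0]):
        cumulative_n += n
        weighted_sum += mean_mm * n
        results.append((year, weighted_sum / cumulative_n, cumulative_n))
    return results


# Hypothetical studies for one landmark, deliberately listed out of order
studies = [(1950, 6.0, 50), (1883, 5.0, 50), (2020, 5.6, 100)]
rolling = rolling_weighted_means(studies)
for year, mean_mm, n in rolling:
    print(year, round(mean_mm, 2), n)
```

Plotting the rolling mean against the cumulative sample size gives the kind of convergence curve that the rolling mean figures display.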

The large volume of cross-sectional data within the 2023 T-Table facilitates the opportunity to examine the pooled data by the measurement method. While this analysis is cross-sectional, meaning different subjects are utilized for each measurement method, the large sample size is advantageous to provide insights into the potential effects of the different FSTT measurement methods. So that measurement method effects were salient, data were summed across landmarks to provide a general indicator of tissue volume across the face per measurement method [29]. The data were analyzed by the following: (1) the new studies identified in this review, (2) only 2018 T-Table studies, and (3) all studies combined. To enable the comparison by method, only those studies that reported data for the same landmarks could be used, so that no method was under-sampled in comparison to any other for any particular landmark. The selection of landmarks was therefore dictated by their commonality (highest n per Table 3). With regard to the median plane, five landmarks were used: glabella (g–gʹ), nasion (n–seʹ), rhinion (rhi–rhiʹ), supramentale (sm–smʹ), and pogonion (pg–pgʹ). With regard to bilateral positions, four landmarks were used: mid-supraorbital (mso–msoʹ), mid-infraorbital (mio–mioʹ), gonion (go–goʹ), and zygion (zy–zyʹ). Only studies that reported data for all five median landmarks or all four bilateral landmarks were included. Authors that reported FSTT data separately for different measurement methods in the same study (Table 2) were included for each method (total n = 146), resulting in a final sample of 93 studies for median landmark analysis and 59 studies for bilateral landmarks (Supplementary Table S1 and S2).
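The selection and summation steps above can be sketched in code. This is a hedged illustration and not the authors' script: the record layout, the landmark keys, and the choice to weight each study's summed value by its sample size are all assumptions made for the example.

```python
from collections import defaultdict

# Short keys standing in for the five median landmarks (assumed labels)
MEDIAN_LANDMARKS = ("g", "n", "rhi", "sm", "pg")


def summed_means_by_method(studies, landmarks=MEDIAN_LANDMARKS):
    """Return {method: n-weighted mean of per-study FSTT sums across landmarks}."""
    weighted_sum = defaultdict(float)
    total_n = defaultdict(int)
    for study in studies:
        means = study["means"]
        # Keep only studies that report every required landmark
        if not all(lm in means for lm in landmarks):
            continue
        study_sum = sum(means[lm] for lm in landmarks)
        weighted_sum[study["method"]] += study_sum * study["n"]
        total_n[study["method"]] += study["n"]
    return {method: weighted_sum[method] / total_n[method] for method in weighted_sum}


# Hypothetical study records (mm); the MRI study lacks pogonion and is excluded
example_studies = [
    {"method": "CT", "n": 100,
     "means": {"g": 6.0, "n": 7.0, "rhi": 3.0, "sm": 11.0, "pg": 11.0}},
    {"method": "CT", "n": 300,
     "means": {"g": 5.0, "n": 6.0, "rhi": 2.0, "sm": 10.0, "pg": 11.0}},
    {"method": "MRI", "n": 50,
     "means": {"g": 5.0, "n": 6.0, "rhi": 2.0, "sm": 10.0}},
]
by_method = summed_means_by_method(example_studies)
print(by_method)
```

Requiring a complete landmark set per study keeps the summed values comparable across methods, at the cost of discarding studies with partial coverage.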

Table 3 Adult 2023 T-Table (≥ 18 Years)

The standard errors of the 2023 means were calculated from the standard deviations and sample sizes (s/√n), while the point estimation accuracy for individuals with known FSTT values was tested following [27] using standard errors of the estimate, the v2018.1 C-Table and the TDValidator script [29] (the last two available at CRANIOFACIALidentification.com). As the 2023 data concern only adults, 511 individuals aged ≤ 17 years were removed from the v2018.1 C-Table prior to analysis. Any zero values (included by default for some missing entries in the public C-Table v2018.1) were also removed, so as not to interfere with the TDValidator script calculations. The C-Table test sample subsequently included known FSTTs from 1460 individuals, as contributed by 20 investigator teams from the following studies: [3, 8, 9, 18,19,20, 32, 34,35,36,37,38,39, 51, 56, 67, 71, 91, 95, 128].
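The standard error of the mean (s/√n) can be illustrated with a short sketch. The standard deviation and sample sizes below are hypothetical and serve only to show how the standard error shrinks as the pooled sample grows.

```python
import math


def standard_error_of_mean(sd, n):
    """Standard error of the mean: s / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sd / math.sqrt(n)


# Hypothetical SD of 2.0 mm: one small study versus a large pooled sample
sem_small = standard_error_of_mean(2.0, 25)
sem_pooled = standard_error_of_mean(2.0, 10000)
print(f"n = 25: {sem_small:.2f} mm; n = 10000: {sem_pooled:.2f} mm")
```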

Results

The updated 2023 T-Table holds a total of 139 FSTT studies reporting 227,130 tissue thickness measurements for > 19,500 adults at 25 popularly measured landmarks (Table 3). This includes new data corresponding to 81,790 FSTT measurements from 7708 adults in addition to the last T-Table version [17]. The new data additions represent 39% of the new total available T-Table dataset. The median landmark pogonion (pg–pg′) currently yields the largest sample size, with FSTT data from 17,075 adults (Table 3). The sample sizes for each landmark in this version are now large enough that the standard errors of the mean are approaching zero and thereby signal very high reliabilities (≤ 0.1 mm, Table 3).

T-Table trends: 2008 versus 2023 data

Compared with the original 2008 T-Table, there have been some notable changes in FSTTs, namely, a 3 mm increase at infra second molar (ecm2–iM2ʹ), 2.5 mm increase at gonion (go–goʹ), 2 mm increase at mid-ramus (mr–mrʹ), and 1.5 mm increase at zygion (zy–zyʹ) (Table 4). Only small differences (≤ 1 mm) were evident at all other landmarks. Since 2008, the total sample size for over half of the median landmarks has increased by > 9000 individuals, while only four (of 11) bilateral landmarks have increased by > 5000 individuals (Table 4). The larger data availability for median landmarks is primarily underpinned by the increased number of lateral cephalometric studies that award most attention to the midline and bolster the reliability of these median landmark data (Table 2).

Table 4 Difference between the adult 2008 and 2018 T-Tables with the updated 2023 T-Table

T-Table trends: 2018 versus 2023 data

Compared with the most recent T-Table version (2018), negligible differences in FSTTs (0–0.5 mm) were evident at all median landmarks and most bilateral landmarks, despite the sizeable addition of new data in the 2023 T-Table (Table 4). The exceptions were supra second molar (ecm2–sM2ʹ) and infra second molar (ecm2–iM2ʹ), which increased by 1 mm and 2 mm (respectively), and mid-mandibular border (mmb–mmbʹ), which decreased by 2 mm. Since 2018, the sample size for almost all median landmarks has risen by > 5000 individuals, while the sample size for most bilateral landmarks has increased on average by approximately 2000 individuals (Table 4).

Weighted rolling means

Line plots of weighted rolling means by year of data publication for all nine median landmarks illustrate convergence on stable mean values (Fig. 1). Early in the sequence when the sample sizes are small (< 2500 individuals), these rolling means exhibit increased movement and interweaving of sex specific means (Fig. 1). As the sample size increases, the means stabilize (underpinned by the Law of Large Numbers and the Central Limit Theorem). In general, males tend to possess larger rolling FSTT means than females; however, this difference is small (< 1 mm) and is unadjusted for body size (i.e., males are on average larger than females so whether or not males are in fact larger once general body size factors are considered is an open question, see [122] for more details). Median FSTT landmarks have been measured more frequently than bilateral landmarks, awarding the former larger sample sizes (Figs. 1 and 2 and Table 3).

Fig. 1 Weighted rolling means for adult data at nine common median landmarks. The rolling FSTT mean is shown on the principal axis (left), while the rolling total sample size is shown on the alternate axis (right)

Fig. 2 Weighted rolling means for adult data at nine common bilateral landmarks. The rolling FSTT mean is shown on the principal axis (left), while the rolling total sample size is shown on the alternate axis (right)

Line plots of weighted rolling means for three bilateral landmarks (mid-supraorbital (mso–msoʹ), mid-infraorbital (mio–mioʹ), and zygion (zy–zyʹ)) illustrate stabilized mean FSTT trends, with all three landmarks yielding sex-specific sample sizes > 3000 individuals (Fig. 2). The most unstable FSTT values are within the cheek region, where rolling means appear to still be in modes of active change (Fig. 2). Six bilateral landmarks have yet to stabilize, with three landmarks (supra second molar (ecm2–sM2ʹ), infra second molar (ecm2–iM2ʹ), and gonion (go–goʹ)) illustrating upward trends, while the other three landmarks (supra canine (sC–sCʹ), mid-ramus (mr–mrʹ), and mid-mandibular border (mmb–mmbʹ)) display downward trends (Fig. 2). The sex-specific sample sizes for almost all of these landmarks are < 3000 individuals. Females exhibit larger rolling FSTT means at some bilateral landmarks; however, on average, the mean sex differences are very small (< 0.5 mm). Here, it should be noted that these sex trends are again based on raw data and have not been subject to any body size or scale adjustments, as is standard in other biological domains [157, 158].

Measurement method

Pooled data at median landmarks

Pooled data for 93 studies at five common median landmarks revealed that lateral cephalograms possessed the highest mean tissue values (38.5 mm), followed by CT (37.5 mm), and then CBCT (36.6 mm) (Fig. 3). Ultrasound yielded a mid-range pooled mean (36.3 mm), while needle puncture generated the lowest pooled mean (33.6 mm). The mean tissue thickness value for MRI (34.3 mm) was closest to the needle puncture method, and slightly lower than ultrasound (Fig. 3). For almost all methods, the sample size exceeded 1000 individuals. The data with the smallest standard deviations were CBCT and ultrasound, while needle puncture yielded the greatest standard deviation (Fig. 3; Supplementary Table S1).

Fig. 3 Grand means of FSTT summed across five common median landmarks. Studies contributing to these plots are presented in Supplementary Table S1. Bars represent ± 1 standard deviation of individual study means around the grand mean. The n values give the sample size (averaged across the five landmarks)

When the analysis is broken down by sample (i.e., new data reported in this study compared to the 2018 T-Table data), the 2018 T-Table data generally follows the same trends as those reported above for the full data-suite, with the exception that MRI studies yielded the lowest mean (33.2 mm) compared to other methods (Fig. 4). However, the new data alone exhibited trends divergent from both the full data-suite and the 2018 T-Table data. For example, the new data show ultrasound yielded the highest mean tissue values (41.0 mm), followed by MRI (37.8 mm), and cephalograms (37.7 mm). Similarly, CT (34.6 mm) yielded a comparatively lower value than the 2018 T-Table data, as did needle puncture (19.0 mm) (Fig. 4; Supplementary Table S1).

Fig. 4 Grand means for FSTT summed across five common median landmarks by sample. Studies contributing to these plots are presented in Supplementary Table S1. Bars represent ± 1 standard deviation of individual study means around the grand mean. The n values give the sample size (averaged across the five landmarks)

Pooled data at bilateral landmarks

For bilateral landmarks, the largest tissue values for the full data-suite were observed for CT (40.5 mm), followed by CBCT (37.0 mm) and ultrasound (35.2 mm), while the MRI and needle puncture studies yielded the lowest pooled FSTT means (33.7 mm and 31.0 mm, respectively) (Fig. 5). Although CT and MRI are both medical imaging methods where the subject is in supine position, CT and MRI did not yield equivalent pooled means. Despite sample sizes of the pooled data exceeding 500 individuals for both imaging modalities, the CT mean was 6.7 mm larger than the MRI pooled mean (Fig. 5; Supplementary Table S2). Similar to the median landmark data, the smallest standard deviations were observed with ultrasound and CBCT, while the largest standard deviations were observed for needle puncture and MRI (Figs. 5 and 6).

Fig. 5 Grand means for FSTT summed across four common bilateral landmarks. Studies used to generate these plots are presented in Supplementary Table S2. Bars represent ± 1 standard deviation of individual study means around the grand mean. The n values give the sample size (averaged across the four landmarks)

Fig. 6 Grand means for FSTT summed across four common bilateral landmarks by sample. Studies used to generate these plots are presented in Supplementary Table S2. Bars represent ± 1 standard deviation of individual study means around the grand mean. The n values give the sample size (averaged across the four landmarks)

Estimation errors of the 2023 T-Table

Performance tests of the newly generated grand means as point estimators for individuals with known FSTTs across 24 landmarks show the 2023 T-Table means produced standard errors of the estimate (Sest) ranging from 1 mm (rhinion (rhi–rhiʹ)) to 7.7 mm (supra second molar (ecm2–sM2ʹ)), with a grand mean of 3.5 mm (Table 5). This translates to a mean absolute percentage error in the range of 16–79% (grand mean = 30%). Recalculation of the 2018 T-Table [17] standard errors of the estimate with the updated v2018.1 C-Table data yielded a grand mean standard error of the estimate of 3.6 mm, indicating the updated 2023 data slightly outperform the older 2018 statistics.
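The two error statistics reported here can be illustrated with a minimal sketch for a single landmark. It is not the TDValidator script: it assumes the standard error of the estimate takes the root-mean-square form with divisor n (the script may use a different divisor), and the observed values are hypothetical.

```python
import math


def standard_error_of_estimate(observed, predicted_mean):
    """Root-mean-square deviation of known FSTTs from the table mean (divisor n assumed)."""
    squared = [(y - predicted_mean) ** 2 for y in observed]
    return math.sqrt(sum(squared) / len(observed))


def mean_absolute_percentage_error(observed, predicted_mean):
    """Mean of |observed - predicted| / observed, as a percentage.

    Observed values must be non-zero, which is one reason zero placeholder
    entries need to be removed before testing.
    """
    ratios = [abs(y - predicted_mean) / y for y in observed]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical known FSTTs (mm) at one landmark and a hypothetical table mean
observed = [4.0, 5.0, 8.0, 10.0]
table_mean = 6.0
sest = standard_error_of_estimate(observed, table_mean)
mape = mean_absolute_percentage_error(observed, table_mean)
print(f"Sest = {sest:.2f} mm, MAPE = {mape:.2f}%")
```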

Table 5 Estimation errors of the 2023 adult T-Table means using the v2018.1 C-Table data

Discussion

The collection and analysis of mean facial soft tissue thickness values have been a popular pursuit to assist craniofacial identification methods. Since Welcker published the first FSTT study in 1883 [18], a total of 139 adult FSTT studies have now been published in the literature to collectively tally > 220,000 tissue thickness measurements of > 19,500 adults. In just the last 5 years, a considerable volume of new data has been added. In an effort to leverage this substantial mass of data points to triangulate upon population means, these mean data have been pooled to create the 2023 global tallied facial soft tissue depth table.

T-Table trends: 2008–2023

Since the first adult T-Table was established in 2008 from a dataset comprising ~ 7472 individuals from 55 studies, 84 additional FSTT studies have been published and samples have almost tripled (Table 4). Every 5 years for the last 15 years, an update to the global pooled means has been provided [15,16,17]. A review of these pooled means reveals that the new data, contributed since 2008, have resulted in fairly minimal changes to the starting 2008 summary statistics. Since 2008, only four bilateral FSTT landmarks increased by ≥ 1.5 mm, while all other landmarks changed by ≤ 1 mm (Table 4). In general, this demonstrates that the initial 2008 T-Table data were quite informative, despite the comparatively smaller sample size.

The greatest value of the post-2008 T-Tables resides in their sequentially increasing sample size over time that allows for fluctuations in the rolling means for T-Table landmarks to be evaluated as a time series. In 2018, at just 10 years after the initial analysis, the third iteration of the T-Table did not provide enough time or data to definitively determine if pooled means had converged on population means. Just 5 years on and with substantially more data (81,790 more FSTT datapoints), the additional time window provides a much clearer view as to those patterns. Now, the convergence of rolling means on a constant unchanging value is readily apparent for median landmarks.

When the 2023 T-Table means were used as point estimators for individuals with known FSTT values (C-Table data) across 24 landmarks, the 2023 grand means outperformed the 2018 T-Table means (3.5 mm versus 3.6 mm, respectively) indicating that the updated values are superior (Table 5). While this estimation improvement is marginal (0.1 mm), meaning that for single cases in day-to-day forensic casework the difference is unlikely to be noticeable, in the long run and as applied to many cases, the improved performance of the 2023 T-Table means takes on greater meaning.

The median landmark rolling mean plots demonstrate that a sample size of at least 2500 individuals is generally required to achieve reliably stable pooled FSTT values (Fig. 1). A sample of this size typically requires 30–35 FSTT studies to be combined, thereby highlighting the data reliability issues of single small-sample studies. Some bilateral landmarks have also stabilized, but at sample sizes closer to 3000 individuals (Fig. 2), which is slightly higher than their median landmark counterparts that possess smaller standard deviations (see glabella (g–g′) or rhinion (rhi–rhi′) versus zygion (zy–zy′) or gonion (go–go′), Table 3). The most unstable bilateral FSTT values tend to be found in the cheek region, with six of these landmarks so far still failing to converge on a constant value (Fig. 2). Several factors may be contributing to these trends. The FSTTs at these landmarks tend to be some of the largest of the face, and so possess the greatest range between individuals, particularly because they are comprised in part of fatty deposits. In comparison to median landmarks, bilateral landmarks may be more difficult to measure, resulting in higher measurement error (or ‘noise’) and thereby requiring larger samples to reach constant mean values. The continuing upward trend for some landmarks may also be a manifestation of real-world change; for instance, it may be driven by a secular trend such that more contemporary individuals hold larger bilateral FSTTs. It is additionally possible that these upward trends could be driven by investigator preferences for a particular FSTT measurement method that yields higher values compared to other techniques. For example, both CT and CBCT have gained recent popularity for the measurement of bilateral FSTTs (see below and Table 2); however, they appear to yield larger grand mean values compared to other methods (Fig. 5). Only future studies can clarify the underlying root cause for these observed trends.

Measurement method impact

Pooling data by measurement method revealed that the methods do not appear to yield equivalent FSTT values (Figs. 3 and 5). Generally, lateral cephalograms provide the largest values for median landmarks, which is in line with prior observations [15] (Fig. 3). This is likely explained by the X-ray acquisition procedure, which involves adjustments for magnification effects and upright subject positioning [29]. In other methods, such as ultrasound, direct contact of equipment with the soft tissues risks tissue compression that may subsequently yield smaller FSTT measurements [159]. Additionally, supine body position can create thinning down the midline due to soft tissue drape and the weight of the more laterally displaced tissues, thereby yielding thicker soft tissues laterally under these effects of gravity [160,161,162]. As CT yielded the largest values for bilateral landmarks, this may be driven, in part, by the supine subject positioning (Fig. 5). Another important consideration for CT is the resolution provided by the slice thicknesses. When larger slice thicknesses are employed, the CT images possess a better signal-to-noise ratio but poorer resolution, which may decrease measurement accuracy [75, 163]. In the T-Table sample drawn from the literature, slice thicknesses for CT studies were highly varied, ranging from 0.5 to 7 mm. In the future, it would be useful to know exactly how slice thickness settings impact FSTT measurements and what slice thickness settings should be preferred for improved data reliability.

It is interesting to note that despite both CT and MRI being supine non-contact scan methods, MRI consistently yielded smaller FSTT values than CT in the cross-sectional context of this study (Fig. 5). These findings, first observed in [29], may suggest that the technical differences between the imaging techniques produce greater measurement effects than the common supine body positioning. Some evidence may be found for this in a study by Campenelli et al. [164], which compared 3D models of bone generated from segmented MRI and CT data. They found that CT models tended to overestimate bone size compared to 3D laser scans, while MRI models tended to underestimate the bone morphology. Similar findings have also been reported by Rathnayaka et al. [165].

Future work

As stable pooled FSTT means for commonly measured median landmarks have been attained, the most value will be added by new studies that increase samples for bilateral landmarks, so that they too can converge upon stable rolling means.

To reduce the amount of statistical noise in the overarching FSTT dataset, it is worthwhile considering if tighter measurements can be obtained in the future and if these studies should be weighted more heavily during the averaging procedure since they are more trustworthy. This could be obtained through better sampling methods (i.e., attainment of truly random and representative samples) and/or tighter measurement protocols. As previously noted, each FSTT measurement method appears to yield slightly different FSTT values (Figs. 3 and 5), so deciding which method should provide the underlying ground truth standard is a difficult matter (especially since validation checks by direct observations on living subjects are not possible). Setting quality control standards for each data collection method (such as the maximum slice thickness acceptable for CT data acquisition, as mentioned above), though not a comprehensive solution, would be useful. While these standards could be subjectively established by a working committee, they would be better set by quantitative data that show where data accuracies break down under certain conditions, and under what conditions reliable data are observed. An easier and less controversial approach to reducing the data noise is simply for investigators to use better data selection strategies that produce representative samples and ensure their sample sizes are sufficiently large to test their hypotheses of interest. As statistical noise cannot ever be entirely eliminated, there is likely to be an ongoing role for data pooling and the T-Tables in the future (Figs. 4 and 6).

To maximize data utility, new FSTT studies should aim to include a base suite of common landmarks that adhere to standardized description and placement [155]. In this regard, the T-Table landmarks form a good minimum set for future investigations since these landmarks have previously been used by many investigators. Investigators can add entirely new landmarks to their studies; however, a standard set should be measured as a basis. Additionally, there is substantial value in encouraging the deposit of raw data into publicly accessible FSTT databases. This can be accomplished, for example, by contributing raw data to the Collaborative Facial Soft Tissue Depth Data Store (C-Table). Such data repositories are critical because they can be used for validation testing of newly formulated estimation models, the tests can be repeated by other investigators using the same data, and the tests can be conducted at any time since the data repository is free and open access. Currently, the critical step of validation testing newly produced FSTT estimators is rarely undertaken in FSTT studies reporting new estimators [27, 29]. For newly derived FSTT means to offer advances worthy of publication, the standard errors of the estimate must be determined and should be smaller than those of other estimators already published in the scientific literature. Ideally, the validation tests should be conducted on new data not used to derive (or train) the estimators, i.e., they should concern out-of-group tests [27].

An important observation that has previously been made in the literature is the covariation of FSTTs with body mass index (BMI) [88, 122, 129, 138, 166]. This relationship deserves increased future attention since the mass component enables relative adjustment of FSTTs with body scale, a standard undertaking in other biological domains [157, 158], but one yet to be realized in the craniofacial identification domain [122]. Rather than treating BMI as a categorical variable for analysis, the body mass should be separately used in its native continuous data format, so that the size of its correlations with FSTTs can be appreciated in detail [122]. All future FSTT research should thereby measure the body mass of each subject in the sample, so that these relationships can be explored. Rather than reducing the mass factor to BMI (kg/m²), body mass should be considered in its native units (kg) as these units hold the stronger correlations with FSTTs [122].

Conclusions

New data corresponding to > 7700 adults have been used to update pooled means and produce the 2023 version of the Global T-Table (total N > 19,500 adults). Rolling means show that the 2023 grand means have converged on stable values at median landmarks, while bilateral landmarks would benefit from continued data collection. Cross-sectional analysis by measurement method indicates that lateral cephalograms and CT provide large FSTTs, while needle puncture provides the smallest values. Within-group validation tests of these updated 2023 Global T-Table values show these means provide slightly more accurate FSTT estimates than the 2018 T-Table data. To maximize the quality and utility of FSTT data, future research should devise optimal data collection strategies that produce less noisy datasets (i.e., reduce measurement and sampling errors) and use the T-Table landmarks as a minimum landmark suite for additional data collection.