Introduction

Stature estimation, together with sex and age estimation, is one of the most important and basic methods for individual identification [1,2,3,4,5,6,7,8,9,10]. Recent forensic anthropology reports have described sex, weight, and age estimation using computed tomographic (CT) images of bones [11,12,13,14,15,16,17,18,19,20,21,22]. For stature estimation, studies conducted in different populations have shown that the long bones of the limbs provide the most accurate estimates over a wide age range. Among them, the femur has been reported to be one of the most useful [4, 10, 23,24,25,26,27,28,29,30].

Conventionally, the femur is measured using an osteometric board placed on a horizontal surface [31,32,33,34,35]. More recent studies have measured the femur on radiographs [36,37,38]. Some reports have provided stature estimation using CT images of the femur [39,40,41,42,43,44], in which the femur was measured manually on the CT images. However, manual measurement requires a certain level of technical proficiency, and its results can depend on the measurer. A simpler measurement method could therefore reduce the time and effort required for measurement and help prevent unintentional measurement errors. Herein, we created three-dimensional (3D) reconstructed images from postmortem CT images and measured the femur using semi-automatic measurement software, with the aim of providing new stature estimation formulae based on these semi-automatic measurements.

Materials and methods

This study included 300 cadavers aged over 18 years, of known sex and age, that underwent whole-body postmortem CT imaging followed by forensic autopsy at the forensic medicine departments of Chiba University and the University of Tokyo, Japan, between October 2016 and October 2020. Cadavers with severe decomposition, burn injuries, congenital malformations, postoperative changes, missing body parts, femoral fractures, severe deformation of the vertebral bodies, or severe trauma to the head, neck, trunk, or lower limbs were excluded because these conditions may affect the femur or stature. The sample comprised 150 males (10–20 years, n = 1; 21–30 years, n = 21; 31–40 years, n = 19; 41–50 years, n = 37; 51–60 years, n = 38; 61–70 years, n = 20; 71–80 years, n = 12; 81–90 years, n = 2) and 150 females (10–20 years, n = 13; 21–30 years, n = 12; 31–40 years, n = 28; 41–50 years, n = 24; 51–60 years, n = 21; 61–70 years, n = 18; 71–80 years, n = 20; 81–90 years, n = 14). Cadaver stature was measured in the supine position before autopsy using a measuring tape or a ruler. Following previous studies [45,46,47,48], the adjusted stature (AS), an estimate of living stature, was calculated by subtracting 2.0 cm from the measured supine stature.
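As an arithmetic check, the age distribution and the 2.0 cm supine-to-living correction described above can be restated in a few lines. This is a minimal Python sketch; the dictionary names and helper functions are hypothetical and only encode the counts and the correction given in the text.

```python
# Age distribution of the sample as reported above (cadavers per age bin).
MALE_AGES = {"10-20": 1, "21-30": 21, "31-40": 19, "41-50": 37,
             "51-60": 38, "61-70": 20, "71-80": 12, "81-90": 2}
FEMALE_AGES = {"10-20": 13, "21-30": 12, "31-40": 28, "41-50": 24,
               "51-60": 21, "61-70": 18, "71-80": 20, "81-90": 14}


def share_percent(bins, labels):
    """Percentage of cadavers that fall in the listed age bins."""
    return 100 * sum(bins[label] for label in labels) / sum(bins.values())


def adjusted_stature(measured_cm):
    """AS: measured supine cadaver stature minus 2.0 cm."""
    return measured_cm - 2.0


print(sum(MALE_AGES.values()), sum(FEMALE_AGES.values()))  # 150 150
print(share_percent(MALE_AGES, ["41-50", "51-60"]))          # 50.0
print(share_percent(FEMALE_AGES, ["41-50", "51-60"]))        # 30.0
```

The 41–60-year shares computed this way (50.0% of males and 30.0% of females) are the proportions used when comparing age composition with other studies.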

At Chiba University, postmortem CT was performed using a 64-row detector CT system (Supria Grande; Fujifilm Healthcare Corporation, Tokyo, Japan), and the scanning protocol was as follows: tube voltage, 120 kV; tube current, 250 mA; scan time, 0.75 s; collimation, 0.625 mm. The slice thickness, reconstruction interval, and field of view during image reconstruction were 1.0, 0.725, and 500 mm, respectively. At the University of Tokyo, postmortem CT was performed using a 16-row detector CT system (ECLOS; Fujifilm Healthcare Corporation), and the scanning protocol was as follows: tube voltage, 120 kV; tube current, 200 mA; scan time, 1 s; collimation, 1.25 mm. The slice thickness, reconstruction interval, and field of view during image reconstruction were 1.25, 1.25, and 500 mm, respectively.

Image data were processed on a workstation (Synapse Vincent; Fujifilm Medical), and a semi-automatic application was used to measure the femur. On launch, the application automatically recognizes the femur and displays it as a reconstructed 3D image. If the reconstruction includes other structures, such as calcified blood vessels or cartilage, these must be removed manually. Once the reconstruction has been confirmed as appropriate, the bone surface information is extracted automatically with a single click. After the operator manually marks four points on the model (the center of the femoral head, the intercondylar notch (ICN), the medial epicondyle, and the lateral epicondyle; Fig. 1), the application automatically calculates and displays 41 measurements (Table 1). The time from manual marking to display of the results was approximately 40 s. For each cadaver, the mean of the right and left values of each measurement was also calculated (Fig. 2).

Fig. 1

The four points that must be marked manually (each panel shows the point in the horizontal, coronal, and sagittal planes and on the three-dimensional reconstruction of the computed tomography images). a Center of the femoral head: the central point of the femoral head. b Intercondylar notch: the posterior one-quarter point on the midline of the recess between the medial and lateral condyles on the inferior surface of the distal femur. c Medial epicondyle: the most medial point of the medial condyle. d Lateral epicondyle: the most lateral point of the lateral condyle

Fig. 2

Five measurements with acceptable intraobserver and interobserver errors. a Maximum length of the femur (MLF). b Lateral anterior–posterior length (LAP). c Cross-section medial–lateral width (C-ML). d C-lateral anterior–posterior length (C-LAP). e C-medial anterior–posterior length (C-MAP)

Table 1 Definition of measurements

First, to select measurements with acceptable intraobserver and interobserver errors, 20 cadavers were randomly selected. To evaluate intraobserver error, a single researcher measured the femurs of each cadaver twice, at least 1 day apart. To evaluate interobserver error, a second researcher measured the same femurs, and these results were compared with the first researcher's first measurements. Intraobserver and interobserver errors were assessed using the technical error of measurement (TEM), relative technical error of measurement (rTEM), and coefficient of reliability (R) [49, 50]. The acceptance limit for rTEM was set at < 1.5% for intraobserver error and < 2.0% for interobserver error [51].
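For the two-measurement case, the three reliability statistics can be computed as follows. This is a minimal Python sketch assuming the commonly used formulas TEM = sqrt(Σd² / 2N), rTEM = 100 × TEM / mean, and R = 1 − TEM² / SD²; taking SD² as the sample variance of all measurements pooled is an assumption, and the exact variant should be checked against the cited references [49, 50]. The function name is hypothetical.

```python
import math
from statistics import mean, variance


def reliability_stats(first, second):
    """Return (TEM, rTEM in %, R) for two repeated measurements per subject.

    first, second: measurements of the same subjects in the same order.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length series with at least 2 subjects")
    n = len(first)
    # TEM for two measurements per subject: sqrt(sum of squared differences / 2N)
    tem = math.sqrt(sum((a - b) ** 2 for a, b in zip(first, second)) / (2 * n))
    pooled = list(first) + list(second)
    # rTEM: TEM expressed as a percentage of the grand mean
    rtem = 100 * tem / mean(pooled)
    # R = 1 - TEM^2 / SD^2, with SD^2 taken as the pooled sample variance (assumption)
    r = 1 - tem ** 2 / variance(pooled)
    return tem, rtem, r


tem, rtem, r = reliability_stats([400.0, 420.0, 440.0], [401.0, 419.0, 440.0])
print(round(tem, 3), round(rtem, 3), round(r, 4))
```

The resulting rTEM would then be compared with the 1.5% (intraobserver) or 2.0% (interobserver) limit.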

Second, sex differences in age, AS, and each accepted measurement were evaluated. Student’s t-test was used for normally distributed variables, and the Wilcoxon rank sum test was used otherwise [52, 53]. Normality was assessed using the absolute z-values of skewness and kurtosis [54].
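The normality check can be sketched as follows. This minimal Python sketch assumes the common bias-adjusted definitions of sample skewness and excess kurtosis and their standard errors, and uses |z| < 3.29, a cut-off often recommended for medium-sized samples. The study does not state its threshold, so this limit, like the function names, is an assumption.

```python
import math


def skewness_kurtosis_z(values):
    """Return (z of skewness, z of excess kurtosis) for at least 4 values."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 observations")
    m = sum(values) / n
    # Central moments (population form)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("values have zero variance")
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    # Bias-adjusted sample skewness and excess kurtosis
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    # Standard errors of skewness and kurtosis
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return skew / se_skew, kurt / se_kurt


def choose_sex_comparison_test(values, z_limit=3.29):
    """Student's t-test if both |z| values are below z_limit, else Wilcoxon."""
    z_skew, z_kurt = skewness_kurtosis_z(values)
    if abs(z_skew) < z_limit and abs(z_kurt) < z_limit:
        return "student_t"
    return "wilcoxon_rank_sum"


print(choose_sex_comparison_test([1, 2, 3, 4, 5]))
print(choose_sex_comparison_test([1] * 9 + [50]))
```

The chosen test itself could then be run with, for example, `scipy.stats.ttest_ind` or `scipy.stats.ranksums`; these are mentioned only as one possible implementation, not as the software used in the study.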

Lastly, the relationship between AS and each accepted measurement was assessed by simple linear regression for all 300 cadavers and for each sex separately, and regression performance was summarized using the coefficient of determination (R²) and the standard error of the estimate (SEE). All manual markings for this analysis were performed by a single researcher. Residual plots were created using the stature predicted by each regression equation, and the differences between the predicted stature and AS were examined, including for heteroscedasticity [55].
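The regression statistics named above can be obtained with ordinary least squares, as in the following sketch. It assumes that SEE is the residual standard error with n − 2 degrees of freedom and that R² = 1 − SSE/SST. The function name is hypothetical. The returned residuals (observed minus predicted) could be plotted against the predicted values to inspect heteroscedasticity.

```python
import math


def simple_regression(x, y):
    """Least-squares fit y = intercept + slope * x.

    Returns slope, intercept, R^2, SEE, predicted values, and residuals.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired data with at least 3 observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * xi for xi in x]
    # Residuals: observed minus predicted
    residuals = [yi - pi for yi, pi in zip(y, predicted)]
    sse = sum(r * r for r in residuals)
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - sse / sst
    # SEE: residual standard error with n - 2 degrees of freedom
    see = math.sqrt(sse / (n - 2))
    return slope, intercept, r_squared, see, predicted, residuals


slope, intercept, r2, see, predicted, residuals = simple_regression(
    [1, 2, 3, 4], [3, 5, 6, 9])
print(round(slope, 3), round(intercept, 3), round(r2, 4), round(see, 4))
```

With a femoral measurement as x and AS as y, the slope and intercept give a formula of the form AS = intercept + slope × measurement.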

Statistical significance was set at P < 0.05 for rejecting the null hypotheses that there was no difference between males and females and that the regression coefficient was 0. Statistical analysis was performed using Excel 2010 (Microsoft Corporation, Redmond, WA, USA).

Results

The 41 measurements were classified into groups 1 and 2 based on intraobserver and interobserver errors (Table 2); the TEM, rTEM, and R values for each measurement of the right and left femurs are shown in Table 3. Group 1 included measurements with rTEM values < 1.5% for intraobserver error and < 2.0% for interobserver error on both the right and left sides. Group 2 included the remaining measurements, whose intraobserver or interobserver rTEM values exceeded the acceptance range. Group 1 comprised five measurements, all with R values > 0.9: maximum length of the femur (MLF), lateral anterior–posterior length (LAP), cross-section medial–lateral width (C-ML), C-lateral anterior–posterior length (C-LAP), and C-medial anterior–posterior length (C-MAP). Group 2 was subdivided by measurement type into group 2–1 (angles and curvature radii) and group 2–2 (lengths).
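The grouping rule can be written as a small function. This is a minimal Python sketch; the input layout (a dictionary keyed by side and error type) and the function name are hypothetical, and only the two rTEM limits come from the text.

```python
# rTEM acceptance limits (%) stated in Materials and methods
RTEM_LIMITS = {"intra": 1.5, "inter": 2.0}


def classify_measurement(rtem):
    """Return 1 if all four rTEM values are below their limits, otherwise 2.

    rtem: hypothetical dict keyed by (side, error_type), with side in
    {"right", "left"} and error_type in {"intra", "inter"}.
    """
    for side in ("right", "left"):
        for error_type, limit in RTEM_LIMITS.items():
            if not rtem[(side, error_type)] < limit:
                return 2
    return 1


example = {("right", "intra"): 0.2, ("right", "inter"): 0.4,
           ("left", "intra"): 0.3, ("left", "inter"): 2.0}
print(classify_measurement(example))  # 2: left interobserver rTEM is not < 2.0%
```

Because the limits are strict inequalities, a value exactly equal to a limit places the measurement in group 2.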

Table 2 Semi-automatic measurement classification
Table 3 TEM, rTEM, and R values for each measurement of the right and left femurs (n = 20)

The descriptive statistics for age, AS, and the five group 1 measurements are presented in Table 4. Age, AS, MLF, LAP, C-LAP, and C-MAP were normally distributed; only C-ML was not. Mean age did not differ significantly between the sexes (P = 0.482). AS and each measurement were significantly greater in males than in females (C-ML, P < 0.01; AS, MLF, LAP, C-LAP, and C-MAP, P < 0.001).

Table 4 Descriptive statistics for age, AS, and group 1 measurements

Table 5 shows the results of simple linear regression analysis for estimating AS from each of the five group 1 measurements for all cadavers regardless of sex, and Tables 6 and 7 show the corresponding results for males and females, respectively. Significant positive correlations were observed between AS and each measurement. For all cadavers, MLF showed the strongest correlation and the lowest SEE, and LAP showed the second strongest correlation and the second lowest SEE. Figures 3, 4, and 5 show the residual plots for the five measurements.

Table 5 Simple linear regression analyses for stature estimation for all samples regardless of sex
Table 6 Simple linear regression analyses for stature estimation in males
Table 7 Simple linear regression analyses for stature estimation in females
Fig. 3

Residual distribution for all samples regardless of sex with the five measurements. a1: right MLF (maximum length of the femur); a2: left MLF; a3: average MLF; b1: right LAP (lateral anterior–posterior length); b2: left LAP; b3: average LAP; c1: right C-ML (cross-section medial–lateral width); c2: left C-ML; c3: average C-ML; d1: right C-LAP (C-lateral anterior–posterior length); d2: left C-LAP; d3: average C-LAP; e1: right C-MAP (C-medial anterior–posterior length); e2: left C-MAP; e3: average C-MAP. AS, adjusted stature; PS, predicted stature calculated with the obtained regression equation

Fig. 4

Residual distribution for male samples with the five measurements. a1: right MLF (maximum length of the femur); a2: left MLF; a3: average MLF; b1: right LAP (lateral anterior–posterior length); b2: left LAP; b3: average LAP; c1: right C-ML (cross-section medial–lateral width); c2: left C-ML; c3: average C-ML; d1: right C-LAP (C-lateral anterior–posterior length); d2: left C-LAP; d3: average C-LAP; e1: right C-MAP (C-medial anterior–posterior length); e2: left C-MAP; e3: average C-MAP. AS, adjusted stature; PS, predicted stature calculated with the obtained regression equation

Fig. 5

Residual distribution for female samples with the five measurements. a1: right MLF (maximum length of the femur); a2: left MLF; a3: average MLF; b1: right LAP (lateral anterior–posterior length); b2: left LAP; b3: average LAP; c1: right C-ML (cross-section medial–lateral width); c2: left C-ML; c3: average C-ML; d1: right C-LAP (C-lateral anterior–posterior length); d2: left C-LAP; d3: average C-LAP; e1: right C-MAP (C-medial anterior–posterior length); e2: left C-MAP; e3: average C-MAP. AS, adjusted stature; PS, predicted stature calculated with the obtained regression equation

Discussion

In this study, we derived stature estimation formulae from measurements of 3D models reconstructed from CT images using semi-automatic measurement software; this is the first report of such formulae. In the present study, artificial intelligence (AI) was used for femur recognition, bone surface extraction, and semi-automatic measurement. AI has been applied in many fields of medical research. In diagnostic imaging, it has been shown to reduce not only analysis time but also interreader variability and false-positive markings [56,57,58,59]. AI has also been shown to improve adenoma detection rates and reduce examination time in colonoscopy [60], to reduce outpatient waiting times [61], and to shorten the interval between CT angiography at a primary stroke center and door-in at a comprehensive stroke center [62]. In the present study, the semi-automatic measurement software offered three advantages: the method is simple, measurement is fast (approximately 1 min), and multiple measurements are obtained in a single procedure.

Several previous studies have reported stature estimation by simple linear regression on MLF measured from radiographic images (Table 8). In the two that reported intraobserver and interobserver errors [40, 44], rTEM values were 0.108–0.277 for intraobserver error and 0.192–0.289 for interobserver error. In this study, rTEM values were 0.034–0.035 for intraobserver error and 0.018–0.019 for interobserver error, lower than in both reports. The semi-automatic measurement software may have contributed to these smaller errors.

Table 8 Outline of previous and present studies that performed simple linear regression analysis for stature estimation using the maximum length of the femur measured on radiographic images

Compared with previous reports on Japanese subjects [37, 40, 42, 44], the R² and SEE values in this study were better or at least comparable; therefore, the stature estimation formulae determined in this study could be useful in forensic practice. Among previous reports of stature estimation using CT images of Japanese femurs [40, 42, 44], the present study showed the lowest SEE in males, whereas the SEE in females was the second lowest, after Chiba et al. [44], with a difference of < 0.2 cm. In that report, MLF was measured manually by reproducing the conventional anthropological measurement method on arbitrary cross-sectional CT reconstruction images. Although this approach corresponds closely to conventional bone measurement, it is complicated and time consuming: according to that report [44], measuring MLF took approximately 140 s, and obtaining the five measurements required for one side took approximately 440 s. In contrast, the semi-automatic method examined in this study is much simpler and faster. It took approximately 40 s from manual marking to the display of 41 measurements, and approximately 280 s from launching the application to the display of all results; this time includes measurement of both femurs and 3D model reconstruction. Because the semi-automatic method reduced measurement error and shortened measurement time, a fully automatic method, if developed, might reduce both further.

Hasegawa et al. [37] reported lower SEE values in females than those in this study (difference > 0.3 cm), the best SEE among the Japanese studies listed in Table 8 [37, 40, 42, 44]. However, their SEE in males was slightly higher than that in this study. In addition, their differences in SEE between males and females (0.74 and 0.83) were larger than those in this study (0.003 and 0.072). Hasegawa et al. [37] derived their stature estimation formulae from radiographs of living individuals. The differences between their results and ours may reflect their use of living subjects, a different radiographic imaging device, and the larger number of female than male subjects in their study.

Comparison of the present study with those of Zhang et al. [63] and Lee et al. [39] is complicated by differences in the study populations; nonetheless, our results were superior to those of Zhang et al. [63] and slightly inferior to those of Lee et al. [39]. Zhang et al. [63] studied fewer cadavers than this study, so the difference may reflect sample size. Lee et al. [39] included a higher proportion of cadavers aged 41–60 years (65.8% of males and 45.1% of females) than our study (50.0% and 30.0%, respectively). Because their formulae were fitted to a sample concentrated in these age groups, they may have performed better than ours, which were derived from a sample spread across a wider age range. Differences in age composition, CT equipment, or image reconstruction software may also have affected the results.

Many reports have shown that MLF is useful for stature estimation, consistent with our finding that MLF gave the best stature estimation performance. However, MLF cannot be measured if only part of the femur remains. In this study, stature estimation using LAP showed the second lowest SEE, suggesting that LAP could be useful when MLF cannot be measured, for example, when only the distal part of the femur remains. Although some reports have provided stature estimation formulae using measurements of the distal femur [29, 44, 64, 65], none has suggested that LAP is useful for this purpose. The relatively high SEE values for LAP and for the three measurements C-ML, C-LAP, and C-MAP are not negligible. However, among studies of stature estimation using measurements of the distal femur, only Chiba et al. [44] calculated SEE. They reported SEEs of 5.620–6.300 for femoral epicondylar breadth (the linear distance between the most medial and most lateral epicondylar points projected vertically onto the horizontal plane). Compared with their results, the SEEs for LAP were better, and those for the other measurements were not inferior. Because few studies are available for comparison, further research on stature estimation using measurements of the distal femur is desirable.

Among the 41 semi-automatically obtained measurements, group 2 measurements had intraobserver or interobserver errors exceeding the acceptance range. Descriptive statistics for the group 2 measurements are shown in Online Resource 1. There are several possible reasons for the larger errors in group 2. Group 2–1 measurements were based on information from the edges of the reconstructed 3D CT model, so slight differences in reconstruction resulting from the manual removal of calcified blood vessels and cartilage might have produced large errors. Group 2–2 measurements, except MAP, had smaller values than group 1 measurements (Table 4 and Online Resource 1); because rTEM is expressed relative to the mean, errors introduced by manual operation would have had a proportionally larger influence on these measurements. MAP had values similar to those of group 1 measurements but still showed large measurement errors. Unlike C-ML, C-LAP, and C-MAP, MAP is measured without creating a cross section at the distal femur. Deformation of the knee joint, including the distal end of the femur, may have been present in many of the cadavers in this study, because primary knee osteoarthritis often occurs in people over 50 years old [66, 67]. Such changes may have made it difficult for the AI software to identify the femur accurately. In some cadavers, the software mistakenly recognized the knee cartilage and patella as part of the femur, and these structures had to be removed manually; this manual operation might have increased measurement error. In addition, MAP had a larger measurement error than LAP, which is also measured without creating a cross section. This may be because osteoarthritis occurs more frequently on the medial side than on the lateral side [64].

The residual plots indicated that MLF and LAP yielded suitable regression models. The other three measurements were less suitable for regression because of large outliers and a narrow range of predicted values, especially in the sex-specific residual plots. This may be attributable to the narrow range of values of these three measurements.

This study has several limitations. Measurements found useful in other reports, such as femoral diaphysis length, physiological length, and bicondylar length [13, 29, 65, 68, 69], were not obtained because the semi-automatic measurement application was not configured to measure them. Furthermore, the application, including the selection of measurements, was developed by Fujifilm. The femurs in this study were measured only in cadavers with soft tissue, so further studies examining differences between digital and analog measurements are warranted. Femoral deformation due to aging was not considered. Although the measured stature was converted to AS to derive and assess the formulae, cadaver stature was measured only once, so intraobserver and interobserver errors in stature measurement were not evaluated. Age-stratified analysis was not performed because the sample size was insufficient. In addition, images were acquired with two types of CT equipment, and further studies comparing images acquired with different CT devices are warranted.

Conclusion

This study provided the first stature estimation formulae based on 3D CT models of modern Japanese femurs using simple and rapid semi-automatic measurement software. Of the 41 measurements obtained with this method, MLF was the best for stature estimation and LAP the second best. These formulae can be useful in forensic investigations.