Introduction

Age estimation in living individuals is important for clinical applications [1, 2], legal or forensic investigations [3, 4] and sports [5, 6], but is prone to uncertainty caused by the variation of human development [7]. A living person’s chronological age is derived from their biological age, which is an active topic of current research [8]. Forensic age estimation in particular is currently receiving wider attention due to the ongoing flow of individuals into and across the European Union, since it is legally necessary to determine whether individuals without valid identification documents, who claim to be minors, have reached the age of majority.

As recommended by the work group for forensic age diagnostics [4], imaging-based multi-factorial age estimation methods involve a radiograph of the hand [9, 10], a panoramic X-ray of the teeth [11] and computed tomography (CT) images of the clavicles [12]. The application of ionizing radiation associated with these imaging modalities prompted numerous studies to investigate magnetic resonance imaging (MRI) for its potential to replace the currently applied imaging techniques for the hand [13,14,15], teeth [16,17,18] and the clavicles [19,20,21], and to identify new age-relevant body regions [22,23,24].

Compared to CT and X-ray imaging, however, MRI generally requires considerably longer acquisition times. This leads to increased examination costs and reduced patient comfort and gives rise to potential errors due to motion artefacts when acquiring images of children or adolescents. Hillewig et al. were the first to address this problem with regard to forensic age estimation, proposing a four-minute approach for MRI acquisitions of the clavicles [25] by comparing different MRI sequences and identifying the best compromise between acquisition time and image quality. Recent developments in MRI, however, allow acquisition time to be reduced further by applying undersampling strategies. For age estimation, the first results have been reported by Terada et al. [26] and Neumayer et al. [27] for accelerated images of the hand and wrist. In the current work, we extend this approach to image data of the clavicles and the teeth, since the hand, the clavicles and the teeth are the three anatomical structures required in the widely used multi-factorial age estimation scheme recommended by the work group for forensic age diagnostics [4] and have been proven to be most useful in majority age classification [28, 29].

For this purpose, we retrospectively undersampled MRI acquisitions of the left hand, the clavicles and the wisdom teeth and applied radiological and automatic age estimation methods to determine limits of acceleration that can be applied to MRI data without considerably influencing the outcome of the respective age estimation technique.

Materials and methods

Study design

The study was performed in accordance with the Declaration of Helsinki and was approved by the ethical committee of the local medical university. All eligible volunteering participants provided written informed consent; for underage participants, written consent from the parents was obtained.

Subjects

For this feasibility study, 34 healthy male Caucasian volunteers between 13.37 and 24.05 years (mean 17.15 years, median 16.89 years, standard deviation 2.87 years) were recruited to acquire three-dimensional MR images of the left hand and wrist, the clavicles and the teeth. For one additional volunteer (19 years), whose image data was not used for retrospective undersampling, we additionally acquired images with undersampling factors 4, 2 and 6 for the hand, the clavicles and the teeth, respectively.

MR acquisitions

MRI exams were performed using clinical 3T MR scanners (Skyra/Prisma, Siemens Healthineers, Erlangen, Germany).

Three fully sampled acquisitions formed the basis for our study:

•Hand: T1-weighted 3D VIBE, TE/TR/FA = 4.06 ms/14 ms/15°, field-of-view (FOV) = 129 mm × 230 mm, 2 averages, acquisition matrix = 129 × 230 and image matrix = 288 × 512, 72 slices, image resolution 0.45 mm × 0.45 mm × 0.90 mm, acquisition time TA = 3:46 min.

•Clavicles: T1-weighted 3D VIBE, TE/TR/FA = 3.72 ms/9.77 ms/12°, FOV = 149 mm × 170 mm, 2 averages, acquisition matrix = 168 × 192 and image matrix = 224 × 256, 44 slices, resolution 0.90 mm × 0.90 mm × 0.90 mm, TA = 5:29 min.

•Teeth: T1-weighted 3D TSE, TE/TR/FA = 12 ms/254 ms/150° (refocussing), TF = 4, FOV = 103 mm × 150 mm, acquisition matrix = 176 × 256 and image matrix = 352 × 512, 56 slices, image resolution 0.30 mm × 0.30 mm × 1 mm, TA = 10:46 min.

For acquisitions of the hand and wrist, volunteers were placed in prone position with the left arm outstretched and a sandbag placed on top of the hand to minimize movements. Images were acquired using a conventional 20-channel receive-only head-neck coil (Siemens Healthineers, Erlangen, Germany).

Images of the clavicles and the teeth were acquired in supine position, using a 4-channel neck coil (Siemens Healthineers, Erlangen, Germany) and an 8-channel multifunctional coil (CPC, Noras MRI products GmbH, Höchberg, Germany), respectively.

Retrospective undersampling of MRI data

The principle of undersampled MRI exploits the redundancy of image information for acquisitions with multiple coil elements. This redundancy makes it possible to acquire fewer data lines than are required for a fully sampled data set. The missing information is then recovered by applying algorithms based on either parallel imaging [30, 31] or compressed sensing [32]. The resulting speed-up is referred to as the acceleration factor (AF), which is defined as the number of acquisition lines required for a fully sampled data set divided by the number of acquisition lines of the undersampled data set (e.g. acquiring only half of the data corresponds to an acceleration factor of AF = 2).

In contrast to standard image data provided by MRI scanners, MRI raw data—available at the scanner for a limited time after the scan due to its extensive storage requirements—includes the entire, unedited and non-combined data of each coil element. This allows removing data lines from a fully sampled raw data set prior to image reconstruction, which is equivalent to not collecting these lines during image acquisition. Therefore, retrospective undersampling of MRI raw data is a valid simulation of actual, undersampled acquisitions and additionally allows a comparison with the fully sampled data for the same subject in the same position. Since undersampling is performed retrospectively in this study, the prospective acquisition time of undersampled data will be termed theoretical acquisition time (TA,th) throughout this paper.
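The retrospective removal of data lines described above can be illustrated with a minimal Python sketch. It assumes a hypothetical multi-coil raw data array of shape (coils, phase-encode lines, readout points) filled with random numbers, and it uses a simple regular pattern instead of the CAIPIRINHA pattern applied in this study. The function name `undersample_kspace` is illustrative and does not belong to any toolbox.

```python
import numpy as np

def undersample_kspace(kspace, af):
    """Keep every af-th phase-encode line of each coil and zero the rest.

    kspace: complex array of shape (coils, lines, readout).
    Returns the undersampled copy and the boolean line mask.
    """
    n_lines = kspace.shape[1]
    mask = np.zeros(n_lines, dtype=bool)
    mask[::af] = True  # regular pattern (assumption, not CAIPIRINHA)
    undersampled = np.where(mask[None, :, None], kspace, 0)
    return undersampled, mask

rng = np.random.default_rng(0)
shape = (4, 192, 168)  # hypothetical: 4 coils, 192 lines, 168 readout points
full = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

us, mask = undersample_kspace(full, af=2)
# AF = lines of the fully sampled set / lines of the undersampled set
acceleration = mask.size / mask.sum()
print(acceleration)  # 2.0
```

Because the retained lines are copied unchanged from the fully sampled data, the undersampled set shares subject position and motion state with the original, which is what permits the direct comparison.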

In this study, we generally followed the approach for images of the hand proposed in [27]: Retrospective undersampling of raw MRI data was performed by applying the commercially available CAIPIRINHA (controlled aliasing in parallel imaging results in higher acceleration) acquisition strategy [33] using the AVIONIC toolbox [34]. Coil sensitivities were estimated applying the ESPIRiT method using the BART toolbox [35] and image reconstruction was carried out using total generalized variation (TGV), which considers piecewise smooth intensity variations [36]. Only non-averaged data were undersampled; this reduced the required theoretical acquisition time by a factor of two, compared to the standard setting of performing two averages for acquisitions of the hand and wrist and the clavicles. For readability, images reconstructed from fully sampled MR data will be addressed as original images or data and images reconstructed from retrospectively undersampled data will be termed undersampled for the remainder of this paper.

Image data of all three body regions were undersampled according to Tables 1 and 2 for radiological and automatic age estimation, respectively. The applied acceleration factors for the hand were based on existing work, while for the clavicles and the teeth the degree of acceleration was chosen considering limiting factors for our method. For the analysis of the hand, radiologists were presented images undersampled with acceleration factors 4 and 8, based on the results of [27]. To keep the effort within reasonable bounds, radiologists and dentists were presented the original and three undersampled data sets per volunteer for the clavicles and the teeth. The maximum AF for the clavicles was chosen to be 4 (the number of available channels); for the teeth, the maximum AF was set to 6 (slightly below the channel number due to coil arrangement). Automatic age estimation evaluated a larger set of undersampled image stacks up to an AF of 16 for the hand and an AF of 9 for the clavicles and the teeth. Note that undersampling strategies require an additional acquisition of a small number of calibration lines. Therefore, the actual speed-up will always be below the defined AF value; the actual acceleration factor is reflected in the resulting theoretical acquisition times in Tables 1 and 2.
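As a rough orientation, an idealised lower bound for the theoretical acquisition time can be sketched from the protocol values given above. The sketch assumes that acquisition time scales linearly with the number of averages and with 1/AF, and it ignores the calibration lines, so the times in Tables 1 and 2 are necessarily longer. The helper names `to_seconds` and `ideal_ta_th` are hypothetical, and the teeth protocol is taken to use a single average.

```python
def to_seconds(mmss):
    """Convert a 'min:s' string such as '3:46' to seconds."""
    minutes, seconds = mmss.split(":")
    return 60 * int(minutes) + int(seconds)

def ideal_ta_th(ta_full, averages, af):
    """Idealised lower bound in seconds; calibration lines are ignored."""
    return to_seconds(ta_full) / (averages * af)

# Fully sampled acquisition time and number of averages per body region.
protocols = {"hand": ("3:46", 2), "clavicles": ("5:29", 2), "teeth": ("10:46", 1)}

for region, af in [("hand", 4), ("clavicles", 2), ("teeth", 6)]:
    ta_full, averages = protocols[region]
    print(f"{region}: AF = {af}, lower bound = {ideal_ta_th(ta_full, averages, af):.0f} s")
```

For the three acceleration factors shown, the bounds are about 28 s, 82 s and 108 s; the gap to the reported theoretical acquisition times reflects the calibration lines.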

Table 1 Acceleration factors applied for radiological age estimation. A value of AF = 1 designates original acquisition times. Theoretical acquisition times for acquisitions of the hand and clavicles are additionally halved by only using non-averaged data
Table 2 Acceleration factors applied for automatic age estimation. A value of AF = 1 designates original acquisition times. Theoretical acquisition times for acquisitions of the hand and clavicles are additionally halved by only using non-averaged data

Skeletal rating

Skeletal age was rated independently using two different approaches: application of (i) radiological methods by raters with the respective expertise and (ii) an automatic method based on deep convolutional neural network (DCNN) architectures for age estimation.

For images of the hand, radiologists applied the method proposed by Greulich and Pyle [9] (GP), which was originally based on radiographs but was recently verified for applicability [37] and reliability [27] when used for MR images. For the clavicles, nine different developmental stages were assigned as already performed on MR images in [19], and stages of teeth development were assessed as defined by Demirjian [11]. To avoid biased age estimates, MR images were anonymized and randomized irrespective of the acceleration factor. All raters were instructed to provide ratings only in clear cases, i.e. when an unambiguous assignment of a stage was possible. This further defined assessability of the data sets: the absence of a rating was tantamount to the data set not being assessable.

Given the anatomical differences between data sets, radiological assessment was performed by several raters to benefit from the specialisation of each evaluator. A paediatric radiologist with 6 years of experience in bone age estimation (R1) evaluated images of the hand. An oral and maxillofacial surgeon in training with specific expertise in head and neck imaging and forensic odontology, specially trained for the evaluation of the clavicles and with more than 7 years of experience in this field (R2), assessed images of the clavicles. A radiologist with more than 7 years of expertise in forensic applications (R3) evaluated images of the hand and the clavicles. A dentist with 10 years of experience in radiological evaluation of MRI data and 9 years of experience in age estimation (R4) and a specialist in oral surgery and oral radiology performing age estimations in the daily routine with 13 years of experience (R5) assessed images of the teeth. Due to the challenging aspects of MR images of the clavicles, a forensic anthropologist (R6) was appointed as a third evaluator for this data set, and raters R2 and R3 evaluated the original images a second time.

Automatic skeletal age estimation was performed using the fully automated method recently proposed by Štern et al. [28]. This method was evaluated on 322 data sets of subjects not included in our cohort but acquired with the same MRI protocol, and provided a mean absolute error (MAE) of 1.01 ± 0.74 years (MAE ± standard deviation).

Statistical analysis

Our focus in this study was on the reliability of multi-factorial age estimation with decreasing acquisition time rather than on the absolute agreement with chronological age. For this purpose, we analysed the change introduced into the estimated age (automated), age category (hand) or developmental stage (clavicles and teeth) with increasing acceleration factor as proposed in [27]. As an estimator for this variation, we calculated the difference between the age/age stage estimated from undersampled data (Age_us) and the age/age stage estimated from original data (Age_orig):

$$ \varDelta \mathrm{Age}={\mathrm{Age}}_{us}-{\mathrm{Age}}_{orig} $$

For simplicity, ΔAge is used for both age differences and differences between estimated stages. The standard deviation of the signed differences (SSD) of ΔAge was used as a measure for the reliability of the age estimation, while the mean of signed differences (MSD) served to identify potential systematic errors. Intra-class correlation (ICC) was calculated between age estimates based on original images and the estimates from undersampled data sets for each rater. Additionally, ICC and overall Bland-Altman mean (μBA) and limits of agreement (LOA) between raters were determined.
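The reliability measures defined above can be sketched in a few lines of Python. This is an illustration and not the MATLAB code used in the study. It assumes the sample standard deviation (`ddof=1`) for SSD and the conventional 1.96 standard deviation limits for the Bland-Altman analysis, neither of which is specified in the text. ICC is left out because the variant used is not stated. The function names and the age values are hypothetical.

```python
import numpy as np

def reliability(age_orig, age_us):
    """Return (MSD, SSD) of delta_age = age_us - age_orig."""
    delta = np.asarray(age_us, dtype=float) - np.asarray(age_orig, dtype=float)
    return delta.mean(), delta.std(ddof=1)  # ddof=1: sample SD (assumption)

def bland_altman(rater_a, rater_b):
    """Return the mean difference and limits of agreement between two raters."""
    diff = np.asarray(rater_a, dtype=float) - np.asarray(rater_b, dtype=float)
    mu = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)  # conventional 1.96 SD limits (assumption)
    return mu, (mu - half_width, mu + half_width)

# Hypothetical age estimates in years for four volunteers.
age_orig = [15.0, 17.0, 19.0, 16.0]
age_us = [15.5, 17.0, 18.5, 16.0]

msd, ssd = reliability(age_orig, age_us)
print(f"MSD = {msd:.2f} years, SSD = {ssd:.2f} years")
```

A MSD close to zero together with a small SSD indicates that undersampling neither shifts the estimates systematically nor scatters them strongly.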

Best-performing combinations of undersampled data

To analyse the acceleration potential of MR acquisitions of each of the three body regions, all available data sets were combined in all valid compositions, i.e. one data set of each body region per volunteer in all combinations of available acceleration factors (see Table 2), to retrieve corresponding age estimates.

The reliability of all age estimates was analysed to identify the combinations that provide the best reliability while requiring the shortest possible TA,th. Besides reliability, agreement with chronological age was also investigated for selected combinations.
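The search for the best-performing combinations can be sketched as an enumeration followed by a selection of the lower contour in the plane spanned by total acquisition time and SSD. The acceleration factors, acquisition times and ΔAge values below are synthetic placeholders and not the values of Table 2 or the outputs of the automatic method. The function name `lower_contour` is hypothetical.

```python
from itertools import product
import numpy as np

def lower_contour(points):
    """Keep points whose SSD is lower than that of every faster combination.

    points: iterable of (total_time_s, ssd_years, combination) tuples.
    """
    contour, best_ssd = [], float("inf")
    for point in sorted(points):  # fastest combinations first
        if point[1] < best_ssd:   # strictly better than anything faster
            contour.append(point)
            best_ssd = point[1]
    return contour

# Hypothetical acceleration factors and acquisition times in seconds.
ta_th = {
    "hand": {1: 120, 4: 30, 8: 15},
    "clavicles": {1: 160, 2: 80, 4: 40},
    "teeth": {1: 640, 3: 220, 6: 110},
}

rng = np.random.default_rng(1)
points = []
for combo in product(*(ta_th[region] for region in ta_th)):
    total = sum(ta_th[region][af] for region, af in zip(ta_th, combo))
    # Synthetic delta_age for 34 volunteers; the spread grows with acceleration.
    delta_age = rng.normal(0.0, 0.02 * sum(combo), size=34)
    points.append((total, float(delta_age.std(ddof=1)), combo))

for total, ssd, combo in lower_contour(points):
    print(f"AF (hand, clavicles, teeth) = {combo}: {total} s, SSD = {ssd:.2f} years")
```

Each point that survives the selection offers a lower SSD than any combination with a shorter total time, which is one way to read the phrase "best reliability for the shortest possible acquisition time".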

All statistical analyses were performed using MATLAB (R2017b, The MathWorks Inc., Natick, MA, USA).

Results

Available data

Two acquisitions of the hand were excluded from the evaluation due to strong artefacts (radiofrequency-based, motion) in the images. Additionally, MR raw data could not be obtained for one acquisition of the teeth. This ultimately resulted in 96 data sets of the hand, 136 data sets of the clavicles and 132 data sets of the teeth assessed for radiological age estimation.
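The data set counts follow directly from the cohort size, the exclusions and the number of images presented per volunteer. The short check below assumes that raters saw the original plus two undersampled image sets for the hand (AF = 4 and 8) and the original plus three undersampled sets for the clavicles and the teeth, as described in the Methods section.

```python
volunteers = 34
# Images per volunteer: original plus undersampled versions.
images_per_volunteer = {"hand": 3, "clavicles": 4, "teeth": 4}
# Acquisitions that could not be used (artefacts, missing raw data).
excluded = {"hand": 2, "clavicles": 0, "teeth": 1}

counts = {
    region: (volunteers - excluded[region]) * n_images
    for region, n_images in images_per_volunteer.items()
}
print(counts)  # {'hand': 96, 'clavicles': 136, 'teeth': 132}
```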

Image reconstruction and image quality

Figure 1 shows representative images of central slices of all three acquired data sets of one volunteer (13.8 years) for the original data set and undersampled images for AF = [4, 6, 8].

Fig. 1 Exemplary original and undersampled images of all three body regions for one volunteer (13.8 years)

Generally, the reduction of available data for image reconstruction due to undersampling leads to smoothing of image details and suppression of image noise. These effects are observable in all data sets: With increasing AF, the visible part of the sternum body becomes perceptibly smoothed in images of the clavicles; for hand images, an overlap of the muscle tissue with metacarpal bones can be observed and for high acceleration factors details of single teeth become reduced.

Figure 2 shows a comparison of an original image with an actual accelerated acquisition (AF = 4; however, using an acquisition strategy different from the acquisitions described in the Methods section) of the same volunteer (17.75 years) in consecutive scans. The arrow in both images marks an open epiphyseal gap, which appears partially closed in the original image.

Fig. 2 Comparison of an original with an actual accelerated acquisition of the clavicles for one volunteer (17.75 years). Arrows mark an open epiphyseal cartilage clearly visible in the accelerated acquisition but appearing partially ossified in the original scan

Assessability of reconstructed MR images

Overall assessability was 100% for hand images and roughly 60% (R2: 76%, R3: 70%, R6: 37%) for the clavicles; assessability of the clavicles was higher when the second assessment of the original images was taken into account (R2: 77%, R3: 75%). For teeth images, assessability was around 90% (R4: 91%, R5: 86%; see Table 3 for details). The automated method took all images into account for age estimation.

Table 3 Assessability for all data sets using radiological evaluation. Parentheses mark results from a second evaluation

Reliability of ratings

As an exemplary visualisation for all evaluations, Fig. 3 shows the results of the radiological assessment of hand images by raters R1 and R3. Figure 3a, b shows differences to original age estimates separately for both raters and Fig. 3c shows a Bland-Altman plot comparing estimates of both raters (see Table 4 for all results).

Fig. 3 Difference in age estimates of undersampled hand images compared to estimates based on original images for (a) R1 and (b) R3. (c) Bland-Altman plot comparing age estimates of both raters (larger markers denote multiple data points at the same position)

Table 4 Reliability for all data sets using radiological evaluation

With the automatic age estimation method, we evaluated the reliability of 441 different data set combinations. A scatter plot showing all differences to original estimates versus total theoretical acquisition time is shown in Fig. 4a. In Fig. 4b, the SSD of each of the 441 combinations is plotted versus total TA,th, showing that SSD values were below 0.90 years for all combinations. The lower contour of this scatter plot marks data set combinations providing minimum SSD for the shortest possible theoretical acquisition times. All combinations of the lower contour yielding SSDs of at most 0.2 years are summarized in Table 5. Regarding agreement with chronological age, the automatic method yielded SSD/MSD = 0.88 years/0.19 years for original images and SSD/MSD = 0.94 years/0.40 years for the last entry in Table 5. Figure 5 shows a comparison of original images and undersampled images acquired with acceleration factors of 4, 2 and 6 for the hand, the clavicles and the teeth, respectively, leading to a total acquisition time of roughly four minutes.

Fig. 4 Results of automatic age estimation. (a) ΔAge for all volunteers and data set combinations and (b) SSD for all data set combinations over total theoretical acquisition time. The lower contour marks best-performing combinations

Table 5 Best-performing combinations of acceleration factors using the automatic age estimation method
Fig. 5 Comparison of original images and actual accelerated acquisitions with a total acquisition time of roughly 4 min. Note that the real acquisition times differ slightly from the theoretical acquisition times due to scanner restrictions

Discussion

In this study, we analysed the reliability of multi-factorial age estimation based on undersampled MRI of all three regions as recommended by the work group for forensic age diagnostics [4]. Radiological analyses showed that a reduction to a total theoretical acquisition time between 4 and 5 min was feasible; the automatic method applied in this study provided reliable results for even shorter acquisition times. This is valuable information since comparable MRI-based age estimation studies use acquisition times of 6 min [15, 38] for the hand and 4 [20] to 6 min [5] for the clavicles. MRI systems with low field strengths can benefit from the fact that lower field strengths lead to shorter values of T1, which in turn permit shorter repetition times TR required for the acquisition of one image line. This reduces acquisition time compared to high-field MRI scanners; however, even for such systems, reported acquisition times for hand MRI still range between 1:40 min [13] and 2:44 min [14], leaving room for optimisation. Additionally, the automatic analysis made it possible to derive a list of acceleration options providing the best reliability for a given amount of time using an entirely objective method for evaluating the images’ suitability for age estimation. Our approach can easily be adopted, since CAIPIRINHA is available on current MR scanners and the reconstruction software used in this study is freely available.

The automatic analysis’ results provide deviations of the estimated age with regard to acceleration, which enables a direct evaluation of acceleration limits. For radiological age estimation, the determination of the minimum acceptable theoretical acquisition time requires a more elaborate analysis. The 100% assessability of the images of the hand was to be expected given existing work [27]. Assessability of the teeth was high with no obvious influence of the acceleration factor; however, it was slightly lower for the highest acceleration factor (AF = 6), which may therefore represent an acceleration limit. Correlation was particularly high for the hand; the assessment of the teeth yielded SSD values below 1 stage for both raters and all acceleration factors, and MSD values showed no systematic bias. Therefore, the minimum TA,th for hand (16 s) and teeth (141 s) can be assumed as guiding values for a lower limit of applicable acquisition times.

Taking the second assessment of the original clavicle images into account, two evaluators (R2 and R3) achieved comparable assessability (~75%), with decreased assessability for undersampled images but only a small influence of the applied acceleration factor. This could suggest that undersampling in general diminishes assessability. However, it can be explained by the fact that retrospective undersampling is a valid technical approach but the reconstructed data still include all artefacts (motion, breathing) from the original, long acquisition. Actual undersampled acquisitions show the potential to provide increased image quality compared to long acquisitions or retrospectively undersampled data. This is shown in Fig. 2, where an open epiphyseal gap appears as partially closed in the original image. Evaluator R6 achieved a low overall assessability but also very high reliability for the assessed images. This suggests that quality standards may vary strongly between raters.

The clavicles are reported to generally be subject to large error ranges [20]. This can also be seen in the intra-rater correlation for the original images, which lies below values reported in the literature [25]. It is known, however, that early and late stages can be confused, which has led to approaches using additional sub-stages [15, 39, 40], or combining or discarding early and late stages [29, 41]. In the current study, no guidelines for unclear cases were defined beforehand, which, in combination with the relatively small number of subjects, may have led to exaggeratedly low intra- and inter-rater agreement. Despite the difficulties involved in the analysis of this body region, an acceleration factor of 2 (TA,th = 85 s) led to high ICC values and to SSD values well below 1 stage for the clavicles. We, therefore, assume this moderate acceleration to be applicable. This leads to a total minimum theoretical acquisition time of roughly 4 min for radiological evaluation. The combination of acceleration factors 4, 2 and 6 for the hand, the clavicles and the teeth, respectively (TA,th = 4:15 min), additionally represents a combination that provides reliable results for radiological age estimation and is one of the best-performing combinations of the automatic method (see Table 5). The applicability of these acceleration factors could also be shown in actual accelerated acquisitions.
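The total of roughly 4 min for radiological evaluation can be reproduced by summing the per-region minimum theoretical acquisition times quoted in this Discussion (16 s for the hand, 85 s for the clavicles and 141 s for the teeth). The short Python check below only restates this arithmetic.

```python
# Minimum theoretical acquisition times in seconds, as quoted in the text.
ta_th = {"hand": 16, "clavicles": 85, "teeth": 141}

total = sum(ta_th.values())
minutes, seconds = divmod(total, 60)
print(f"{total} s = {minutes}:{seconds:02d} min")  # 242 s = 4:02 min
```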

Notably, the automatic age estimation method could be applied to undersampled data in its original state, without additional training. The method provided low overall SSD values, and the SSD values of the combinations in Table 5 are well below the uncertainty limit of the radiological methods of 0.5–2 years [42]. The reliability is also shown in the agreement with chronological age, which changed only slightly between the age estimated from original images and the age based on images acquired with a total TA,th of 221 s. Furthermore, Table 5 confirms the moderate acceleration potential for acquisitions of the clavicles as indicated by the radiological analysis. This additionally becomes visible in Fig. 4a, where increasing undersampling of the clavicle data leads to repeating high values of ΔAge.

A limitation of this study is the relatively small sample size. This is due to the fact that the measurements’ raw data were not stored at an earlier stage of our ongoing multi-factorial age estimation study. However, using the concept of systematically increasing the degree of undersampling, the feasibility of our approach could already be shown for the sample size used in this study. We could also see that actual fast acquisitions may provide images with increased quality compared to our original acquisitions; therefore, a study collecting real undersampled data will be considered for future work. It should, however, be noted that the total theoretical acquisition time between 4 and 5 min, which was found to still allow reliable age estimation, only covers the net acquisition time of the three sequences. An actual examination will require additional time for patient and coil positioning and sequence planning; however, using adequate coil combinations, a repositioning between acquisitions of the clavicles and the teeth may also be replaced by a table move. Furthermore, the automatic method used the most likely age approach for the determination of acceleration limits. We believe that the determined optimisation of the acquisition time will also be applicable to studies implementing the minimum age concept, which would require replacing the regression analysis with a classification problem; however, we did not investigate this aspect in this study.

This study was performed on a 3T system, which may not always be available. However, we expect our approach to be applicable to lower field strengths, since the feasibility of acceleration techniques could already be shown for field strengths as low as 0.3T [26]. Furthermore, the combination of CAIPIRINHA for image acquisition and TGV for image reconstruction represents a state-of-the-art approach as well as an optimised strategy: CAIPIRINHA modifies the appearance of undersampling artefacts leading to improved image quality and the TGV-based algorithm falls into the class of compressed sensing reconstruction, combining the benefits of both parallel imaging and compressed sensing.

In conclusion, we could show in this study that the total acquisition time for multi-factorial age estimation based on MR images of hand, wisdom teeth and the clavicles can theoretically be reduced to as low as four minutes while still allowing for reliable age estimation.