Introduction

Determining a person’s age is important for many legal processes, e.g., regarding child labor, sexual assault, prostitution, and sometimes for elite athletes, but is particularly relevant in the asylum procedure. In most European countries, the cut-off age of minority is 18 years of age [1, 2]. Individuals below the age of 18 are entitled to have their legal rights as minors respected in accordance with national and international laws and regulations [3]. Medical tests to assess a person’s psychological and physiological development level have been used as methods for chronological age assessment, particularly in young unaccompanied asylum seekers when their date of birth is unknown, or when they lack documents to confirm their age [4]. There is a correlation between skeletal and dental development and chronological age; hence, methods for the assessment of skeletal maturation based on radiographs of the hand and maturation of the teeth are frequently used to assign a chronological age to an individual. However, those methods have been criticized for their lack of precision [5, 6].

Magnetic resonance imaging (MRI) has been proposed as a more advanced imaging technique for the evaluation of skeletal maturation. MRI is radiation free and can be applied to several bones, e.g., the clavicle and the hand. Over the past decade, MRI of the knee has been proposed as a potentially more accurate method for chronological age assessment than traditional radiographic methods [7,8,9].

Grading of skeletal maturation of the knee is based on the appearances of the physeal line, or physis, which is the site of growth in a long bone. The physis is a fine structure consisting of mesenchymal cells in different maturation stages. The physis becomes thinner and thinner throughout the process of skeletal maturation, before it disappears and endochondral ossification ceases. There are six different grading systems for skeletal maturation of the knee assessed by MRI: (1) Schmeling and Kellinghaus, (2) Vieth, (3) Dedouit, (4) Dedouit, Kellinghaus, and Schmeling, modified version, (5) Jopp and (6) Schmeling. All grading systems classify the maturation into exclusive stages based on the characterization and delineation of the thin physeal line (Tables 1 and 2).

Table 1 Descriptive summary of the included studies
Table 2 The stages used in respective MRI grading system that represent a “mature knee”

The aim of this systematic review is to explore the scientific evidence for medical age estimation based on skeletal maturation as assessed by MRI of the knee. We also explore the likelihood of a minor being misclassified as an adult, or vice versa, of an adult being misclassified as a minor, when MRI of the knee is used for chronological age estimation in a forensic setting.

Materials and methods

Protocol and registration

This systematic review was conducted at the Swedish Agency for Health Technology Assessment and Assessment of Social Services (SBU) as an assignment from the Ministry of Health and Social Affairs in Sweden, and was published in Swedish in October 2021 [23]. SBU uses a peer-reviewed protocol for systematic reviews. The systematic review process follows the general concepts covered by Preferred Reporting Items for Systematic Reviews and Meta-analyses, PRISMA [24].

Eligibility criteria

A study was considered eligible if it reported data for living study participants between the ages of 8 and 30 years with no pathological problems of the knee or ankle (population) for whom the ossification stages of the knee (distal femur) had been evaluated using MRI (index test). The chronological age was known through records (reference test), and the diagnostic accuracy (outcome) was reported as sensitivity/specificity or by a correlation of age and ossification stage. Only cross-sectional studies and longitudinal studies written in English, German, or any Scandinavian language were included.

Literature search

A systematic literature search was conducted by an information specialist in the following databases: Cochrane Library (Wiley), Embase (Elsevier), Medline (OvidSP), Epistemonikos, KSR Evidence, and International HTA Database. The search strategy was developed and executed in close collaboration with the co-authors LSOM (radiologist) and JD (pediatrician).

A systematic review by Ding et al [25] was used as a starting point for the literature search. Our search was thus limited to studies published between January 2017 and March 2021; all studies included in the systematic review by Ding et al were evaluated for eligibility (see section “Study selection”).

In addition, a reference and citation search of the included studies was performed in the database Scopus (Elsevier). The complete search strategy is provided in Supplement 1.

Study selection

Two reviewers independently screened the titles and abstracts identified by the literature search strategy. All studies of potential relevance according to the inclusion criteria were obtained in full text, and two reviewers independently assessed them for inclusion. In addition, the articles included in the systematic review by Ding et al [25] were screened for relevance. Any disagreement was resolved by discussion. Excluded studies are shown in Supplement 2.

Risk of bias in individual studies

Quality assessment (risk of bias) of the included studies, both from the literature search and the systematic review by Ding et al [25], was performed by two independent reviewers using a modified version of Quality Assessment of Diagnostic Accuracy Studies (QUADAS)-2 (Supplement 3). Any disagreement was resolved by discussion. Each study was rated as having low, moderate, or high risk of bias. Studies with high risk of bias were not included in the analysis.

Data collection process

Data were extracted and tabulated by one reviewer from each included study with low or moderate risk of bias, and the extracted data were audited by a second reviewer. The extracted items were study design; how the index test, reference test, and outcome were measured; population; and setting (Supplement 4).

Method of analysis

Data from each study was presented as a correlation between “mature knee” and age. In the studies where the number of subjects in each stage was presented for each age group, no recalculation was required [11, 14, 18, 21]. When data was presented using descriptive statistics regarding the age distribution conditional on being in a specific stage, i.e., mean age ± standard deviation (SD) for each stage, recalculation was performed. The mathematical method used for recalculation was based on a method described by Mostad et al [26] and Bleka et al [27], with the modification that we relaxed the assumption about normality and instead used the information about the exact age distribution of the subjects included in the study [23].

Rating the certainty of the evidence

The certainty of the estimated prevalence rates was originally intended to be assessed using Grading of Recommendations Assessment, Development, and Evaluation (GRADE) [28]. However, since the studies were too heterogeneous to allow a meta-analysis, or even a meaningful narrative review, we refrained from a GRADE assessment.

Results

The literature search yielded 2529 references to be screened (see Fig. 1). Of these, 39 were reviewed in full text [10,11,12,13,14,15,16,17,18,19,20,21,22, 25, 26, 29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52]. A total of 16 articles were considered eligible since they met the inclusion and exclusion criteria. Of these, 13 had low or moderate risk of bias and are included in the analysis [10,11,12,13,14,15,16,17,18,19,20,21,22]. The risk of bias chart is shown in Supplement 5. Three articles had high risk of bias: one due to an inadequate description of the MRI method, one lacked complete information regarding birth date, and one due to several limitations. See Supplement 2 for the list of excluded studies. The characteristics of the included studies are presented in Supplement 4.

Fig. 1 Flow chart of literature selection

Study characteristics

Out of the 13 studies, 9 were retrospective [10, 11, 13,14,15,16,17, 19, 21] and 4 prospective [12, 18, 20, 22]. The studies were from Iran [13], Turkey [10, 11, 15, 16, 21], Germany [12, 17, 20, 22], Sweden [18], France [14], and USA [19]. The study population consisted of various subjects such as patients who underwent MRI for assessment of traumatic or degenerative changes to the knee joint [10, 11, 14, 16, 17, 19, 21], healthy volunteers (4 studies) [12, 18, 20, 22], and individuals assessed due to legal reasons (1 study) [13]. The information provided regarding the study population was sparse in most papers.

The studies used 6 different grading systems for evaluation of maturation stage of the knee on MRI: Schmeling and Kellinghaus [15, 17, 20], Vieth [10, 16, 22], Dedouit [11, 14, 19, 21], Dedouit and Kellinghaus (modified version of Schmeling) [18], Jopp [12], and Schmeling [13] (Table 1). The MRI grading systems differ both in terms of the number and the definition of stages of maturation. Stages varied between 3 and 5, with up to 11 substages. The stages were described with varying degrees of detail in the original publications, and there was a mix of quantitative and qualitative criteria. Therefore, the stages in the different grading systems were overlapping and could not be pooled. The MRI protocols used varied between the studies (see Table 1).

In 8 studies, two readers had reviewed all images, but in 4 studies, only 10–21% of the images were reviewed by two readers, and in one study, the information was unclear (Table 1).

The risk for misclassification

The studies were deemed too heterogeneous with regard to MRI protocols, grading system, and population; therefore, a meta-analysis was not performed (Table 1 and Supplement 4). Even in the studies using the same grading system, the population and MRI protocol were considered too heterogeneous to allow a meta-analysis. We therefore present the results from each study individually. The results are presented separately for males and females. We present the results for the most relevant classification stages for age determination, where individuals both over and under 18 years are represented (Table 2).

Of the 13 studies, 4 presented data as stage per age [11, 14, 18, 21] and no recalculations had to be performed. For the remaining nine studies [10, 12, 13, 15,16,17, 19, 20, 22], recalculation was performed for six of them [10, 15,16,17, 20, 22]. Two studies could not be included in the analysis since they lacked information about minimum and maximum age which is required for the recalculations [13, 19], and one study had only sufficient information for 6 participants [12].

In Fig. 2, the proportion of males or females with a mature knee per age (interval 15–23 years) is presented. The curves are estimated by using a logistic regression function, \(p(x)=\frac{1}{1+e^{-(\beta_0+\beta_1 x)}}\), with age \(x\) as the independent variable and mature/not-mature as the dependent variable. The model estimates the probability of having a mature knee as a function of age.
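As a minimal sketch of how such a curve is evaluated, the logistic function can be written in a few lines of Python. The coefficient values below are hypothetical and chosen only for illustration; they are not fitted to any of the included studies, and the function name is an assumption.

```python
import math


def mature_knee_probability(age, beta0, beta1):
    """Logistic model: p(x) = 1 / (1 + exp(-(beta0 + beta1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * age)))


# Hypothetical coefficients (illustration only, not estimated from study data).
# With these values the curve crosses 50% at age -BETA0 / BETA1 = 18 years.
BETA0 = -36.0
BETA1 = 2.0

for age in range(15, 24):
    p = mature_knee_probability(age, BETA0, BETA1)
    print(f"age {age}: P(mature knee) = {p:.3f}")
```

The slope coefficient controls how quickly the probability rises with age, and the ratio of the two coefficients fixes the age at which half of the individuals are expected to have a mature knee.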

Fig. 2 The proportion of males or females (%) for the respective MRI grading system with a “mature” knee

The data show clearly how the results differ between the studies. For example, the maturation of the knee for males seems to appear later in the studies by Kramer et al [17] and Dedouit et al [14] than in the studies by Vieth and Ottow et al [20, 22].

We calculated the risk for misclassification, i.e., the risk that a minor (an individual 17 years or younger) would be misclassified as an adult (an individual 18 years or older) or that an adult would be misclassified as a minor, in the age interval of 15 to 21 years for each of the studies. Figure 3 illustrates the results based on data from the study by Ottow et al [20]. In Table 3, the results for all grading systems and all studies are presented. The risk of misclassification varies between studies (Table 3). The risk for misclassification is generally much higher in females than in males until the age of 17, at which point males are more likely to be misclassified. For females, where the knee matures earlier, the highest risk for misclassification is between 16 and 18 years of age. For males, the risk is the highest between the ages of 17 and 18 years.
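The per-age calculation can be sketched as follows, assuming that a "mature knee" is read as "adult" and an immature knee as "minor". The proportions in the table are hypothetical and serve only to show the arithmetic; they are not taken from Table 3 or from any included study.

```python
def misclassification_risk(age, p_mature, cutoff=18):
    """Risk of a wrong classification for a person of known age,
    assuming a mature knee is interpreted as 'adult'."""
    if age < cutoff:
        # A minor is misclassified as an adult if the knee is mature.
        return p_mature
    # An adult is misclassified as a minor if the knee is not mature.
    return 1.0 - p_mature


# Hypothetical proportion with a mature knee per age (illustration only).
P_MATURE = {15: 0.02, 16: 0.10, 17: 0.35, 18: 0.65, 19: 0.85, 20: 0.95, 21: 0.99}

for age, p in P_MATURE.items():
    risk = misclassification_risk(age, p)
    label = "minor as adult" if age < 18 else "adult as minor"
    print(f"age {age}: risk of misclassifying {label} = {risk:.2f}")
```

Under these assumptions the risk peaks in the ages adjacent to the 18-year threshold, which is the pattern the paragraph above describes.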

Fig. 3 Correct classification and misclassification per age (15–21 years of age) subdivided for males and females (%) based on the study by Ottow et al

Table 3 The risk for males and females to be misclassified, presented per age group, study and grading system

Positive predictive value, negative predictive value, sensitivity, and specificity

To be able to calculate positive predictive value (PPV), negative predictive value (NPV), specificity, and sensitivity, the frequency/ratio of mature and immature knee per age, as well as the age distribution of the population tested, needs to be known. The risk of misclassification depends on the actual age distribution of those tested. For example, if many individuals are tested in the age range 17 to 18 years, the risk of misclassification will be high. If the age range is widened, the proportion of errors will be lower.

To illustrate this issue, we have performed calculations on five hypothetical populations, each with a different age distribution (Table 4).

Table 4 Number of individuals per age in five different hypothetical populations

Using data from the study by Ottow et al (male) [20], we illustrate how the same datapoints give rise to different predictive values in response to variation in the age distribution of the population tested (Fig. 4). This is further elaborated in Supplement 6 where predictive values using datapoints from other included studies on the same five hypothetical populations are presented. The calculations show that NPV, PPV, sensitivity, and specificity for MRI of the knee as a test to determine whether a person is over or under 18 years old cannot be calculated in individuals between 15 and 21 years of age unless the exact age composition of the group is known.
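The dependence on age composition can be reproduced with a short sketch. The maturity proportions and the two populations below are hypothetical; they are not the values from Ottow et al or from Table 4. The same per-age proportions are applied to both populations, and only the age distribution differs.

```python
def predictive_values(p_mature, counts, cutoff=18):
    """Expected PPV, NPV, sensitivity and specificity when a mature knee
    is read as 'adult', given per-age maturity proportions and the
    number of individuals tested at each age."""
    tp = fp = fn = tn = 0.0
    for age, n in counts.items():
        p = p_mature[age]
        if age >= cutoff:
            tp += n * p          # adults with a mature knee
            fn += n * (1.0 - p)  # adults with an immature knee
        else:
            fp += n * p          # minors with a mature knee
            tn += n * (1.0 - p)  # minors with an immature knee
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical proportion with a mature knee per age (illustration only).
P_MATURE = {15: 0.02, 16: 0.10, 17: 0.35, 18: 0.65, 19: 0.85, 20: 0.95, 21: 0.99}

# Two hypothetical populations: ages spread evenly, or clustered at 17-18.
WIDE = {age: 100 for age in range(15, 22)}
NARROW = {15: 0, 16: 0, 17: 500, 18: 500, 19: 0, 20: 0, 21: 0}

for name, counts in (("wide", WIDE), ("narrow", NARROW)):
    values = predictive_values(P_MATURE, counts)
    print(name, {key: round(value, 3) for key, value in values.items()})
```

With these inputs, all four measures are lower in the population clustered around the threshold than in the evenly spread one, although the underlying maturity data are identical.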

Fig. 4 Illustration of how PPV and NPV vary depending on the age composition of the populations tested. Based on data from Ottow et al (male)

Discussion

This systematic review shows that the proportion of individuals per age group deemed to have a mature knee on MRI varies considerably between the included studies. Women’s knees seem to mature earlier than men’s knees; however, the magnitude of the difference was not consistent between studies. The calculated risk for misclassification for each study also varied because it depends on the dataset on which the calculations are based. The variation in results of maturation according to age can be explained by the different grading systems, MRI protocols, and study populations in the published studies.

Further, we illustrate how calculating the risk of misclassification for the purpose of forensic age determination is paradoxical, as that risk can only be calculated in a population with a known age distribution. This fundamental statistical limitation cannot be overcome, no matter how advanced the measurement techniques are, or how well future studies are conducted.

Maturation of the knee is influenced by individual factors such as genetics, medical conditions, and nutrition [53]. This causes a wide, and somewhat unpredictable, variation in skeletal maturation rates. Since these factors naturally vary between individuals, the correlation between chronological age and bone maturation is influenced by the characteristics of the study population.

MRI is probably the modality within the field of radiology which is the most difficult to standardize, both in terms of image acquisition and image reading [54, 55]. Therefore, results from studies performed in different institutions, on different MRI machines, with different readers, different image scan parameters, and with different scoring systems, are not directly comparable, and comparing them directly introduces substantial variability. For instance, Kvist et al [18] demonstrated that the use of different weightings applied on the same individuals influenced the grading of physeal maturation. Objective assessment of MRI signal is difficult because readers tend to perceive the same image intensity differently, depending on surrounding background intensities [56]. Reader experience and calibration of the reading structure also influence the interpretation of the images [18].

One strength of this systematic review is the strict adherence to international standards for systematic reviews. Studies were identified and selected according to the PRISMA statement, which is internationally regarded as state of the art for performing and reporting systematic reviews (see Supplements 1–5). Another strength is our multidisciplinary approach where information specialists and statisticians worked in close collaboration with clinical experts, including pediatricians and a pediatric radiologist.

The main limitation of this systematic review is that relatively few studies were identified. Larger and more comparable studies may be able to show a stronger trend in terms of the proportion of individuals per age group deemed to have a mature knee on MRI. Still, we will not be able to predict the risk of misclassification in individuals with an unknown age due to the statistical paradox described above.

Forensic age estimations can have huge legal implications for the individual being evaluated, as well as for countries and authorities who use such methods. Therefore, the methods used should be reliable and reproducible, and the statistical calculations of probability must exactly fit the population to which they are applied.

In conclusion, there is considerable heterogeneity in the published studies on forensic age assessment based on MRI of the knee. Therefore, neither a meta-analysis nor a meaningful narrative review could be performed. Furthermore, the actual risk of misclassifying a minor as an adult and vice versa can never be calculated in a group of individuals with unknown age distribution.