A systematic overview of dental methods for age assessment in living individuals: from traditional to artificial intelligence-based approaches

Dental radiographs have been used for many decades to estimate chronological age, with a view to forensic identification, migration flow control, or assessment of dental development, among others. This study aims to analyse the application of chronological age estimation methods from dental X-ray images in the last 6 years, involving a search for works in the Scopus and PubMed databases. Exclusion criteria were applied to discard off-topic studies and experiments which are not compliant with a minimum quality standard. The studies were grouped according to the applied methodology, the estimation target, and the age cohort used to evaluate the estimation performance. A set of performance metrics was used to ensure good comparability between the different proposed methodologies. A total of 613 unique studies were retrieved, of which 286 were selected according to the inclusion criteria. Notable tendencies to overestimation and underestimation were observed in some manual approaches for numeric age estimation, especially in the case of Demirjian (overestimation) and Cameriere (underestimation). On the other hand, the automatic approaches based on deep learning techniques are scarcer, with only 17 studies published in this regard, but they showed a more balanced behaviour, with no tendency to overestimation or underestimation. From the analysis of the results, it can be concluded that traditional methods have been evaluated in a wide variety of population samples, ensuring good applicability in different ethnicities. On the other hand, fully automated methods were a turning point in terms of performance, cost, and adaptability to new populations.


Introduction
Chronological age is, together with biological sex and ethnicity, the most important human feature to be considered in anthropological and forensic studies [1]. Besides, chronological age estimation is used daily in legal procedures where the birthdate of the involved subjects cannot be verified due to either the absence of birth certification or the suspicion of false documentation. This applies to migration controls or trials involving undocumented people, since the attainment of legal age has many implications according to the laws of most countries. It is also important in the adoption processes of undocumented children. In all these cases, an expert performs a somatic maturity examination.
Inmaculada Tomás (inmaculada.tomas@usc.es), María J. Carreira (mariajose.carreira@usc.es). Extended author information available on the last page of the article.
The development status of bones has been used successfully to estimate chronological age. In this regard, many skeletal parts have been used, such as pubic symphysis, auricular surface, or sternal ribs [2]. Also, it is worth noting that there is not a single method based on bone development that outperforms others systematically, as the performance of each one depends on numerous factors. For instance, there are specific age estimation methods developed for subadults and others that work better in adults [3].
One of the most widely used body parts in the field of age estimation is the teeth, mainly because dental mineralisation has been reported to be less affected by external factors (e.g. genetics or environment) than bone mineralisation [4]. In this regard, dental imaging techniques represented a step forward because they allowed clinicians to assess bone development with less invasive and faster procedures, and thus enabled them to perform chronological age estimation.
The estimation of age from dental radiographic records is based on the evaluation of some characteristics such as the formation of jaw bones; the appearance of tooth germs; the degree of crown completion and its eruption; the degree of resorption of deciduous teeth; the measurement of open apices in teeth; the volume of the pulp chamber and root canals; the formation of physiological secondary dentin; the tooth-to-pulp ratio; or the development and topography of the third molar [5].
It is worth noting that panoramic X-rays (orthopantomographs or OPGs) provide the least invasive radiologic technique to estimate age, as only a single image is required to capture the whole dentition. Besides, other bone structures can be seen, such as the mandible, the nasal fossa, or the vertebrae, which are also helpful for further examinations. In the following, the main methods to estimate age from dental radiographs are reviewed.

Material and methods
For the purposes of this review, a protocol approved by an expert reviewer and compliant with the PRISMA guidelines for systematic reviews [6] was established. Scopus and PubMed databases were used to retrieve a collection of full-text studies on age estimation from dental radiographs published in the last 6 years (from 2016 to 2022). This specific period was chosen for two main reasons. First, the number of published studies is sufficiently high to report significant conclusions. Second, automatic methodologies in the field of dental age estimation have been mainly used in this period, and not before, and therefore including earlier years would have diluted their relevance in this review. Then, a study selection process was carried out, as seen in Fig. 1.
The query used in Scopus was:
TITLE-ABS-KEY ( ( "age estimation" OR "age assessment" OR "age regression" OR "age determination" ) AND ( dental OR tooth OR teeth OR mandib* OR incisor OR canine OR premolar OR molar ) AND ( x-ray OR radiolog* OR radiograph* OR opg OR orthopantomograph* OR panoramic OR ct OR cbct OR mri ) ) AND
The query used in PubMed was:
(
As can be seen, the queries are not strictly the same, as Scopus also allowed for excluding certain unwanted subject areas, such as Veterinary or Arts. As a result, a set of 537 studies was collected from Scopus and 336 from PubMed on February 24th, 2022, which in the end represented a body of 613 unique works. The abstract of each work was reviewed to discard unwanted studies, according to the following exclusion criteria: (1) studies not aimed at chronological age estimation in humans; (2) non-radiological studies; (3) studies that use non-human samples; (4) studies relying on a sample smaller than 50 subjects or studies that do not report the sample size; (5) studies whose full text is not available.
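The merging and screening steps described above can be illustrated with a short sketch. It is not the authors' actual pipeline: the `Record` fields, the DOI/title matching heuristic, and the function names are assumptions introduced only for illustration. Note that 537 + 336 = 873 retrieved records reducing to 613 unique works implies that 260 records were duplicates across the two databases.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical bibliographic record; field names are illustrative."""
    title: str
    doi: Optional[str] = None           # may be missing in a database export
    age_estimation: bool = True         # criterion (1): aimed at age estimation in humans
    radiological: bool = True           # criterion (2)
    human_sample: bool = True           # criterion (3)
    sample_size: Optional[int] = None   # criterion (4); None means "not reported"
    full_text: bool = True              # criterion (5)


def dedup_key(rec):
    """Assumed heuristic: match on DOI when present, else on a normalised title."""
    if rec.doi:
        return "doi:" + rec.doi.strip().lower()
    return "title:" + re.sub(r"[^a-z0-9]+", " ", rec.title.lower()).strip()


def merge_unique(*sources):
    """Merge several result lists, keeping the first record seen for each key."""
    seen = {}
    for source in sources:
        for rec in source:
            seen.setdefault(dedup_key(rec), rec)
    return list(seen.values())


def passes_screening(rec, min_sample=50):
    """Apply exclusion criteria (1) to (5); True means the study is kept."""
    return (
        rec.age_estimation
        and rec.radiological
        and rec.human_sample
        and rec.sample_size is not None
        and rec.sample_size >= min_sample
        and rec.full_text
    )
```

In practice, title-based matching would need manual checking, since the same study can be indexed with slightly different titles in each database.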
Regarding the collection of studies aimed at evaluating the age estimation methods proposed in the literature, only those reporting at least one of the following metrics were evaluated. Numerical age estimation studies were required to report a statistic (mean, median, or standard deviation) on the residual error (dental age minus chronological age) and/or the absolute error, the standard error of the estimates, and/or the coefficient of determination R². Methods geared toward age classification were required to report the accuracy, sensitivity, and/or specificity of the classification results. Although dental development is less affected by genetic or environmental factors than other bones, this process is still subject to variations, and so the age estimation methods were usually assessed in different populations and/or ethnic groups all over the world.
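For reference, the metrics listed above can be computed as in the following sketch. Several conventions are assumptions, since the reviewed studies may define them differently: the standard error of the estimate uses n − 2 degrees of freedom (a simple linear model), R² is computed as 1 − SS_res/SS_tot, and the "positive" class in classification is taken to be "older than the age threshold".

```python
import math
from statistics import mean, median, stdev


def numeric_metrics(dental_age, chrono_age):
    """Error statistics for numerical age estimation (ages in years)."""
    residuals = [d - c for d, c in zip(dental_age, chrono_age)]  # dental minus chronological
    abs_errors = [abs(r) for r in residuals]
    n = len(residuals)
    mean_chrono = mean(chrono_age)
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((c - mean_chrono) ** 2 for c in chrono_age)
    return {
        "mean_error": mean(residuals),      # > 0 means overestimation
        "median_error": median(residuals),
        "sd_error": stdev(residuals),
        "mean_abs_error": mean(abs_errors),
        "median_abs_error": median(abs_errors),
        "std_error_estimate": math.sqrt(ss_res / (n - 2)),  # assumes 2 fitted parameters
        "r2": 1.0 - ss_res / ss_tot,
    }


def _ratio(num, den):
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def classification_metrics(pred_older, true_older):
    """Accuracy, sensitivity and specificity; positive class = older than threshold."""
    pairs = list(zip(pred_older, true_older))
    tp = sum(1 for p, t in pairs if p and t)
    tn = sum(1 for p, t in pairs if not p and not t)
    fp = sum(1 for p, t in pairs if p and not t)
    fn = sum(1 for p, t in pairs if not p and t)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
    }
```

A positive mean residual indicates overestimation and a negative one underestimation, which is the sign convention used in the evaluation section.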
To reduce as much as possible the risk of bias in this work when comparing the results obtained by different methods, the collected studies were analysed to detect evidence of malpractice. As a result, five studies were discarded due to the non-compliance with basic aspects such as good wording or a comprehensive description of the experiments, as this could also indicate a problem in the peer review process. It is worth noting that only the most flagrant cases were taken into account to minimise the bias that the observer could introduce in this evaluation process. In the end, a set of 286 studies was selected for further analysis.

Tooth-based manual methods
The studies retrieved in this work relied on a wide variety of age estimation methodologies. However, as dental formation is highly correlated with chronological age and, therefore, is a key indicator for age estimation, most methods are based on dentition analysis. In this regard, the first approaches were purely manual, that is, they required experts not only to retrieve the relevant information from the X-ray image but also to translate this information into an age value. These approaches are shown in Fig. 2.

Children and young adults
Age estimation via dentition analysis has reportedly led to better results during tooth development, that is, from birth until 22 to 25 years of age. As a consequence, studies aimed at estimating the age of children and/or adolescents are more numerous than those focusing on age estimation in adults. Regarding the former, some methods aim to assess specific development milestones (such as dental eruption) to predict age [7,40]. However, they have proven to lead to very limited estimates, as they rely on very quick changes, from which little information can be collected.
Fig. 2 Main manual methods for estimating chronological age from dental X-ray images
Other methods aimed to assess the development of the teeth over a longer period. That is the case for dental Atlases, which are graphic representations of dental development and eruption that provide an easy way to estimate chronological age by comparing the status of the dentition using radiological or osteological techniques to the charts provided in the Atlas [8][9][10][11]. Other authors went a step further and developed dental scoring systems (DSS), consisting of dividing the development period of each tooth into a set of developmental stages with associated scores and using those scores to estimate the numerical chronological age. In this regard, the number of stages varied depending on the specific system. For example, Gleiser and Hunt [18] proposed 15 stages, Nolla [12] developed a division into 11 stages, Demirjian et al. [13] reported eight alphabetical stages, and Liliequist and Lundberg [21] proposed the use of seven stages, in a clear attempt to reduce the complexity of the method. Furthermore, some authors developed population-specific scoring tables on top of the Demirjian et al. [14,16,17] and Gleiser and Hunt's systems [19,20], while others mixed several staging systems to improve the overall estimation performance [22].
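The general mechanics of a dental scoring system (stage each tooth, add the associated scores, convert the total into an age) can be sketched as follows. All numbers are hypothetical placeholders chosen only to make the example run; they are not the published scores of Demirjian et al. or of any other system, which are additionally specific to each tooth and sex.

```python
from bisect import bisect_left

# HYPOTHETICAL score per developmental stage (eight alphabetical stages).
# Real systems publish separate tables for each tooth and each sex.
STAGE_SCORES = {"A": 0.0, "B": 2.0, "C": 4.0, "D": 6.0,
                "E": 8.0, "F": 10.0, "G": 12.0, "H": 14.0}

# HYPOTHETICAL conversion table: (total maturity score, age in years), ascending.
SCORE_TO_AGE = [(0.0, 3.0), (28.0, 6.0), (56.0, 9.0), (84.0, 12.0), (98.0, 16.0)]


def maturity_score(stages):
    """Sum the scores of the stages assigned to the assessed teeth."""
    return sum(STAGE_SCORES[stage] for stage in stages)


def score_to_age(score, table=SCORE_TO_AGE):
    """Convert a total score into an age by linear interpolation, clamped at the ends."""
    scores = [s for s, _ in table]
    if score <= scores[0]:
        return table[0][1]
    if score >= scores[-1]:
        return table[-1][1]
    i = bisect_left(scores, score)
    (s0, a0), (s1, a1) = table[i - 1], table[i]
    return a0 + (a1 - a0) * (score - s0) / (s1 - s0)


# Seven teeth staged by an examiner (illustrative input).
estimated_age = score_to_age(maturity_score(["H", "H", "G", "G", "F", "E", "D"]))
```

The clamping at the upper end reflects a property of such systems: once every tooth has reached the final stage, the total score carries no further information about age.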
Cameriere et al. [23] introduced a different method for estimating age, based on tooth measurements. Specifically, the authors measured the open apices of the seven left permanent mandibular teeth. These measurements, previously normalised by tooth height, were highly and negatively correlated with chronological age. Furthermore, the number of teeth with completely closed root apices was reportedly correlated with age. These findings led the authors to develop a regression formula that depends on the sex of the subject and the normalised measurements of the seven teeth and the number of teeth whose root development is completed.
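A minimal sketch of the feature extraction and linear model described above is given below. The function names, the 0/1 sex coding, and the coefficient values are assumptions for illustration; they are not the published regression formula, whose coefficients should be taken from the original work [23].

```python
def open_apex_features(apex_widths, tooth_heights):
    """Normalise open-apex widths by tooth height for the seven assessed teeth.

    Returns the normalised measurements, their sum, and the number of teeth
    with completely closed apices (width equal to zero).
    """
    normalised = [w / h for w, h in zip(apex_widths, tooth_heights)]
    n_closed = sum(1 for w in apex_widths if w == 0)
    return normalised, sum(normalised), n_closed


def linear_age_model(features, coefficients, intercept):
    """Generic linear regression prediction: intercept + sum(coef * feature)."""
    return intercept + sum(c * f for c, f in zip(coefficients, features))


# Illustrative measurements (same unit for widths and heights).
widths = [0.0, 0.0, 1.0, 2.0, 0.0, 3.0, 4.0]
heights = [20.0] * 7
_, apex_sum, n_closed = open_apex_features(widths, heights)

# HYPOTHETICAL coefficients for (sex, sum of normalised apices, closed teeth).
# The negative sign on the apex sum mirrors its negative correlation with age.
sex = 1  # assumed coding; the source only states that the formula depends on sex
age = linear_age_model([sex, apex_sum, n_closed], [-0.5, -10.0, 0.5], 12.0)
```

Normalising by tooth height makes the measurements insensitive to image magnification, which is why ratios rather than raw widths enter the model.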

Adults
Although the development of teeth ends once the third molar is completely developed, some authors focused on other age-related changes that are radiologically observable to estimate age in older subjects. In this regard, three different families of methods can be identified. On the one hand, some authors explored the use of specific measurements or ratios between them to perform age estimations. For example, Kvaal et al. [26] proposed to measure dentin apposition indirectly through the assessment of the dental pulp radiopacity. The researchers carried out several linear measurements of both the pulp and the tooth and associated those measurements via linear ratios. Cameriere et al. [30] proposed a similar idea, but they replaced the linear measurements with area assessments. Another similar example is the Tooth Coronal Index (TCI), studied by Ikeda et al. [27], which represents a height ratio between the crown and the pulp cavity at the crown level.
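The ratio-based approach can be sketched as follows: a height ratio is computed per tooth and a straight line is then fitted between the ratio and the known ages of a reference sample. The expression of the index as a percentage of pulp height over crown height is an assumed convention, and the reference data are synthetic values used only to exercise the code.

```python
def tooth_coronal_index(coronal_pulp_height, crown_height):
    """Height ratio between the coronal pulp cavity and the crown, as a percentage."""
    return 100.0 * coronal_pulp_height / crown_height


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# SYNTHETIC reference sample: index values and known ages (years).
reference_tci = [20.0, 30.0, 40.0]
reference_age = [50.0, 35.0, 20.0]
slope, intercept = fit_line(reference_tci, reference_age)

# Age estimate for a new tooth with a 2 mm pulp height and an 8 mm crown height.
estimate = slope * tooth_coronal_index(2.0, 8.0) + intercept
```

The same two-step structure (measure a ratio, regress age on it) applies to linear, area, and volume ratios alike; only the measurement changes.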
On the other hand, a set of studies focusing on the visibility of certain structures established staging systems with which that visibility could be assessed. The most studied structures were the periodontal ligament and the root pulp, with the staging systems proposed by Olze et al. [31,33] standing out.
Finally, some authors reported that a series of degenerative changes can be assessed through a staging system and therefore be used to estimate chronological age. In this regard, Gustafson [35] set multiple evaluable criteria, namely secondary dentin formation, periodontal recession, attrition, apical translucency, cementum apposition, and external root resorption. The degenerative stages proposed in the original work, which were intended to be applied to extracted and ground teeth, proved to be applicable to radiographic images as well, as confirmed with the methodologies proposed by other authors, such as Olze et al. [36] and Timme et al. [37].

Non-numeric age estimation
Besides age estimation methods developed for obtaining a numeric and continuous output, other authors focused on designing classification methods to estimate the probability that a subject belongs to a specific age group. Most of these studies relied on conventional numerical age estimation methods and adapted them to be used as age group classifiers. This is the case, for example, of the study of Sehrawat and Singh [41], which used Kvaal et al.'s method [26] to perform a classification into four groups.
However, the majority of these studies are focused on a binary classification with two groups of subjects younger and older than a given threshold, which can be the legal age of maturity or any other specific age with high relevance in legal procedures. In this regard, Mincer et al. [42] relied on the staging system proposed by Demirjian et al. [13] to assess the development of the third molar, with the objective of estimating the probability of being older than a certain age for each stage.
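The idea of attaching to each developmental stage a probability of being older than a threshold can be illustrated with a simple empirical estimate from a reference sample. This is a sketch under assumptions: the original work may have used a different statistical procedure, and the sample below is invented for the example.

```python
from collections import defaultdict


def prob_older_by_stage(samples, threshold=18.0):
    """Empirical P(age >= threshold | stage) from (stage, chronological age) pairs."""
    counts = defaultdict(lambda: [0, 0])  # stage -> [older, total]
    for stage, age in samples:
        counts[stage][1] += 1
        if age >= threshold:
            counts[stage][0] += 1
    return {stage: older / total for stage, (older, total) in sorted(counts.items())}


# INVENTED reference sample: third molar stage and known age in years.
reference = [("F", 15.0), ("G", 16.5), ("G", 18.2),
             ("H", 17.8), ("H", 19.0), ("H", 21.3)]
probabilities = prob_older_by_stage(reference, threshold=18.0)
```

A binary classifier follows directly by labelling a subject as older than the threshold when the probability for the observed stage exceeds a chosen cut-off.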

Age estimation on other radiologically observable structures
Although most age estimation methods from dental radiologic records are based on the analysis of the teeth, there are other structures whose characteristics may also be useful for age estimation. In the period covered by this systematic review (from 2016 to 2022), the number of works is very limited and all of them rely on mandibular measurements. Some examples are the approach followed by Motawei et al. [38], who established a relationship between the length of the ramus and chronological age, or the proposal by Acharya [39], in which the gonial angle was used as the main age indicator.

Automatic methods
Recent advances in image processing have allowed for automating dental age estimation methods to a greater degree and have led to the development of numerous methodologies. In this regard, the authors explored the same objectives as those covered in the traditional methods. Regarding the applied methodologies, one of the first attempts to rely on image processing techniques was made by Čular et al. [46], who proposed the use of an Active Appearance Model to localise the third molar and parameterise its shape and texture. In a second step, these parameters are fed into a neural network to estimate the chronological age. As neither step needs human intervention, the method works automatically.
As in most topics involving image processing, deep neural networks (DNNs) helped not only to automate some tasks but also to improve their performance. Regarding age estimation, De Tobel et al. [56] developed a staging system for the third molar based on modified Demirjian stages and used a DNN to classify the third molar image crops into one of those stages. This method only required a minimum intervention of the expert to crop the region of interest to frame the third molar area. This approach was updated by Merdietio et al. [57] by replacing the manual crop step with a DenseNet network, which allows estimation to run automatically. Banar et al. [58] developed a similar method, with a slightly more complex third molar segmentation, in which the tooth is first localised and then segmented.
Kim et al. [59] followed a similar approach to that of De Tobel et al. [56]. The authors also developed a two-step approach which first requires a manual crop of the third molar, although in this case the four third molars are required. In the second step, each of the four teeth is classified into different age groups, and the classifications are merged through a majority voting system. The authors established two different age group divisions: the first grouped subjects younger than 20, subjects aged 20 to 49, and those over 50; the second split the middle group into three subgroups, namely subjects aged 20 to 29, subjects aged 30 to 39, and those aged 40 to 49.
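The merging step can be sketched as a plain majority vote over the four per-tooth predictions. The group labels and, in particular, the tie-breaking rule (choosing the youngest of the tied groups) are assumptions, since the text does not state how ties between the four votes are resolved.

```python
from collections import Counter

# Labels for the first age group division described above (label text is illustrative).
AGE_GROUPS = ("<20", "20-49", "50+")


def majority_vote(labels, order=AGE_GROUPS):
    """Return the most voted age group among the per-tooth predictions.

    Assumed tie-break: the first tied group in `order` (i.e. the youngest).
    """
    unknown = set(labels) - set(order)
    if unknown:
        raise ValueError(f"unknown age group labels: {sorted(unknown)}")
    counts = Counter(labels)
    best = max(counts.values())
    return next(group for group in order if counts.get(group, 0) == best)


# One predicted age group per third molar (illustrative classifier outputs).
final_group = majority_vote(["20-49", "20-49", "<20", "50+"])
```

With an even number of voters, ties are possible, so any implementation of this scheme needs some explicit rule for them.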
Although deep learning methods had already been introduced in the studies mentioned above, De Back et al. [48] proposed the use of a DNN, specifically a Bayesian Convolutional Neural Network, as the only step to estimate chronological age. Therefore, the expert does not need to specify which features of the image should be taken into account, as the network focuses automatically on those regions which contribute the most to the age estimation. Furthermore, the age estimation process can proceed even if several teeth are missing.
Vila-Blanco et al. [49], following the clinical finding that dental development is different in boys and girls, developed a method to automatically integrate sexual information into the age estimation process. Thus, they proposed the use of two identical CNNs, one for age estimation and the other for sex classification, so that the sex CNN

Evaluation studies
The studies retrieved in this work were categorised according to the age estimation methods they relied on. As can be seen in Fig. 3a, where the ten most used methods are represented, Demirjian et al.'s approach [13] has been applied in more than a third of the studies (100 out of 286), with some methods derived from it also in the first positions: Willems et al.'s [14] and Chaillet and Willems' [16] were applied in 45 and 7 studies, respectively. The first method not aimed originally at estimating the numerical age is the approach proposed by Cameriere et al. [62]. This method, focused on the classification of subjects younger or older than legal age, was used in 43 studies. On the other hand, it is noticeable that 239 studies relied on OPGs to conduct the experiments, representing 84% of all the retrieved works (Fig. 3b). The rest of the studies used CT-based techniques (such as CBCT or conventional CT) and, to a lesser extent, intraoral images, MRI, or the cephalometric view.
Regarding the performance of the age estimation methods, a maximum of one study was evaluated for each population and each method, specifically that evaluated in the largest sample due to the greater significance of the reported results. This ensured a good representation of different ethnicities while avoiding overcrowded result tables. Following the same order as in the previous section, the approach based on tooth eruption assessment proposed by Haavikko [7] was evaluated in a wide range of populations since its development, but it clearly lost popularity in comparison to other methods. As shown in Table 2, only four studies that met the inclusion criteria have been analysed, all focused on subjects younger than 16. In terms of performance, these studies reported systematic underestimations given by residual errors (difference between estimated age and real age) with means ranging from −0.22 to −1.35 years and standard deviations around one year. The mean absolute errors yielded mean values of 0.33 to 1.45 years.
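The selection rule stated at the start of the paragraph above (at most one study per method and population, keeping the one evaluated in the largest sample) amounts to a group-wise maximum. A minimal sketch follows; the dictionary keys and the example records are hypothetical.

```python
def select_largest(studies):
    """Keep, for each (method, population) pair, the study with the largest sample."""
    best = {}
    for study in studies:
        key = (study["method"], study["population"])
        if key not in best or study["n"] > best[key]["n"]:
            best[key] = study
    return list(best.values())


# HYPOTHETICAL study records for illustration.
studies = [
    {"id": "S1", "method": "Demirjian", "population": "Spanish", "n": 400},
    {"id": "S2", "method": "Demirjian", "population": "Spanish", "n": 1200},
    {"id": "S3", "method": "Demirjian", "population": "Indian", "n": 250},
    {"id": "S4", "method": "Nolla", "population": "Spanish", "n": 300},
]
selected = select_largest(studies)
```

When two studies share the same sample size, this sketch keeps the first one encountered; the text does not specify how such ties were handled.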
The atlas-based methods listed in Fig. 2 were also applied in the last few years, although only the London Atlas proposed by AlQahtani et al. [10] was tested in more than one population sample. As shown in Table 3, the work by Baylis and Bassed [63], which compared the three Atlas-based methods in a New Zealander population, reported a slight underestimation with the Schour and Massler Atlas [8] (−0.03 to −0.39 years of mean error), a slight overestimation with the Blenkin and Taylor method [11] (+0.07 to +0.34 years), and a noticeable overestimation
Regarding the staging methods, the one proposed by Nolla [12] was used in eight different studies, as seen in Table 4, showing mean residual errors between −1.12 and +0.54 years, and standard deviation values between 0.23 and 3.30. The mean absolute errors ranged from 0.66 to 1.10 years. Most of the studies were focused on subjects between five and 15 years of age, although Berkvens et al. [64] conducted a study on subjects aged up to 30.
The method developed by Demirjian et al. [13] is perhaps one of the most studied approaches for the estimation of dental age. In the analysed period of time, a set of 40 studies using Demirjian et al.'s method [13] reported at least one of the required metrics in different populations, as shown in Table 5. The range of ages was also wider than in the case of Nolla's method [12], with subjects ranging from two to 30, although most of them focused on the interval between five and 23. Regarding the obtained results, a clear overestimation can be seen, with mean errors between −0.58 and +2.13 years. Absolute errors indicate that the error magnitude lies between 0.13 and 1.48 years, while the reported R² values were over 0.60 in every case.
The modified Demirjian's method developed by Willems et al. [14] led to numerous studies focused on testing its applicability in different populations. In this review, a set of 28 studies was analysed (Table 6). On average, the method was applied to a narrower age range, with most authors working in the range between five and 16 years of age. Although more investigations showed overestimation than underestimation (13 vs. 12, respectively), this trend is much less noticeable than in the case of the method by Demirjian et al. [13]. The absolute errors also tended to decrease with this method, as the values lay between 0.61 and 1.16 years.
Table 2 Evaluation of Haavikko's method [7]
Bedek et al. [15] proposed a modification of Willems et al.'s method [14], which was evaluated in an Indian population by Sheriff et al. [65], as can be seen in Table 7. The results showed a notable underestimation (up to −0.55 years of mean error), but the low standard deviation values (0.05 to 0.06 years) indicated that the error was consistent between all subjects.
The modification of Demirjian et al.'s method [13] proposed by Chaillet and Willems [16] was applied to four different samples in the collected studies, as presented in Table 8. The range of application, however, is narrower than in the Demirjian's applications, as the subjects were in every case younger than 18. Unlike the systematic overestimation of Demirjian et al.'s method [13], Chaillet and Willems' [16] tended to underestimate age, with mean errors between −2.79 and −0.07 years. Absolute errors ranged from 0.66 to 1.14 years on average, with standard deviations between 0.49 and 0.52 years.
Finally, the modified Demirjian's method developed by Blenkin and Evans [17] was applied to two different populations of subjects aged six to 17 (Table 9), yielding errors with mean values ranging from −0.05 to −0.55 years and standard deviations up to 1.04 years. The absolute errors ranged from 0.61 to 0.91 years on average.
The tooth staging criteria proposed by Gleiser and Hunt [18] led to the development of several methods, such as those proposed by Moorrees et al. [19] and Kohler et al. [20]. Six studies applied the former in different samples of subjects aged from three to 30, as seen in Table 10, with mean errors between −1.01 and +0.34 years and thus a tendency to underestimate age. In absolute terms, the error ranged between 0.63 and 1.42 years. On the other hand, Kohler et al.'s method [20] was applied to two different samples.
The method of Liliequist and Lundberg [21] was used in two of the retrieved studies, in Brazilian and Croatian populations, respectively. As can be seen in Table 11, the method led to an age underestimation in both cases, though it was more noticeable in the former, with a mean error of −0.58 years. Absolute errors were very similar in both studies, with mean values of 0.97 and 0.99 years and median values of 0.83 and 0.81, respectively.
The method proposed by De Tobel et al. [22], which mixed both Demirjian et al. [13] and Kohler et al.'s [20] [66], the mean error is negative while the median error of male subjects is positive. The absolute errors ranged between 0.57 and 1.60 years on average.
The proposed approaches for estimating chronological age in adults produced systematically worse results than their children-orientated counterparts, and the available studies are much scarcer. Moreover, the studies that apply adult-based methods tended to report mostly the standard error and R² values instead of the residual and absolute error measurements, in contrast to the previously presented methods. In this regard, 35 studies were collected related to the evaluation of metric methods based on a set of linear and volumetric tooth analyses, as seen in Table 14. The most common approaches are the pulp-to-tooth linear, area, and volumetric ratios (PTLR, PTAR, PTVR), used in eight works each, and the tooth coronal index (TCI) and the pulp-to-crown volume ratio (PCVR), each one applied in three studies. It is also worth noting that most of these works relied on 3D images (such as CT-based records) instead of flat X-rays, as they allow the volume of the different tooth structures to be analysed accurately.
Huge variability in the results reported by these studies is observed. For example, the mean absolute error varied not only depending on the measurement but also across the studies using the same measurement (from 5.66 to 25.85 years in the case of PTVR). This can also be seen in the standard error metric, which lay between 4.66 and 15.29 years. In terms of variance explained, the models ranged between 1 and 97%. The specific method of Kvaal et al. [26], which is based on pulp-to-tooth linear ratios, yielded very different behaviour in the available studies.
Methods that have associated radiographic visibility of several oral structures with chronological age usually do not aim to estimate a numerical age value, so only two of them reported estimation error metrics. As shown in Table 15, the study of Chaudhary and Liversidge [71] pointed out an overall overestimate of 7.21 years for males and 6.87 years for females, with mean absolute errors of 7.91 and 7.74 years in the same two scenarios. On the other hand, Timme et al. [72] did not report the error metrics, but rather a standard error of 3.55 years and the percentage of explained variance (69%).
Methods based on the evaluation of degenerative tooth changes, based on Gustafson's criteria [35], are summarised  years in the first method and 11.08 years in the second). Finally, R² values were in the range 0.23-0.80. As mentioned in the "Age estimation on other radiologically observable structures" section, the estimation of chronological age was also approached by mandibular bone analysis, specifically by measuring ramus length [38] and gonial angle [39]. The former produced a model that explained 62% of the data variance, while the latter led to an absolute error of 13.98 years. As can be seen in Table 17, both studies reported different metrics, so they are not directly comparable.
The most widely used methods for numeric age estimation were jointly analysed regarding the obtained underestimation or overestimation. As can be seen in Fig. 4, two of the six methods showed a clear pattern of overestimation, namely those proposed by Demirjian et al. [13] and the London Atlas [10]. On the other hand, the methods developed by Cameriere et al. [23] and Nolla [12] led to a systematic underestimation of age. Finally, the methods based on linear and volumetric measurements of the teeth, as well as that proposed by Willems and Chaillet [14], yielded a more balanced performance, with almost the same number of studies underestimating and overestimating age.
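A joint analysis of this kind can be reproduced by counting, for each method, how many studies reported a positive or a negative mean error. The sketch below assumes that the sign of the reported mean residual error is what defines over- or underestimation; the method names and values are invented for illustration.

```python
from collections import Counter


def bias_tally(mean_errors_by_method, tolerance=0.0):
    """Count studies that overestimate or underestimate age, per method.

    A mean error above `tolerance` counts as overestimation, below
    `-tolerance` as underestimation, and anything in between as neutral.
    """
    tally = {}
    for method, errors in mean_errors_by_method.items():
        counts = Counter(
            "over" if e > tolerance else "under" if e < -tolerance else "neutral"
            for e in errors
        )
        tally[method] = {k: counts[k] for k in ("over", "under", "neutral")}
    return tally


# INVENTED mean errors (years), one value per evaluation study.
example = {"method_a": [0.5, 0.2, -0.1], "method_b": [-0.4, -0.3]}
summary = bias_tally(example)
```

A non-zero `tolerance` can be used to avoid labelling negligible mean errors as a systematic bias.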
As mentioned in the previous section, some age estimation methods were adapted to work as a binary classifier for detecting people younger or older than the legal age. The results obtained in this regard are presented in Table 18. First, the methods based on tooth eruption presented by Haavikko [7] and Olze et al. [40] were assessed in the problem of 14-year-old detection. The former led to accuracy between 78 and 81%, while the latter yielded better performance, with 83 to 86%. Also, the method proposed by Olze et al. showed a more balanced behaviour, with similar sensitivity and specificity values.
There is only one study that evaluated an Atlas-based method for binary age classification. Specifically, De Moraes et al. [73] used the London Atlas [10] for classifying dental records according to the 18-year-old threshold. Although the accuracy reached a reasonable value of 80%, the method was heavily biased, as it produced a very high sensitivity (that is, it correctly detected subjects older than 18) but very low specificity (it correctly classified only half of the subjects younger than 18).
Dental staging methods were used to a greater extent. Regarding the Nolla method [12], it was applied to the Portuguese and Montenegrin populations. In the first, the method obtained accuracies from 82 to 90%, depending on the age threshold, while in the latter the accuracy was 90% for males and 87% for females. It is noticeable that   [14] in an Italian population, reaching a slightly worse result than in the case of the original scores, especially in the sensitivity values (74-78% vs. 80-82%).
The Gleiser and Hunt staging system [18] was also studied on the problem of binary age classification via the derived methods of Moorrees et al. [19] and Kohler et al. [20]. The former was applied in a Portuguese sample with thresholds of 14, 16, 18, and 21 years, obtaining accuracies from 83 to 90%. Again, the sensitivity and specificity values were highly unbalanced, especially when using the 14-year-old threshold (92% sensitivity and 59% specificity). On the other hand, Kohler et al.'s method reached an accuracy of 91% in an Indian population and sensitivity and specificity values of 87-80% and 87-90%, respectively, in a Russian sample.
The adaptation of Cameriere's method for legal age classification [62] was by far the most used method for binary age classification. Among all the experiments carried out with this approach, 29 out of the 40 established an age threshold of 18 years. The accuracy values ranged from 72 to 98%, although 35 studies yielded values greater than 80%. As with most methods, there are some cases where a great imbalance between sensitivity and specificity can be seen, the most significant example being the study by AlQahtani et al. [76], where the sensitivity was 51-52% and the specificity was 100-97%.
Finally, Olze et al.'s [31] method based on the assessment of the root pulp visibility was evaluated in Turkish and Indian samples. Only the latter study reported the accuracy (77% in males and 80% in females), also yielding a notable imbalance between sensitivity and specificity.
The two most widely applied methods for age thresholding, namely those proposed by Mincer et al. [42] and Cameriere et al. [62], were compared using a reference value of 90% accuracy. As shown in Fig. 5, the method of Mincer et al. obtained a performance better than the reference when establishing an age threshold of 12 years and three times out of four with a threshold of 18 years. When the threshold is set to 14 years of age, one study obtained better performance than the reference, and another work reported worse performance. Regarding Cameriere et al.'s method, all studies that set an age threshold of 12, 14, or 16 years reported accuracy values lower than the reference, while studies applying a threshold of 18 years showed a more balanced performance, with 12 studies performing better than the reference and 16 works reporting worse results.
Regarding the automatic approaches proposed for age estimation, each method was tested in a single population. In those aimed at estimating a numerical age value (Table 19), the residual error was systematically closer to zero, with the median ranging between −0.07 and +0.12 years. Absolute error varied depending on the age range of the assessed sample. As reported by Vila-Blanco et al. [49], the mean absolute error in a Spanish sample ranged from 0.75 [...]. The method in [50], which was tested in a large population sample with a very wide age range (from 0 to 93 years of age), led to a very low mean absolute error of 1.64 years. It is also noticeable that the method of Vila-Blanco et al. [54], which relies only on the mandible shape instead of the whole dental image, yielded a mean absolute error of 1.57 years, which is comparable to or even better than other methods relying on the whole dental image.
Štern et al. and Guo et al. [61] tested their automatic age thresholding approaches in Austrian and Chinese populations, respectively, as can be seen in Table 21. It is worth noting that the latter sample consisted of more than 10,000 OPGs. Guo's method led to accuracy values between 93 and 96%, depending on the specific age threshold, and was very consistent in terms of sensitivity and specificity. On the other hand, the method by Štern et al. reached a slightly worse accuracy value in the same scenario (85% vs. 93%), and a notable imbalance between sensitivity and specificity was observed.
The applicability of the methods used in the studies included in this work was assessed in terms of the age of the subjects. As Fig. 6 indicates, there are notable differences among the proposed approaches. While the methods developed by AlQahtani et al. [10], Nolla [12], Willems and Chaillet [14], and Cameriere et al. [23] focused on a very constrained group of patients aged two to 18, approximately, Demirjian et al.'s method [13] could be applied to a wider group of patients of even more than 25 years of age. The methods focused on post-developmental dental features, such as the one proposed by Kvaal et al. [26] or the pulp-to-tooth volumetric ratios, have as their natural field of application those subjects aged 18 to 70, approximately. On the other hand, the automatic methods have been proven to be applicable in a wider age range, covering both the subjects with developing dentitions and the subjects with fully developed teeth.

Discussion
The oral cavity, and especially the teeth, has been used for decades to estimate age, as dental features show a high correlation with development patterns. In this regard, great efforts have been made since the late nineteenth century to develop atlases of tooth development, covering not only tooth formation in relation to age but also the sexually dimorphic patterns of that development. The democratisation of radiology led to a number of improvements, one of them being the collection of bigger databases to use in new studies, which in the end increases the statistical significance of the findings. Another major revolution came with the arrival of computers, which brought the possibility of acquiring X-ray images directly in a digital format and, therefore, of speeding up the measuring processes.
With the aim of using the structures that are observable in the dental images to estimate the chronological age, different approaches have been followed. Only three studies collected in this work focused on the mandible bone [38, 39, 54], while the rest relied on tooth analysis methods. Among the latter, two groups of works can be distinguished, namely those aimed at estimating age in children and young adults [12, 13, 23] and those focused on adults [26, 31, 36]. Moreover, some authors approached the age estimation problem as a classification task instead of a regression problem. In this regard, some studies tried to classify the age of the subjects into two groups, usually with the main purpose of estimating the likelihood that a person is older than the legal age. Other works generalised this idea and applied an age classifier with more than two target classes. The usual way to classify the age in both cases was through some modification of a numeric age estimation method [42-44].
As tooth and bone development is reported to depend on factors such as ethnicity or environment, these methods have been evaluated to a greater or lesser extent in different populations around the world. To assess the differences between the different approaches and the population samples used, this work retrieved a corpus of studies applying age estimation methods in dental images published in the last 7 years (from 2016 to 2022). A total of 613 unique studies were obtained, of which 286 were selected after applying the inclusion criteria. The first point to highlight is the great difference in the number of studies that apply each method. For example, methods such as those proposed by Schour and Massler [8] or Blenkin and Taylor [11] were applied in one single work in the evaluated period, which decreases the significance of the results to a large extent.
Regarding the performance obtained by the analysed methods, it has been shown that the eruption-based methods are useful for a very short period (until 15 years of age, approximately), and the only such method included in this work, proposed by Haavikko et al. [7], led to a systematic underestimation of the chronological age in every tested population. Similarly, the Atlas-based methods assessed have also been applied to population samples that are very constrained in terms of age, with the London Atlas method [10] being the only one evaluated in multiple populations.
Dental staging approaches have proven to be highly relevant in this period, as they accounted for nine of the ten most commonly used methods in this work. This allowed us to compare these methods to each other accurately and confirm the findings reported in previous works, such as the tendency of Nolla [12] and Demirjian et al.'s [13] methods to underestimate and overestimate the age, respectively [77], or the balanced behaviour yielded by the Willems modification of Demirjian's method [14]. The methods based on Gleiser and Hunt stages [18] achieved slightly worse results than Nolla's [12] [...] [20] approaches showed a significant underestimation and greater absolute errors. It should also be noted that the method proposed by De Tobel et al. [22], which combines multiple dental staging methods, showed no tendency to underestimation or overestimation, but greater absolute errors than those obtained with the methods it relies on individually. On the other hand, Cameriere's method [23] based on the measurement of open apices systematically underestimated the age, while the absolute errors were highly inconsistent compared to the previously mentioned methodologies.
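The distinction between a tendency to under- or overestimate (signed residual) and the size of the errors (absolute error) can be illustrated with a short Python sketch. The sign convention used here (estimated minus chronological age, so that positive values mean overestimation), the function name, and the toy values are assumptions made for illustration only.

```python
from statistics import mean, median


def error_summary(chronological, estimated):
    """Summarise signed and absolute errors of numeric age estimates (in years).

    Assumption: residual = estimated - chronological, so a positive mean or
    median residual indicates overestimation and a negative one underestimation.
    """
    if not chronological or len(chronological) != len(estimated):
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [e - c for c, e in zip(chronological, estimated)]
    return {
        "mean_residual": mean(residuals),
        "median_residual": median(residuals),
        "mean_absolute_error": mean([abs(r) for r in residuals]),
    }


# Hypothetical example: errors of one year in alternating directions.
chronological = [10.0, 12.0, 14.0, 16.0]
estimated = [11.0, 11.0, 15.0, 15.0]
summary = error_summary(chronological, estimated)
print(summary)  # mean and median residual 0.0, mean absolute error 1.0
```

Here the residual is zero because overestimations and underestimations cancel out, while the mean absolute error is still one year. This is why a method can show no systematic bias and yet have larger absolute errors than a biased one.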
Estimating chronological age in adults has proven to be a much more difficult task, as confirmed in the studies retrieved in this work. None of the studies reported absolute errors lower than 2 years, and trends of under- or overestimation were also more pronounced [69,78]. Regarding the methods using linear or volumetric dental measurements, there is a clear improvement in terms of standard error values when volumetric information is taken into account. It is also noticeable that adult-oriented methods do not report as many performance metrics as those targeted at children, which hinders the assessment of their performance. The most problematic cases are those involving the radiographic visibility methods and the mandible-based methods, where a direct comparison is not possible.
Fig. 6 Application of the most widely used methods for age estimation regarding the age of the subjects included in the analysed studies. Only those methods applied in at least ten studies were included
The methods proposed for detecting if someone is younger or older than a predefined threshold (usually the legal age) provided a more suitable environment for comparability purposes, as most of them reported the accuracy, sensitivity, and specificity metrics. The majority of studies yielded reasonably high accuracy, with values over 80 or even 90%, suggesting that these methods could be used either alone or combined with others to produce confident estimations. However, it is worth noting that the results are very dependent on the age cohort used to test the methodology. If subjects are not well-balanced around the threshold, that is, if most subjects are younger or older than the legal age, the performance will be abnormally good, as the classifier will tend to assign the majority class to all subjects [79]. A similar problem can occur if the testing age cohort is too wide, as the further away the subjects are from the age threshold, the easier it is to classify them and, therefore, the better the classification performance.
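The effect of an unbalanced cohort on accuracy can be shown with a trivial baseline that assigns the majority class to every subject. The following Python sketch is a hedged illustration: the function name, the 18-year threshold, the use of balanced accuracy (the mean of sensitivity and specificity) as a complementary metric, and the toy cohorts are all assumptions rather than part of the reviewed studies.

```python
def majority_baseline(true_ages, threshold=18.0):
    """Accuracy and balanced accuracy of a classifier that always predicts
    the majority class ('adult' if age >= threshold, otherwise 'minor').

    Assumption: both classes are present in the cohort.
    """
    n_adults = sum(1 for age in true_ages if age >= threshold)
    n_minors = len(true_ages) - n_adults
    if n_adults == 0 or n_minors == 0:
        raise ValueError("cohort must contain subjects on both sides of the threshold")
    predict_adult = n_adults >= n_minors
    accuracy = max(n_adults, n_minors) / len(true_ages)
    # Always predicting one class gives sensitivity 1 and specificity 0, or vice versa.
    sensitivity = 1.0 if predict_adult else 0.0
    specificity = 0.0 if predict_adult else 1.0
    return accuracy, (sensitivity + specificity) / 2


# Hypothetical cohorts: nine minors and one adult versus five of each.
unbalanced_cohort = [12, 13, 14, 15, 15, 16, 16, 17, 17, 19]
balanced_cohort = [14, 15, 16, 17, 17.5, 18.5, 19, 20, 21, 22]

print(majority_baseline(unbalanced_cohort))  # (0.9, 0.5)
print(majority_baseline(balanced_cohort))    # (0.5, 0.5)
```

The uninformative baseline reaches 90% accuracy in the unbalanced cohort but only 50% in the balanced one, whereas the balanced accuracy stays at 50% in both cases. Reporting sensitivity and specificity alongside accuracy therefore helps to reveal this effect.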
The inclusion of modern image processing techniques for age estimation led to a set of improvements, the first one being the time and cost savings achieved by means of full or almost-full automation. Moreover, the performance achieved by these methods is clearly higher than that obtained with the manual methodologies. Although in some cases the errors were high due to the exclusion of children from the population sample [51, 53, 55], the overall performance was remarkable, with absolute errors lower than one year for patients younger than 20 [49] or 1.64 years in a huge sample of almost 28,000 images from patients aged zero to 93 years [50]. In this regard, the applicability of these methods is better than that of the classic manual approaches, as they almost completely avoid the underestimation and overestimation problems and the subjectivity intrinsic to human-guided processes, and they make it possible to run estimations in a wider age cohort.
The studies that applied automatic methodologies to perform binary age classification or multiple group age classification are not as numerous as those aimed at performing numeric age regression. Moreover, traditional methods in this regard reached very good results, with almost no room for improvement. However, the proposed automated methodologies kept the same high performance level while providing a series of benefits such as cost and resource savings [59,61].
Regarding the specific computational approaches followed by the automatic methods, most studies rely on fully automatic and deep-learning-based solutions, which lead to two key enhancements. As with every end-to-end method, the dataset only needs to be annotated with the expected output, that is, the age, reducing the time of this process and, thus, making it possible to compile bigger datasets of thousands of images. Furthermore, these methods do not rely on specific bone structures designated by an expert, but rather on the image parts that the algorithm considers to be most relevant for that specific task. As there is no need for specific teeth to be present, these methods can work even if some pieces are missing.
Although these automatic methods have been shown to improve the performance and applicability of age estimation methods, their validation needs to be improved. The relative recency of deep learning techniques means that no automatic method has been tested in populations or acquisition devices other than the original ones, which raises doubts about their generalisation to different scenarios. However, the ease of compiling new databases in this regard allows these methods to be easily adapted to different situations through the application of specific domain-adaptation techniques, such as transfer learning or fine-tuning [80].

Conclusions
In this work, the application of age estimation methods in recent years has been studied, specifically those methods using radiological dental images. Although classic methods have been thoroughly evaluated in many populations of different ethnicities all over the world, automatic methods based on deep learning techniques led to an improvement not only in terms of performance but also regarding the applicability in a real scenario. This represents a turning point in the field of chronological age estimation, since estimations can be performed significantly faster and the subjectivity inherent to the observer analysis can be completely avoided. Future work in this area should involve a deeper assessment of the proposed automatic methodologies, specifically their evaluation in samples of different ethnicities, to improve their generalisation capabilities.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.