WEaning Age FiNder (WEAN): a tool for estimating weaning age from stable isotope ratios of dentinal collagen

Nitrogen stable isotope ratios (δ15N) of incremental dentine collagen have been extensively used for the study of breastfeeding and weaning practices in ancient populations. The shifts in δ15N values reveal the duration of exclusive breastfeeding, the onset and completion of weaning. Despite the significant progress in sampling precision protocols, the weaning estimation is still performed by visual observation of δ15N individual profiles, a time-consuming, labor-intensive, and error-prone task. To fill this gap, we generated WEAN, a tool that enables automated estimation of weaning age based on δ15N measurements from incremental dentine collagen. WEAN generates a refined age assignment based on regression analysis and calculates the knee/elbow point of the δ15N curvature as the individual’s weaning age. We tested the accuracy of the tool by re-estimating 130 weaning ages from published datasets with the calculation of the root mean square error (RMSE). The results show a strong agreement between the visual observation and the elbow method underlining that an automatic mathematical framework can be used for the accurate estimation of weaning age. The tool can estimate the weaning age of a single or many individuals and produces visually appealing graphics (scatter and line plots) and output files. WEAN introduces a novel and robust method that streamlines the assessment of δ15N values for the exploration of breastfeeding and weaning patterns in antiquity.


Introduction
Stable isotope analysis of bone and tooth collagen has been widely used by bioarcheologists for reconstructing breastfeeding and weaning practices in past societies (Cocozza et al. 2021;Eerkens and Bartelink 2013;Eerkens et al. 2011;B. T. Fuller et al. 2006a, b). After metabolic processes, stable isotopes remain in bone or tooth collagen incorporating signatures of consumed foods (Ambrose 1990;DeNiro 1985;Fernandes et al. 2015Fernandes et al. , 2012Lee-Thorp 2008). Measured ratios especially of carbon and nitrogen discriminate against broad dietary categories (animal and marine protein, C3 and C4 plants) (Katzenberg and Waters-Rist 2018;Lee-Thorp 2008;Tykot 2004). Therefore, it is possible to determine the diet of ancient humans, bearing in mind that isotopic values may be altered by potential diagenetic effects due humic acids and additional contaminants (Ambrose 1990;DeNiro 1985;Guiry and Szpak 2020;van Klinken 1999).
Nitrogen isotopic ratios are known to increase by 5.5 ± 5‰ between consumer and consumed source (Fernandes et al. 2012), and a similar effect is observed in infants during exclusive breastfeeding. In particular, the δ 15 N values of exclusive breastfed infants are elevated by approximately 2-3‰ compared to those of their mothers' based on studies on the fingernails of modern mother-infant pairs (Fogel et al. 1989: ~ 2.4‰;Fuller et al. 2006a, b: ~ 2-3‰;Herrscher et al. 2017: ~ 2-2.8‰, (Fernández-Crespo et al. 2022). The consumption of solid foods and fluids at the onset of weaning causes the δ 15 N values to decrease throughout the weaning process due to change on the protein source, e.g. breast milk vs. animal milk (Herring et al. 1998). After the complete cessation of breast milk consumption (termed weaning age or age Elissavet Ganiatsou and Angelos Souleles contributed equally. at completion of weaning) (Dettwyler 2017;Nitsch et al. 2011), δ 15 N values fall to a similar level as the adult population (Fig. 1).
Similar to nitrogen, carbon isotope ratios (δ 13 C) are expected to be elevated during exclusive breastfeeding and decrease throughout the weaning process between < 0.5 and 1‰ (Fuller, et al. 2006a, b;Herrscher et al. 2017). However, this decrease is not as straightforward as in δ 15 N values, as δ 13 C values are more influenced by different sources of protein. By extension, their values are directly influenced by the supplementary foods offered from the first stages of weaning and increase their variability (Fuller, et al. 2006a, b). Therefore, there is less agreement amongst studies about the link between δ 13 C values and breastfeeding.
Early research on breastfeeding and weaning duration in past populations was based on the measurement of δ 15 N in bone collagen (Dupras et al. 2001;Powell et al. 2014;Rutgers et al. 2009). Sub-adult individuals whose δ 15 N values are higher than those of the adult population were assessed as still weaning infants, while those whose δ 15 N values were similar to the adult population were considered as fully weaned (Halcrow et al. 2021). However, bone collagen integrates a temporal isotopic signature that may include several years before death, which makes this an not optimal approach to address infant feeding practices (Fahy et al. 2017;Hedges et al. 2007). Furthermore, the adult population consists of different subgroups (men, women, young adults, elders, pregnant women), whose diet is not a priori homogenous (Halcrow et al. 2021). Moreover, the assessment of dietary practices using data from children who died before reaching adulthood (i.e., the "Osteological Paradox") produces further implications in drawing conclusions (DeWitte and Stojanowski 2015; Wood et al. 1992).
Yet, the new development of sequential isotopic analysis of dentine, which includes the segmentation of dentine, from crown to root into small sections, offers the potential to reconstruct diet in time-bound periods of life (Beaumont et al. 2013;Eerkens et al. 2011). As dentine forms in subsequent layers and is hardly influenced by dietary and environmental factors after its initial deposition, each section retains an isotopic signature reflecting its formation period (Dean 2017;Smith et al. 2006).
Although isotopic measurements from dentine collagen increments are far more time-accurate than bulk bone collagen measurements, assigning ages to increments is not a straightforward process. Dentine serial sectioning (DSS) includes the segmentation of dentine in horizontal sections of approximately 1-mm length (Beaumont et al. 2013). However, dentine is deposited in a sigmoid, almost conical pattern (Dean and Cole 2013). As a result, each section incorporates part of the preceding and part of the succeeding section, which widens the age range of each increment (time averaging). Recent research has shown that horizontal sectioning of dentine blurs the isotopic signal more than once thought (Tsutaya 2020). To address this, alternative sampling protocols have been developed Fernández-Crespo et al. 2018, 2020. These approaches aim to obtain minimum sample sizes, which include less dentine layers per sample either by using biopsy punches either by using image assisting sampling. These sampling methods are more sensitive to the formation of dentine, therefore provide better results in the age-at-increment assignment, and by extension describe dietary changes more accurately (Tsutaya 2020). The recent protocol of Curtis et al. (2022) offers even more promising results.
In parallel to the significant progress in sampling techniques, statistical modelling tries to address sources of uncertainty in stable isotope studies (Cocozza et al. 2021;Fernandes et al. 2015;Tsutaya 2020;Tsutaya and Yoneda 2013). OsteoBioR is a Bayesian model that employs approximate Bayesian computation (ABC) and suggests a flexible method for the age assignment of each dentine increment (Cocozza et al. 2021). Tsutaya (2020) presents a mathematical model for investigating the age range of the horizontal dentine sections (package MDSS, distributed in R) shedding light on the wider, than assumed, age range of each increment. WARN is an R package that uses ABC for estimating the beginning and the complete cessation of weaning (Tsutaya and Yoneda 2013). However, this method utilizes δ 15 N values only of sub-adult bone collagen (see limitations above).
The abovementioned studies pursue the aim of increasing the temporal resolution of the isotopic analysis and reconstructing intra-individual dietary changes across childhood. However, albeit infant feeding practices are often investigated using incremental dentine stable isotope analysis, the estimation of weaning onset and completion are performed visually (Ganiatsou et al. 2022;Lee et al. 2020;Scharlotta et al. 2018).
The visual evaluation of each isotopic plot requires skill and expertise, which by extension require time and labor investment when analyzing many isotopic profiles at a time. Furthermore, this procedure enfolds the subjective judgment and the inability to compare isotopic profiles produced by different protocols. To facilitate and standardize this process, we propose a streamlined procedure of weaning age estimation based on mathematical computation for which we developed the automatic tool WEAN. The tool estimates the weaning age mathematically and illustrates the results in visually appealing charts and user-friendly output files in PNG and CSV format.

WEAN overview
WEAN is built in Python (code available on GitHub) and is freely distributed (see Code availability). The tool provides weaning age estimations and descriptives for one or many individuals. The user needs to provide a CSV file with the age-at-increment assignment of each δ 15 N measurement. The tool necessitates at least five measurements per individual based on the axiom of quartic polynomial equations of fourth degree but has no limit on the maximum number of measurements (Supplementary Material). The file can be easily imported, and the user can select the individual(s) they want to analyze through a drop-down list from a friendly user interface (Fig. 2). The results obtained from WEAN can be easily exported in CSV format and include: the weaning age estimation with the elbow method, the R-squared values showing the fitness of the regression model, and the derivative at the weaning age as well as the calculated values used for the trophic level effect assessment (for details see Table 1). Individual and dataset charts are automatically produced and can be downloaded in PNG format (Fig. 2).

Mathematical background of WEAN
The mathematical base of WEAN lies upon the estimation of the elbow/knee point of a curve. In mathematical terms, the point of a curvature, where the curve visibly angles, is defined as the knee/elbow point. Bearing in mind that reference incremental dentine δ 15 N values associated with breastfeeding show a gradually declining pattern, which may stabilize or increase soon after breastfeeding is no longer practiced, the completion of weaning process is assessed as the point where the isotopic curvature visibly bends. The automatic weaning age estimation of WEAN is based on the calculation of the elbow point.
The elbow point estimation is broadly used in science, and there are many ways for its calculation (Satopää et al. 2011). In this study, the elbow is calculated by determining the point of the curve with the greatest perpendicular distance from a hypothetical line that connects the first and last observation of the curve (Fig. 3). We assumed that there is a line (M) that connects the first (A 0 ) and the last observation (A 1 ) of the curve with the following coordinates A 0 (age max_Nitrogen , max_Nitrogen) and A 1 (age last_Nitrogen , last_Nitrogen) (Fig. 3). Then, we calculated the perpendicular distance (d) of each measurement (n) belonging to the curve. Based on the above calculations, the elbow point has the maximal distance d (d max , marked in a pink circle in Fig. 3).
Besides the application of the elbow point method, we have considered additional parameters for optimizing the accuracy of our estimations and the functionality of the tool.

A. Trophic level effect related to breastfeeding
The tool calculates the decrease in δ 15 Ν between exclusive breastfeeding and the age at completion of weaning, to assess if this difference falls within one trophic level difference related to breastfeeding (see Introduction and Supplementary Material). The mathematical workflow of this step is schematically presented and explained in detail in the Supplementary Material. To facilitate the user, WEAN plots individual(s) according to the degree of decrease in δ 15 Ν values.

B. Age-at-increment refinement
To account for the uncertainty of time averaging (Tsutaya 2020), WEAN primarily performs linear regression on the assigned, by the user, age at increment. This enables the refinement of the assigned ages for each individual. The regression is performed based on fourth-degree polynomial equations (Table S2, Supplementary Material).
III. Evaluation of the δ 15 N pattern Some individuals may show a stepwise increase instead of a decrease in their δ 15 N values (Scharlotta et al. 2018). Isotopic studies on malnourished children (Beaumont et al. 2016) indicate that individuals who suffer from malnutrition are characterized by increasing δ 15 N values and falling δ 13 C values (Beaumont and Montgomery 2016;Mekota et al. 2006). Although this pattern is not indicative of breastfeeding, it could be troublesome for the tool. To account for this variability, the tool calculates the derivative of the δ 15 N value of the weaning age to determine its inclination. The derivative of a function shows the slope of its tangent line at a specific point. For functions that have a decreasing tendency, the derivative takes negative values (f'(x) < 0); for functions with an increasing tendency, the derivative takes a positive value f'(x) > 0. In our model, negative derivatives indicate that δ 15 N values have a decreasing tendency at the weaning age point indicative of breastfeeding. In contrast, positive derivative values are indicative of no breastfeeding. This parameter identifies the presence or not of breastfeeding pattern and prohibits false assignments of weaning age (Table S2, Supplementary Material).
The mathematical workflow of this step is schematically presented in Fig. 4 and explained in detail in the Supplementary Material. To facilitate the user, WEAN plots individual(s) according to the degree of decrease in δ 15 Ν values (Fig. 4).

Individual
The given ID of the individual Weaning age The weaning age estimated from the elbow point of ages modeled with fourth-degree polynomial regression.

Observed weaning age
The weaning age estimated from the elbow point based on the age assignment of the observer (without performing regression).

Weaning age Z-score
The numerical measurement that describes the deviation of the weaning age of one individual to the mean weaning age of the population. R-squared R-squared value of the regression used as a measure of goodness-of-fit.

Derivative
The calculated derivative at the weaning age showing the inclination of the δ 15 N curve. This parameter indicates whether function increases or decreases at and points to possible false weaning age estimation.

Mean difference
The mean δ 15 N difference between the observed and estimated weaning ages.

Std.Dev
The standard deviation as a measure of statistical dispersion of the observed δ 15 N and estimated δ 15 N difference.

Observed δ 15 Ν difference
The result of the subtraction of the δ 15 N value of the estimated weaning age (elbow method) from the maximum observed value (without performing regression).

Expected δ 15 N difference
The result of the subtraction of the δ 15 N value of the estimated weaning age (elbow method) from the maximum estimated value (modelled with fourth-degree polynomial regression). Mean post-weaning δ 15 N Mean value of the δ 15 N measurements from the estimated weaning age onwards. Given that the number of measurements per individual may vary depending on the tooth type, the root, and increment length (Tsutaya 2020), and this influences the regression modeling (Harrell 2017), we determined the number of δ 15 N measurements per individual that the tool requires for producing reliable estimations. Using a bootstrap-like method, we generated new sets of observations out of the existing measurements of the datasets. Specifically, we sampled 1000 sets, varying in the number of observations, using random δ 15 N measurements from each of the 130 individuals, e.g., 1000 iterations for five random measurements from one individual and 1000 iterations for six random measurements from the same individual, and repeated this process until there was no other combination We then calculated the R-squared and the mean R-squared value for all sets to determine the optimal number of δ 15 Ν measurements per individual for producing reliable results ( Figure S4, Supplementary Material).

The test datasets
To test the accuracy of our model, we re-estimated 130 weaning ages from published datasets (Table 2) and compared the results with the initial weaning age estimation. The initial estimations were performed visually based on isotopic profiles produced from deciduous first and second molars and first permanent molars. In the dataset of Scharlotta et al. (2018), there were four individuals with two weaning ages as the authors supported a possible multiphase weaning procedure. We used the earliest recorded age in this study, since Tsutaya's study showed that weaning ages estimated from parallel dentine sections are usually later than the true ones (Tsutaya 2020). The purpose of this measure was to mitigate the artifact associated with time averaging in dentine increments.
We should note that we included all 130 individuals despite the fact that in some cases there was no weaning age estimated by the observers as the δ 15 Ν profiles were not indicative of breastfeeding (n = 14). Our aim was to test the accuracy of the tool in identifying these cases. A summary of the datasets can be found in Table 1.

Statistical analysis
For comparing the published and the newly estimated weaning ages, we calculated the root mean square error (RMSE), which shows their difference in years, e.g. if the weaning age was observed visually at 2 years of age and WEAN estimates it at 3 years, the RMSE would obtain a value of one. To calculate the RMSE, we used the following formula (1) Furthermore, we compared the two methods with the Bland-Altman plot (Altman and Bland 1983;Bland and Altman 2010). This chart shows the agreement between the measurements of the two methods by plotting their difference against their mean difference. The Bland-Altman method is more suited in cases of method comparison instead of correlation analysis or other inferential statistic models, i.e., t-test and ANOVA (Mansournia et al. 2021), and more informative than their direct comparison (Bland and Altman 2010).
Ninety-four out of 130 individuals had a decreasing trend in δ 15 N, thus acquiring a negative derivative and a mean RMSE of 0.88 (range 0-4.7 years). Two individuals (B_65, B4, Table S1) out of the 94 had the highest RMSE (4.6 and 4.7 respectively). The high RMSE value derives from the missing δ 15 Ν measurements at the beginning of their life, whereas the first δ 15 Ν measurement starts at the age of 2.4 years old (median age). The remaining 25 individuals with positive derivatives show flat isotopic profiles or their first δ 15 Ν measurement starts during the first year of life or later and not at birth. Figure 5a shows the newly estimated vs. the published visually observed weaning ages (n = 119). The measurements lie close to the line of equality (diagonal line) implying the agreement between the two methods. The Bland-Altman plot (Figure 5b) shows a strong agreement between the two methods because the observations cluster close to zero and within the lines of agreement (Altman and Bland 1983;Mansournia et al. 2021). In both charts of Figure 5b, there are nine estimations (B_58.1, B_71, T030Ind5, T028Ind2, B_65, T010Ind2, B4, METi_119, METi_125; see Table S1) that differ significantly from the rest. All nine individuals obtained a positive derivative, apart from T030Ind5, T028Ind2, B_65, T010Ind2, and B_65, whose observations start from the age of two and will be discussed further.

Discussion
The results show a strong correlation between the visual and the elbow method underlining that a mathematical framework can be accurately applied in weaning age estimation. The majority of examined individuals obtained a weaning age, which was calculated automatically and did not differ from the visual observation. This indicates that despite the observed variability in δ 15 Ν individual profiles, a statistical and automated approach can be established. False weaning estimations indicated by positive derivatives were obtained from individuals with missing ages (e.g., B_42.1, B_76, B_93.2) and individuals with no variation in their δ 15 Ν measurements (flat isotopic profiles). The only exception is individual B_65 (Scharlota et al. 2018), for which the tool estimated a weaning age even though the initial ages are missing and did not predict the potential false estimation. Specifically, in this individual, initial dentine layers were missing due to wear; therefore, the δ 15 N measurements correspond from the age of 2.4 years onwards. However, the δ 15 N profile shows a decreasing pattern which led the authors to suggest that the weaning period would have ended around the age of two (Scharlotta et al. 2018). Considering that the individual's isotopic pattern closely resembles a breastfeeding motif, WEAN estimated a weaning age at 6.55 years of age without predicting a false estimation. This is an excellent example of an outlier case based on which, we recommend the users to provide ages that initiate around birth, as decreasing trends in δ 15 N during later life are most likely linked to other factors such as dietary changes or physiological stress and not to weaning. Therefore, individuals with many missing values esp. the first ones should be treated with caution.
Overall, apart from the profiles that did not initiate from birth, some of the weaning age estimations with positive derivatives were individuals with flat δ 15 Ν profiles. Since the elbow method locates the point where the curve bends and flat isotopic profiles are characterized by barely noticeable changes, WEAN correctly could not assign a weaning age. Similarly, in the original publications, e.g. METi_125 (Ganiatsou et al. 2022) and Cam9 T32c1 (King  et al. 2018), the authors did not assign a weaning age verifying the quality of our observations. Based on the results, the estimations of WEAN are more accurate for isotopic profiles obtained from M1. Since WEAN uses the principle of the elbow method, it functions more efficiently with isotopic profiles that gradually decrease and stabilize. Since this pattern is more frequently observed in measurements from M1, this tooth type offers the most reliable weaning estimation with the proposed methodology (Results and Supplementary Material) and we suggest M1 as the optimal tooth type for breastfeeding reconstruction with WEAN. The estimation of weaning age from dm1 and dm2 with the elbow method is not as accurate as M1 despite the potential of dm1 to reveal the mother-infant dyad since collagen from dm1 captures dietary shifts in utero and early life.
In most cases, the estimated weaning ages with WEAN are earlier than the published ones (Table S1) because age data are smoothed using regression and the maximal reduction occurs at younger ages. Considering that the true ages of dentine increments are earlier than the calculated ones (Tsutaya 2020), this shows that performing regression on age data prior to weaning age estimation has the potential to provide more reliable results than raw calculations. Although Tsutaya's study and this research differ in terms of scope and methodology, this line of evidence indicates that the tool's estimates are appropriately reliable and that performing regression prior to any assessment is an efficient method to overcome time averaging uncertainties due to methodological issues in general.
We are aware that our approach has some limitations. First, WEAN presupposes that each δ 15 N measurement is assigned to a specific age. Therefore, the accuracy of the estimation is partially determined by the provided age-atincrement assignment, which by extension depends on the protocol of dentine sectioning. Tsutaya (2020) has shown that time averaging in increments obtained with different sampling protocols differs and supported that sampling protocols that isolate smaller samples Fernández-Crespo et al. 2018) are more efficient in weaning reconstructions compared to horizontal sectioning of dentine. WEAN can be applied independently of the sectioning protocol, as we have shown in the present study, and accounts for errors in time averaging of increments with regression. However, a carry-on error based on the sampling protocol should be additionally considered. The evaluation of weaning ages obtained with different protocols is not performed in this study; nonetheless, bearing in mind the results of Tsutaya, it is possible that microsampling protocols Fernández-Crespo et al. 2018, 2020 provide more accurate observations for WEAN.
Second, the midpoint age assigned to each δ 15 N measurement represents in reality an age range. The results obtained from WEAN are reported in absolute numbers; however, it is critical to realize that the estimations do not represent timespecific events in the life of individuals. It is not our intent to oversimplify the estimation of the weaning process but to refine and automate an existing method performed manually by researchers. Thus, the weaning age estimations of WEAN should not be considered as absolutes. WEAN can identify mathematically the age that breastmilk consumption can no longer be detected based on isotopic measurements. However, weaning is a process, during which several metabolic processes occur in order to convert dietary changes in isotopic ratios. The isotopic discrimination between the organism and the diet (i.e., the trophic shift) varies between individuals according to their metabolic state (Fuller et al. 2005;Mekota et al. 2006). This may be depicted as independent changes in δ 13 C and δ 15 N values or decreases in δ 15 N and occasionally increases in δ 13 C values (Fuller et al. 2005;Mekota et al. 2006). Physiological stress during infancy may also implicate interpretations of isotopic ratios. For example, the degree of decrease in δ 15 N between breastfeeding and weaning age was used to explore underlying nutritional stress in some studies (Beaumont and Montgomery 2016;Ganiatsou et al. 2022;King et al. 2018). However, particular attention has recently been paid to examining physiological stress through isotopic patterns of opposing covariance. In these isotopic profiles, δ 15 N values increase as a result of protein catabolism and δ 13 C values decrease as a result of lipid catabolism (Craig-Atkins et al. 2018;Kendall et al. 2021). Up to now, the decrease in δ 15 N and δ 13 C values, between 2 and 3‰ and 0.5 and 1‰ respectively (see Introduction), during early life is accepted to reflect the weaning process (Craig-Atkins et al. 2018;Teresa Fernández-Crespo et al. 2022). A potential future optimization of the tool is the estimation of the weaning ages reported in ranges and the utilization of parameters that differentiate dietary and physiological effects in isotopic measurements.

Conclusions
Stable isotope ratios of nitrogen have proven a reliable method to infer the weaning behavior of past populations. The accuracy of stable isotope ratios of incremental dentine collagen offers an aggregated estimation of dietary habits in early life. However, up to now, a standardized methodology for the weaning age estimation is not available. WEAN introduces an approach to estimate the weaning age accurately and efficiently which is now performed with the same computational method enabling the comparison of different isotopic profiles. Our research intends to facilitate the analysis of δ 15 N incremental data offering a less time-consuming and labor-intensive procedure through an easy-to-use operational environment.