Predicting the spatial distribution of stable isotopes in precipitation using a machine learning approach: a comparative assessment of random forest variants

Erdélyi, Dániel; Kern, Zoltán; Nyitrai, Tamás; Hatvani, István Gábor

doi:10.1007/s13137-023-00224-x

Predicting the spatial distribution of stable isotopes in precipitation using a machine learning approach: a comparative assessment of random forest variants

Original Paper
Open access
Published: 12 June 2023

Volume 14, article number 14, (2023)
Cite this article

Download PDF

You have full access to this open access article

GEM - International Journal on Geomathematics Aims and scope Submit manuscript

Predicting the spatial distribution of stable isotopes in precipitation using a machine learning approach: a comparative assessment of random forest variants

Download PDF

1878 Accesses
3 Citations
Explore all metrics

Abstract

Stable isotopes of hydrogen and oxygen are important natural tracers with a wide variety of environmental applications (e.g., the exploration of the water cycle, ecology and food authenticity). The spatially explicit predictions of their variations are obtained through various interpolation techniques. In the present work, a classical random forest (RF) and two of its variants were applied. RF and a random forest version employing buffer distance (RF_sp) were applied to each month separately, while a random forest model was trained using all data employing month and year as categorical variables (RF_tg). Their performance in predicting the spatial variability of precipitation stable oxygen isotope values for 2008–2017 across Europe was compared. In addition, a comparison was made with a publicly available alternative machine learning model which employs extreme gradient boosting. Input data was retrieved from the Global Network of Isotopes in Precipitation (GNIP; no. of stations: 144) and other national datasets (no. of stations: 127). Comparisons were made on the basis of absolute differences, median, mean absolute error and Lin’s concordance correlation coefficient. All variants were capable of reproducing the overall trends and seasonal patterns over time of precipitation stable isotope variability measured at each chosen validation site across Europe. The most important predictors were latitude in the case of the RF, and meteorological variables (vapor pressure, saturation vapor pressure, and temperature) in the case of the RF_sp and RF_tg models. Diurnal temperature range had the weakest predictive power in every case. In conclusion, it may be stated that with the merged dataset, combining GNIP and other national datasets, RF_sp yielded the smallest mean absolute error 1.345‰) and highest Lin’s concordance correlation coefficient (0.987), while with extreme gradient boosting (based on only the GNIP data) the mean absolute error was 1.354‰, and Lin’s concordance correlation coefficient was 0.984, although it produced the lowers overall median value (1.113‰), while RF_sp produced 1.124‰. The most striking systematic bias was observed in the summer season in the northern validation stations; this, however, diminished from 2014 onward, the point after which stations beyond 55° N are available in the training set.

On the potential of mapping sea level anomalies from satellite altimetry with Random Forest Regression

Article Open access 27 February 2023

A high-resolution daily gridded meteorological dataset for Serbia made by Random Forest Spatial Interpolation

Article Open access 30 April 2021

Bias correction of global ensemble precipitation forecasts by Random Forest method

Article 09 February 2021

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

The ratio between the heavy and light stable isotopes in the water molecule (¹⁸O/¹⁶O; ²H/¹H) is an effective tool in resolving practical problems in environmental isotope geochemistry that arise in the disciplines of hydrology, climatology, biogeochemistry etc. (Coplen et al. 2000). The stable isotope composition of hydrogen and oxygen is conventionally expressed as δ values (δ²H and δ¹⁸O respectively) reported in units per mille (‰) (Coplen 1994). The stable isotopic composition of hydrogen and oxygen in precipitation (δ_p) provides an insight into the origin of water vapor, the conditions prevailing during condensation and precipitation (Aggarwal et al. 2016; Dansgaard 1964). Exploiting these variations, water (precipitation) stable isotopes have become important natural tracers in the study of the water cycle (Bowen and Good 2015; Fórizs 2003). With the continuous advancement in effectiveness and availability of analytical tools, the spatiotemporal abundance of precipitation stable isotope measurements is steadily increasing (Wassenaar et al. 2021; Yoshimura 2015), providing sufficient ground for the development of spatially continuous datasets of δ_p variability. These datasets can be utilized for advanced hydrological applications of precipitation stable isotopes where additional information can be gained from not only having point data, but spatially continuous information. Such applications can be found in hydrogeology (Bowen and Good 2015; Clark and Fritz 1997), limnology (Birkel et al. 2018; Nan et al. 2019), water resource management (Bowen and Good 2015; Gibson and Edwards 2002), the exploration of changes in moisture source conditions (Amundson et al. 1996), animal migration studies (Hobson 1999; Hobson and Wassenaar 1996), food source traceability (Heaton et al. 2008), and forensic sciences (Ehleringer et al. 2008) as well.

In generating spatially continuous datasets of isotopic variability, traditional kriging interpolation was demonstrated to outperform other approaches right from its inception (Bowen and Revenaugh 2003) and has since become the gold standard for mapping the spatial variability of precipitation isotopes on scales ranging from the global (e.g. Bowen 2010; Terzer et al. 2013) to the regional; e.g. Hatvani et al. (2020), Hatvani et al. (2017), Kaseke et al. (2016), Kern et al. (2014). In recent years, machine learning (ML) approaches (artificial neural networks, support vector machine and similar techniques) have increasingly proven their worth in extracting patterns and insights from the ever-increasing stream of geospatial data (Gopal 2016; Szatmári and Pásztor 2019), and have opened new perspectives on the understanding of data-driven Earth system science problems (Reichstein et al. 2019). Although the number of ML applications employed in deriving isoscapes is still scarce, their number is steadily increasing in fields as diverse as studies on bioavailable strontium (Bataille et al. 2020; Funck et al. 2021), on sulfur in human remains (Bataille et al. 2021), the stable isotope composition of nitrogen and carbon in particulate matter from the Northwest Atlantic Continental Shelf (Oczkowski et al. 2016), and studies on the isotopic composition of shallow groundwater (Stahl et al. 2020).

There have been comparative studies on the performance of different techniques in relation to predicting and mapping geochemical parameters (including stable isotopes), a task in which ML turned out to perform as well as combined geostatistical tools (Hengl et al. 2018), or even better than them (Bataille et al. 2018; Li et al. 2011). Specifically, a methodological comparison between a regression kriging approach and Random forest (RF) algorithm using daily and monthly δ_p datasets from European subregions showed that the ML approach is capable of outperforming the traditional methodology (Erdélyi et al. 2023). The first ever ML algorithm used in estimating the spatial variability of monthly δ_p in Europe was released recently (Nelson et al. 2021), employing extreme gradient boosting (XGBoost) algorithm called PisoAI that uses the Global Network for Isotopes in Precipitation (GNIP (IAEA 2019)) dataset. The performance of the different ML methods in predicting a spatially continuous dataset of monthly δ_p has not yet been compared. In addition, it is clear that there is still much to be learned about the behavior of ML tools in estimating the spatial variability of precipitation stable isotopes, and the difficulties and problems originating from the different spatiotemporal resolutions and predictors.

The aim of the present work is to explore the stable isotopic variability of precipitation across the European continent with the use of three variants of random forest algorithms. Specifically, to compare the performance of random forest approaches with the latest ML derived precipitation δ¹⁸O isoscape for Europe. In this task, not only the GNIP, but other networks’ data is used increasing the volume of the data by ~ 33%; which is in contrast to the usual approach in subcontinental precipitation stable isotope modeling.

2 Materials and methods

2.1 Data used

2.1.1 Precipitation isotopic data and preprocessing

Monthly δ_p values were acquired from a total of 480 monitoring stations operating in Europe and its vicinity between 1960 and 2021 (Fig. 1), primarily from the Global Network of Isotopes in Precipitation (GNIP), extended by the inclusion of 161 stations from regional networks and individual records; see Electronic supplementary material for details and the corresponding references. In general, the data gathered were unequally distributed in space (Fig. 1A) and time (Fig. 1B,C). Until the mid-1970s, the number of active precipitation isotope monitoring stations in Europe was limited, and those primarily belonged to the GNIP (n < 20). Afterwards, their number rose progressively, owing to the growth of the Austrian Network of Isotopes in Precipitation (ANIP) (Kralik et al. 2003), reaching about 80 stations for oxygen- (δ¹⁸O) (Fig. 1B), and 79 stations for hydrogen (δ²H) (Fig. 1C) precipitation stable isotopes by the turn of the millennium. A period with a relatively high abundance of active stations was observed in the early 2000s. For instance, in 2001 the number of active stations peaked at around 155 stations for δ¹⁸O and 137 for δ²H (Fig. 1B,C). However, the spatial distribution of the monitoring sites does not provide representative coverage of the continent in this period, being rather heavily concentrated in the Mediterranean region, owing to the coordinated research project of IAEA (IAEA 2005). Therefore, to maximize the spatiotemporal coverage of the δ_p values available from the study region, a focus period of 2008–2017 was chosen. This period was characterized by an annual average number of stations ≥ 125 providing data in any given year (Fig. 1B,C).

Since δ¹⁸O and δ²H are highly correlated in meteoric waters (Craig 1961; Rozanski et al. 1993) and the δ²H data are fewer in the collected dataset (Fig. 1B,C), precipitation δ¹⁸O (δ¹⁸O_p) was used throughout the experiments presented here; the results, however, should be valid for both. If stations from different data sources (e.g. GNIP, ANIP, literature) were situated close to each other (within 3 km) and had parallel measurements with a difference of only < 0.05‰ for a given month, the GNIP data were discarded. For data preprocessing, local Moran statistics and deuterium-excess were considered in a way similar to Erdélyi et al. (2023). In the screening process no obvious outliers were found, therefore, no data were excluded.

2.1.2 Potential predictors of spatial variability of precipitation stable isotopes across Europe

Various environmental parameters are known to control the spatial variability of δ_p on continental scales (Dansgaard 1964; Rozanski et al. 1993). Physical factors affecting the transport and fractionation processes within the global hydrologic cycle that can be represented by the geographical position of a given location are known to be important drivers of δ_p (Bowen and Revenaugh 2003; Dansgaard 1964; Rozanski et al. 1993). In the present study latitude, longitude are in WGS84 coordinate system (EPSG 4326), elevation derived from the digital elevation model obtained from Amazon Terrain Tiles (zoom parameter 5: ground resolution ~ 3 km in average)(AWS 2021) were used. These geographical variables are, however, mere reflections of the actual drivers regulating the atmospheric mixing and fractionating processes, and these are, after all, environmental mechanisms which can be considered via physical/meteorological parameters. A wide variety of meteorological parameters influencing δ_p spatial variability were considered: vapor pressure, monthly mean temperature, monthly average of the diurnal temperature range, monthly precipitation amount, all of which were derived from the CRU TS4.05 0.5 × 0.5° resolution dataset (Harris et al. 2020), and saturation vapor pressure calculated (Murray 1966) from the CRU monthly mean temperature.

In addition, the distance matrix of the monitoring stations providing the training data, following the procedure of Hengl et al. (2018) were also included as a possible predictor. Specifically, the set of distances of a station from the others will serve as a predictor (called buffer distance (Hengl et al. 2018)) instead of its spatial coordinates (latitude and longitude), and this will be done for all the stations. Categorical variables were also included as the Köppen-Geiger climate region’s codes (KG; resolution 0.5 × 0.5°) (Kottek 2006), which reflect the climatic effects in a complex and aggregated way and have been used as such in other studies—e.g. (Heydarizad et al. 2021; Nelson et al. 2021)—and the month and year of a given measurement to reflect the well-known seasonality of δ_p (Feng et al. 2009). The geographical and meteorological predictors had been extracted to the grid cells locations using nearest neighbour resampling technique.

2.2 Methodology

2.2.1 Random forest

To predict the spatial variability of δ_p across Europe, random forest (RF) was used. Random forest is a non-parametric ML algorithm which assesses a combination of predictors, and in which each tree depends on the values of a random vector sampled independently, with the same distribution for all trees in the forest (Breiman 2001; Cutler et al. 2012). The generalization error of a forest is dependent on the strength of the individual regression trees in a forest, it converges to a limit as the number of trees in the forest increases (Breiman 2001). The forest’s predictions are based on the average results of the decision trees, which use bootstrap sampling to decrease the possibility of over-fitting; for details see e.g. Biau and Scornet (2016); Breiman (2001); Prasad et al. (2006). The approach was chosen, since it has been successfully applied to map isotopic parameters (e.g. Bataille et al. (2018)) and other environmental variables (Hengl et al. 2018) or predicting the biological status of water bodies (Szomolányi and Clement 2023).

Two alternative versions of random forest were employed in the study regarding the set of predictors for the focus period (2008–2017). In the RF approach, all predictors were included except for the distances between the grid cells and the monitoring stations, while in the so-called RF_sp (Hengl et al. 2018) the distances of the monitoring stations from the grid cells were also used as predictors. For these two models (RF and RF_sp), random forest models were calculated separately for each monthly dataset, and those models were used to make the predictions.

In the case of another variant of random forest, the whole dataset (all monthly data together from 2008 to 2017) is used, with the years and months as additional categorical predictors, as the training data from which to derive the model. Following this, the monthly isoscapes are predicted (marked as RF_tg). In the meanwhile, in the case of both RF and the RF_sp, the training and prediction is performed on each and every monthly dataset separately. In all random forest approaches the so-called ‘variable importance’ sequence is obtained, representing the relative importance of each predictor in a random forest model.

In the present case the randomForestSRC (Ishwaran et al. 2021) and ranger (Wright and Ziegler 2015) R (R Core Team 2019) packages were used to obtain the RF and RF_sp models, respectively. The specific parameter values for e.g., nodesize and mtry were determined using the tuneRanger() function following Probst et al. (2019) using the out-of-bag observations to evaluate the trained algorithm (Breiman 1996).

Evaluation of our model results consisted of out-of-sample verification, and comparison to another ML-based spatial prediction of δ¹⁸O_p variability across Europe (Nelson et al. 2021). To see the possible effect of a larger dataset in space, training was performed on (i) the GNIP data only—overlapping with the only available ML alternative for δ_p predictions in Europe (Nelson et al. 2021)—and (ii) the entire collected dataset (Fig. 1A, B).

When evaluating the importance of the predictors used, variable importance plots were derived using mean decrease in impurity (Gini index; Breiman (2017)), which shows each variable’s selection frequency for a split during the building of a tree (Szomolányi and Clement 2023). The variable importance plots were plotted in such a way that their information would be rendered comparable to the various random forest approaches. Therefore, all the weights were converted to percentages, and only the first 11 predictors’ explanatory power was evaluated (Sect. 3, Fig. 2).

2.2.2 Validation of the random forest monthly models of δ _p variability

For validation purposes, 10% of the available data (1582 cases) were retained from each month with Basel and Rovaniemi (Figs. 3 and 4) always included, since these two were used for validation in the development of the PisoAI model (Nelson et al. 2021). This semi-random selection resulted in 14 validation stations overall (Figs. 3 and 4). It was ensured that the chosen test stations would not be closer to each other than 60 km, to avoid clustering of the validation points, and required that the amount of missing monthly data was below 25% of the total time considered (i.e., between 2008 and 2017).

In the validation process numerous statistics were used. Absolute differences were calculated to explore the performance of the random forest approaches and compare it to actual measured δ_p values and the PisoAI model. In all cases these were obtained as the difference of the measured value of a validation station and its nearest grid cells values to where the different ML models predicted their results. The median of the absolute differences (MAD) and the mean absolute errors (MAE) were calculated to compare the model performances. MAE measures the average absolute difference between the predicted values and the actual values in a dataset. It is a commonly used error statistic instead of, for example, root mean square error, when the error distribution is not expected to be Gaussian (Chai and Draxler 2014), as in the present case (W(1529) = 0.882, p < 0.001;(Shapiro and Wilk 1965)). Also, MAE is suggested to avoid the amplification of the higher errors compared to the lower ones (Willmott and Matsuura 2005).

Additionally, the Lin's concordance correlation coefficient (Lin’s CCC, Lawrence 1989) a hybrid error metric was included. The CCC combines measures of both precision and accuracy to determine how far the predicted and measured data pairs deviate from the line of perfect concordance (y = x line). Lin’s CCC ranges from − 1 to 1, with a perfect concordance at 1. Lin’s CCC was calculated using the CCC() function of the DescTools package, (Signorell et al. 2019).

3 Results and discussion

First, the importance of the applied predictors is shown and evaluated, followed by the assessment of the performance in time. As a last step, the overall performance of the different models is compared and discussed.

3.1 Importance of predictors

When a random forest model is built, out-of-bag (OOB) samples that are not used during the construction are attributed to all the trees of the model. These OOB samples are then used to calculate the importance of a specific variable, in the present case, this means to get an insight into the controlling mechanisms of the spatial variability of precipitation stable isotopes. There is no change in the order of the predictor importance among the most powerful ones, and there is only a slight change in the lower importance ranks (Fig. 2). Only the power of KG variable drops explicitly when the amount of data is increased (Fig. 2A–C vs. D–F).

In the present case, the most important predictors were the meteorological variables (vapor pressure, saturation vapor pressure, and temperature) in the complex random forest models (RF_tg and RF_sp), as has been shown in the case of the PisoAI model using the XGBoost algorithm (Nelson et al. 2021). Diurnal temperature range, however, had the weakest predictive power in every case. This was true regardless of whether only the GNIP data (Fig. 2A–C) or all the available data (Fig. 2D–F) were used.

The most striking difference between the more powerful and less powerful predictors is seen in the case of the RF_tg model (Fig. 2B, E), while this is (considerably) less so in the case of the classical RF model. Of the geographical variables, latitude was in every model among the most important drivers (Fig. 2). In the RF model it was the leading one (Fig. 2A,D), concurring with past studies in which latitude was the crucial predictor (Bowen and Revenaugh 2003) or was again amongst the most important ones (Hatvani et al. 2017; Liu et al. 2008; Terzer-Wassmuth et al. 2021; Terzer et al. 2013). In the RF_sp model, the fourth most important predictor is S_438, which is one of the training station’s (Kuopio) distance to the grid points. There is no change in the order of the predictor importance among the most powerful ones, and there is only a slight change in the ranks of lower importance (Fig. 2).

3.2 Performances evaluated in time

The results unanimously indicate that all random forest variants are capable of reproducing the overall trends and seasonal patterns of measured precipitation stable isotope variability at each validation site across Europe. This—namely, capturing the amplitude of seasonal variations in δ¹⁸O_p—however, had been a continuous challenge for previous regression based geospatial models, e.g. Cluett and Thomas (2020), Daniels et al. (2017), Shi et al. (2020), and in this respect ML approaches seem to outperform their regression-based counterparts (Nelson et al. 2021).

The predicted time series of δ¹⁸O_p values from the validation sites (Fig. 3) showed that the random forest models generally tracked the seasonal variability of the measured values well in almost every case. A relatively small variance in δ¹⁸O_p is seen at the coastal locations (Fig. 3A,B; for localities see Fig. 4C), while more pronounced seasonal cycles prevail inland (Fig. 3C–N; for localities see Fig. 4C). There is no single model that could be considered to outperform the others at most of the stations.

Generally, the r² values ranged from 0.26 (Cestas—Fig. 3B) to 0.87 (Basel—Fig. 3F). Spatial clustering can be observed in the case of the linear relationships between the validation stations’ time series and the predicted ones. The weakest correlations (0.26 < r² < 0.7; median r² = 0.5) were seen in the case of the SW stations (A Coruña, Cestas, Tortosa, Girona; Fig. 3A–D), while higher ones (0.66 < r² < 0.87; median r² = 0.78) characterized the Central European validation stations (Fig. 3E–L). Somewhat weaker r² values were recorded for the northernmost validation stations (Fig. 3M,N) ranging between 0.64 and 0.74 with a median of 0.69. When comparing the r² values between the different models, it becomes clear that in the case of the SW validation stations the PisoAI produced the best estimations (Fig. 3A–D). Regarding the Central European stations, the RF_sp gave the best predictions in four- and the PisoAI in three out of the eight cases (Fig. 3E–L). For the northernmost stations: at Espoo the PisoAI (r² = 0.85; Fig. 3M) while at Rovaniemi the RF approach gave the highest correlations (r² = 0.68; Fig. 3N).

Intra-annual/seasonal variability is generally well reproduced, for example, even if a winter month is less negative than preceding and following one(s), the models were in fact able to mimic the sub-seasonal pattern, e.g., 2012 Nyon, 2016/2017 Espoo. Similar accuracy was seen in case of even smaller variations in the summer months of 2014 and 2015 at St. Gallen, as well, for example.

Regarding the interannual/interseasonal differences, if a winter was more negative than the ones generally occurring at a particular station, the models still gave quite accurate predictions. For example, in Nyon, in the winter of 2014/2015, the relative difference between the δ¹⁸O_p negative peaks (near − 5‰) was well reproduced compared to the following winter season, as also in the case of the Carpathian monitoring station (Liptovsky Mikulas; Fig. 3L). There, in the Decembers of 2010 and 2012, δ¹⁸O_p < − 18‰, while in early winter 2016 neither the observed, nor did the modelled δ¹⁸O_p drop below − 14‰, all of which again highlights the estimation power of the ML tools for δ¹⁸O_p. A summer example can be drawn from Basel and Graz, where in 2010 and 2013 the summer peak was below − 5‰, while in 2015 it was above − 5‰, and the models were able to reproduce the interseasonal differences.

Despite the general and reasonable agreement between the observed and modelled time series of δ¹⁸O_p, deviations are also observable. These will be elaborated on in the following order: (i) systematic misfits, (ii) multimothly-periods, (iii) monthly anomalies/single peaks.

The most obvious systematic misfit can be observed in summer at the northern validation stations (Fig. 3M, N). The Random Forest algorithms predicted more positive values (the largest errors > 4‰) for the summer month of 2009. The degree of misfit seems to decrease from 2014 (Fig. 3M, N). The improvement of the predictions, especially at Espoo, may be supposed to be a benefit of the entering of data from closer locations to the training set. Since the contribution to the GNIP database from two Estonian stations (Tartu, Vilsandi) started in 2013. Another season-specific bias can be seen in A Coruña (Fig. 3A), where the Random Forest prediction underestimates the observed δ¹⁸O_p values. The largest errors, > 7‰, appear for the summer months of 2011 and 2015.

Another type of problem is when multimothly-periods with generally poor estimations occur. For example, in Cestas from late 2013 to early 2014, all models fail to match the measured data (Fig. 3B). However, the measured data itself displays an unusual, weird pattern, showing values for winter which are less negative and ones for summer which are more negative, contradicting the expected seasonality for a midlatitude location (Feng et al. 2009; Rozanski et al. 1993) and documented for nearby French stations (Zhang et al. 2020) even in this particular interval. This raises concerns about the reliability of the Cestas δ¹⁸O_p record for this time interval. Another example is Graz (Fig. 3K), where from mid-2011 to early 2012, the models were unable to match the repeatedly occurring high values (> − 5‰) in winter, which were higher than the typical level of the summer δ¹⁸O_p records at this site (Fig. 3K).

Although the intra-seasonal pattern was in general well reflected in the model data (see above), still, occasionally occurring cases of monthly anomalies/single peaks are seen in which the amplitude of the prediction(s) fall behind the measurements. These typically occur in cases where negative values in winter are involved, for example, in February 2010 in Rovaniemi (Fig. 3N), November 2013 in Tortosa (Fig. 3C), and November 2015 in Graz (Fig. 3K).

3.3 Overall performances

Since the empirical inspection of the δ¹⁸O_p time series did not reveal explicit differences between the models’ ability to reproduce the actual observations, the monthly absolute differences were considered together at each validation stations and plotted on box-and-whiskers plots for further analysis (Fig. 4A,B). This was done for the cases when only the GNIP data (Fig. 4A) and then all data were used in the training procedure (Fig. 4B); also see Sect. 2.2.

In case of the RF_sp, (i) the median value of the absolute differences was smallest in 6 out of 14 cases considering only the GNIP data for training (Fig. 4A), and 7 out of 14 times considering all data (Fig. 4B). Meanwhile, (ii) the interquartile ranges for this particular model were smallest 4 out of 14 times with only GNIP data (Fig. 4A), and 9 out of 14 times with all data (Fig. 4B), and (iii) the upper fence maximum value was the lowest 6 out of 14 times with only GNIP data (Fig. 4A) and 9 out of 14 times with all data (Fig. 4B).

Comparing this performance to the only alternative yet to be published in the specific field of precipitation isotope composition prediction (PisoAI, Nelson et al. 2021), (i) the median value of absolute differences was the smallest 8 out of 14 times with only GNIP data (Fig. 4A), and 6 out of 14 times when all the stations were considered (Fig. 4B). (ii) The interquartile ranges were the smallest 8 out of 14 times with only GNIP data (Fig. 4A), and 5 out of 14 times with all data (Fig. 4B). Lastly, (iii) the upper fence was the lowest 6 out of 14 times with only GNIP data (Fig. 4A, and 5 out of 14 times with all data (Fig. 4B). When not only the GNIP, but all the stations available for the present study (Fig. 1) were considered, RF_sp proved to provide more accurate predictions in 7 out of 14 cases (Fig. 4B). It has to be noted that in this case random forest models were trained on a larger set compared to PisoAI (Nelson et al. 2021), which was trained exclusively on GNIP data.

The worst median values were produced by the RF model when dealing with only GNIP (Fig. 4A), and the RF_tg model when all data were used (Fig. 4B). There is no clear indication that the Random Forest models created by merging all the monthly datasets (RF_tg) would produce better predictions than the versions calculated for every month. Among the validation stations, the most striking deviations were seen at Rovaniemi and A Coruña primarily in summer (Fig. 3A,N and Sect. 3.1). These stations are situated at a marginal position with a relative scarcity nearby training stations (Figs. 1A, 4C). Thus, without an abundance of similar training stations, the random forest algorithm struggles reproducing the empirical degree of the latitudinal effect (Dansgaard 1964).

In general, with the inclusion of additional sites in the training set compared to the GNIP data, the predictive accuracy of the random forest approaches increased in 12, 7 and 10 cases out of the 14 for the RF, RF_tg and RF_sp variants, respectively (Fig. 4A vs. 4B: MAE values). The biggest improvement was achieved at the validation stations located in the eastern part of the Alps (Kufstein, Wildalpen and Graz; Fig. 4C), where in fact the presence of the ANIP network (Fig. 1A: red dots) increased the total abundance to the greatest degree (Fig. 1B, C). The difference in predictive accuracy (MAD) ranged from 0.73 to 2.87‰; Fig. 4B vs. A.

No comparable improvement in MAD was observed for the validation stations in the western Alpine Region (Nyon, Basel, St. Gallen). A characteristic example is the Basel station with the addition of the ANIP data; still, the MAD increased to between 0.06 and 0.16 ‰ for all random forest variants (Fig. 4A, B). This region neighbors the ANIP network (Figs. 1 and 4C), thus some degree of improvement might have been expected. A similar situation is observed at the northernmost validation stations, where the most promising random forest variant gave poorer estimations (Rovaniemi and Espoo) when the model was trained on an extended dataset compared to the GNIP version (Figs. 4A, B).

To assess the overall performance—regardless of the spatial location(s)—of the tested ML models, the MAD, MAE and Lin’s concordance correlation coefficient values were calculated for the 14 validation stations were averaged (Table 1). In this respect, the PisoAI (MAD = 1.113‰) performed better by 0.2‰ than the most promising random forest variant (RF_sp—MAD = 1.299‰), when only the GNIP data were considered; with the complete database, the RF_sp method gave nearly as low MAD value (1.124‰). In case of the MAE values, RF_sp resulted the lowest value (1.345‰) when all the data were included for the analysis. The highest score for the Lin’s CCC was provided by the RF_sp variant when only the GNIP network data considered (Lin’s CCC RF_{sp GNIP only} = 0.986), and further improved when additional stations (all data) were involved to the training set (Lin’s CCC RF_{sp all data} = 0.987). Those results corroborates the previous findings, namely, that with an extended dataset the RF_sp approach provides more accurate results than the PisoAI, but when only the GNIP database is used the performance of the methods remains comparable (Figs. 3 and 4; Table 1).

Table 1 Various overall error metrics obtained in the validation of the predicted products with the different methods

Full size table

4 Conclusions and outlook

In the study the performance of four machine learning algorithms were compared on monthly δ¹⁸O_p records from 2008 to 2017. The methods were random forest (RF), random forest employing buffer distance (RF_sp), a random forest model trained using all data employing month and year as categorical variables (RF_tg), and lastly, the results of a previously published model employing XGBoost (Nelson et al. 2021). The accuracy of the predictions was tested on all the available data and a subset consisting of only GNIP data. Using only the GNIP data, similar results were obtained regardless of which random forest variant was employed, but these slightly underperformed compared to the XGBoost approach. However, the inclusion of additional network data in the training set further improved the results for all random forest models (Table 1). In this respect, the model employing buffer distance RF_sp gave the smallest overall error amongst the random forest variants used; indeed, it was even better than that provided by the XGBoost. The median value of the absolute differences, their interquartile ranges and upper fence maximum values were all lower for a higher number of validation stations in the case of the RF_sp model than in the case of the PisoAI approach (Fig. 4A, B). The three leading predictors were vapor pressure, saturation vapor pressure, and temperature in the complex random forest models (RF_tg and RF_sp), a finding similar to that obtained using the PisoAI model with the XGBoost algorithm (Nelson et al. 2021). It was an obvious, yet important, observation that subregions represented by only a few stations (especially in northern Europe) benefit from the inclusion of additional data in the training dataset.

It is suggested that the next step in the research should be to increase the dataset by an even greater extent and perform comprehensive tests using the best performing methods, complemented by the XGBoost algorithm (Nelson et al. 2021) and random forest combined with Lasso regression (Erdélyi et al. 2023) which allows extrapolation outside the data boundaries (Zhang et al. 2019).

Gaining an in-depth knowledge of how machine learning methods behave under different circumstances (spatiotemporal data density, various predictors etc.) in predicting precipitation stable isotopes on continental scales is a necessary step. It is our expectation that in the near future, classical geostatistical approaches will be overtaken by geomathematical ML tools in the derivation of the next generation of (precipitation) isoscapes.

References

Aggarwal, P.K., et al.: Proportions of convective and stratiform precipitation revealed in water isotope ratios. Nat. Geosci. 9, 624 (2016). https://doi.org/10.1038/ngeo2739
Article Google Scholar
Amundson, R., Chadwick, O., Kendall, C., Wang, Y., DeNiro, M.: Isotopic evidence for shifts in atmospheric circulation patterns during the late Quaternary in mid–North America. Geology 24, 23–26 (1996)
Article Google Scholar
AWS (2021) Terrain Tiles. Amazon. https://registry.opendata.aws/terrain-tiles. Accessed 10 Dec 2021
Bataille, C.P., Crowley, B.E., Wooller, M.J., Bowen, G.J.: Advances in global bioavailable strontium isoscapes. Palaeogeogr. Palaeoclimatol. Palaeoecol. 555, 109849 (2020). https://doi.org/10.1016/j.palaeo.2020.109849
Article Google Scholar
Bataille, C.P., Jaouen, K., Milano, S., Trost, M., Steinbrenner, S., Crubézy, É., Colleter, R.: Triple sulfur-oxygen-strontium isotopes probabilistic geographic assignment of archaeological remains using a novel sulfur isoscape of western Europe. PLoS ONE 16, e0250383 (2021). https://doi.org/10.1371/journal.pone.0250383
Article Google Scholar
Bataille, C.P., von Holstein, I.C.C., Laffoon, J.E., Willmes, M., Liu, X.-M., Davies, G.R.: A bioavailable strontium isoscape for Western Europe: A machine learning approach. PLoS ONE 13, e0197386 (2018). https://doi.org/10.1371/journal.pone.0197386
Article Google Scholar
Biau, G., Scornet, E.: A random forest guided tour. TEST 25, 197–227 (2016). https://doi.org/10.1007/s11749-016-0481-7
Article MathSciNet MATH Google Scholar
Birkel, C., et al.: Characterization of surface water isotope spatial patterns of Scotland. J. Geochem. Explor. 194, 71–80 (2018). https://doi.org/10.1016/j.gexplo.2018.07.011
Article Google Scholar
Bowen, G.J.: Isoscapes: spatial pattern in isotopic biogeochemistry. Annu. Rev. Earth Planet. Sci. 38, 161–187 (2010). https://doi.org/10.1146/annurev-earth-040809-152429
Article Google Scholar
Bowen, G.J., Good, S.P.: Incorporating water isoscapes in hydrological and water resource investigations WIREs. Water 2, 107–119 (2015). https://doi.org/10.1002/wat2.1069
Article Google Scholar
Bowen, G.J., Revenaugh, J.: Interpolating the isotopic composition of modern meteoric precipitation. Water Resour. Res. 39, 1299 (2003). https://doi.org/10.1029/2003WR002086
Article Google Scholar
Breiman, L.: Out-of-Bag Estimation. University of California, Department of Statistics (1996)
Google Scholar
Breiman, L.: Bagging predictors. Random for Mach Learn 45, 5–32 (2001). https://doi.org/10.1023/A:1010933404324
Article MATH Google Scholar
Breiman L (2017) Classification and Regression Trees. 1st edn. Routledge, New York https://doi.org/10.1201/9781315139470
Chai, T., Draxler, R.R.: Root mean square error (RMSE) or mean absolute error (MAE)? – Arguments against avoiding RMSE in the literature. Geosci Model Dev 7, 1247–1250 (2014). https://doi.org/10.5194/gmd-7-1247-2014
Article Google Scholar
Clark, I.D., Fritz, P.: Environmental Isotopes in Hydrogeology. Taylor and Francis (1997)
Google Scholar
Cluett, A.A., Thomas, E.K.: Resolving combined influences of inflow and evaporation on western Greenland lake water isotopes to inform paleoclimate inferences. J. Paleolimnol. 63, 251–268 (2020). https://doi.org/10.1007/s10933-020-00114-4
Article Google Scholar
Coplen, T.B.: Reporting of stable hydrogen, carbon and oxygen isotopic abundances. Pure App Chem 66, 273–276 (1994)
Article Google Scholar
Coplen, T.B., Herczeg, A.L., Barnes, C.: Isotope engineering—using stable isotopes of the water molecule to solve practical problems. In: Cook, P.G., Herczeg, A.L. (eds.) Environmental Tracers in Subsurface Hydrology. Springer US, Boston, MA, pp. 79–110. (2000) https://doi.org/10.1007/978-1-4615-4557-6_3
Craig, H.: Isotopic Variations in Meteoric Waters. Science 133, 1702–1703 (1961). https://doi.org/10.1126/science.133.3465.1702
Article Google Scholar
Cutler, A., Cutler, D.R., Stevens, J.R.: Random Forests. In: Zhang, C., Ma, Y. (eds.) Ensemble Machine Learning: Methods and Applications. Springer US, Boston, MA, pp. 157–175 (2012) doi:https://doi.org/10.1007/978-1-4419-9326-7_5
Daniels, W.C., Russell, J.M., Giblin, A.E., Welker, J.M., Klein, E.S., Huang, Y.: Hydrogen isotope fractionation in leaf waxes in the Alaskan Arctic tundra. Geochim. Cosmochim. Acta 213, 216–236 (2017). https://doi.org/10.1016/j.gca.2017.06.028
Article Google Scholar
Dansgaard, W.: Stable isotopes in precipitation. Tellus 16, 436–468 (1964)
Article Google Scholar
Ehleringer, J.R., Bowen, G.J., Chesson, L.A., West, A.G., Podlesak, D.W., Cerling, T.E.: Hydrogen and oxygen isotope ratios in human hair are related to geography. Proc. Natl. Acad. Sci. 105, 2788–2793 (2008). https://doi.org/10.1073/pnas.0712228105
Article Google Scholar
Erdélyi, D., Hatvani, I.G., Jeon, H., Jones, M., Tyler, J., Kern, Z.: Predicting spatial distribution of stable isotopes in precipitation by classical geostatistical- and machine learning methods. J. Hydrol. 617(Part C), 129129 (2023). https://doi.org/10.1016/j.jhydrol.2023.129129
Article Google Scholar
Feng, X., Faiia, A.M., Posmentier, E.S.: Seasonality of isotopes in precipitation: a global perspective. J. Geophys. Res. Atmos. (2009). https://doi.org/10.1029/2008JD011279
Article Google Scholar
Fórizs, I.: Isotopes as natural tracers in the watercycle: examples from the carpathian basin. Studia UBB Physica 1, 69–77 (2003)
Google Scholar
Funck, J., Bataille, C., Rasic, J., Wooller, M.: A bio-available strontium isoscape for eastern Beringia: a tool for tracking landscape use of Pleistocene megafauna. J. Quat. Sci. 36, 76–90 (2021). https://doi.org/10.1002/jqs.3262
Article Google Scholar
Gibson, J.J., Edwards, T.W.D.: Regional water balance trends and evaporation-transpiration partitioning from a stable isotope survey of lakes in northern Canada. Glob. Biogeochem. Cycl. 16, 10-11–10-14 (2002). https://doi.org/10.1029/2001GB001839
Article Google Scholar
Gopal, S.: Artificial Neural Networks in Geospatial Analysis. In: Richardson, D., Castree, N., Goodchild, M.F., Kobayashi, A., Liu, W., Marston, R.A. (eds.) International Encyclopedia of Geography. John Wiley & Sons, Ltd., pp. 1–7. (2016) https://doi.org/10.1002/9781118786352.wbieg0322
Harris, I., Osborn, T.J., Jones, P., Lister, D.: Version 4 of the CRU TS monthly high-resolution gridded multivariate climate dataset. Scientific Data 7, 109 (2020). https://doi.org/10.1038/s41597-020-0453-3
Article Google Scholar
Hatvani, I.G., Erdélyi, D., Vreča, P., Kern, Z.: Analysis of the spatial distribution of stable oxygen and hydrogen isotopes in precipitation across the Iberian Peninsula. Water 12, 481 (2020). https://doi.org/10.3390/w12020481
Article Google Scholar
Hatvani, I.G., Leuenberger, M., Kohán, B., Kern, Z.: Geostatistical analysis and isoscape of ice core derived water stable isotope records in an Antarctic macro region. Polar Sci. 13, 23–32 (2017). https://doi.org/10.1016/j.polar.2017.04.001
Article Google Scholar
Heaton, K., Kelly, S.D., Hoogewerff, J., Woolfe, M.: Verifying the geographical origin of beef: The application of multi-element isotope and trace element analysis. Food Chem. 107, 506–515 (2008). https://doi.org/10.1016/j.foodchem.2007.08.010
Article Google Scholar
Hengl, T., Nussbaum, M., Wright, M.N., Heuvelink, G.B., Gräler, B.: Random forest as a generic framework for predictive modeling of spatial and spatio-temporal variables. PeerJ 6, e5518 (2018)
Article Google Scholar
Heydarizad, M., Gimeno, L., Sorí, R., Minaei, F., Mayvan, J.E.: The stable isotope characteristics of precipitation in the middle east highlighting the link between the Köppen climate classifications and the δ¹⁸O and δ²H values of precipitation. Water 13, 2397 (2021)
Article Google Scholar
Hobson, K.A.: Tracing origins and migration of wildlife using stable isotopes: a review. Oecologia 120, 314–326 (1999). https://doi.org/10.1007/s004420050865
Article Google Scholar
Hobson, K.A., Wassenaar, L.I.: Linking breeding and wintering grounds of neotropical migrant songbirds using stable hydrogen isotopic analysis of feathers. Oecologia 109, 142–148 (1996). https://doi.org/10.1007/s004420050068
Article Google Scholar
IAEA (2005) Isotopic composition of precipitation in the Mediterranean Basin in relation to air circulation patterns and climate. TECDOC Series, vol 1453. International Atomic Energy Agency, Vienna, Austria
IAEA (2019) Global Network of Isotopes in Precipitation. The GNIP Database. http://www.isohis.iaea.org. Accessed 12.11.2019
Ishwaran, H., Kogalur, U., Kogalur, M. RandomForestSRC: fast unified random forests for survival regression, and classification (RF-SRC) (2021) [(accessed on 16 July 2020)]
Kaseke, K.F., Wang, L., Wanke, H., Turewicz, V., Koeniger, P.: An analysis of precipitation isotope distributions across namibia using historical data. PLoS ONE 11, e0154598 (2016). https://doi.org/10.1371/journal.pone.0154598
Article Google Scholar
Kern, Z., Kohán, B., Leuenberger, M.: Precipitation isoscape of high reliefs: interpolation scheme designed and tested for monthly resolved precipitation oxygen isotope records of an Alpine domain. Atmos. Chem. Phys. 14, 1897–1907 (2014). https://doi.org/10.5194/acp-14-1897-2014
Article Google Scholar
Kottek, M.G., Grieser, J., Beck, C., Rudolf, B., Rubel, F.: World Map of the Köppen-Geiger climate classification updated. Meteorol. z. 15, 259–263 (2006). https://doi.org/10.1127/0941-2948/2006/0130
Article Google Scholar
Kovács, J., Tanos, P., Korponai, J., Székely, I.K., Gondár, K., Gondár-Sőregi, K., Hatvani, I.G.: Analysis of Water Quality Data for Scientists. In: Voudouris, K., Voutsa, D. (eds.) Water quality monitoring and assessment. InTech, pp 65–94 (2012) https://doi.org/10.5772/32173
Kralik, M., Papesch, W., Stichler, W.: Austrian Network of Isotopes in Precipitation (ANIP): Quality assurance and climatological phenomenon in one of the oldest and densest networks in the world. Isotope Hydrol. Integr. Water Resour. Manage. 146–149 (2003)
Lawrence, I.K.L.: A concordance correlation coefficient to evaluate reproducibility. Biometrics 45, 255–268 (1989). https://doi.org/10.2307/2532051
Article MATH Google Scholar
Li, J., Heap, A.D., Potter, A., Daniell, J.J.: Application of machine learning methods to spatial interpolation of environmental variables. Environ. Model. Softw. 26, 1647–1659 (2011). https://doi.org/10.1016/j.envsoft.2011.07.004
Article Google Scholar
Liu, Z., Tian, L., Chai, X., Yao, T.: A model-based determination of spatial variation of precipitation δ¹⁸O over China. Chem. Geol. 249, 203–212 (2008). https://doi.org/10.1016/j.chemgeo.2007.12.011
Article Google Scholar
Murray, F.W.: On the computation of saturation vapor pressure. Rand Corp Santa Monica Calif (1966)
Nan, Y., Tian, F., Hu, H., Wang, L., Zhao, S.: Stable isotope composition of river waters across the world. Water 11, 1760 (2019)
Article Google Scholar
Nelson, D.B., Basler, D., Kahmen, A.: Precipitation isotope time series predictions from machine learning applied in Europe. Proc Natl Acad Sci 118, e2024107118 (2021). https://doi.org/10.1073/pnas.2024107118
Article Google Scholar
Oczkowski, A., Kreakie, B., McKinney, R.A., Prezioso, J.: Patterns in stable isotope values of nitrogen and carbon in particulate matter from the Northwest Atlantic continental shelf, from the gulf of maine to cape hatteras. Front Mar Sci (2016). https://doi.org/10.3389/fmars.2016.00252
Article Google Scholar
Prasad, A.M., Iverson, L.R., Liaw, A.: Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9, 181–199 (2006). https://doi.org/10.1007/s10021-005-0054-1
Article Google Scholar
Probst, P., Wright, M.N., Boulesteix, A.-L.: Hyperparameters and tuning strategies for random forest. Wires Data Min Knowl Discov 9, e1301 (2019). https://doi.org/10.1002/widm.1301
Article Google Scholar
R Core Team (2019) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria
Reichstein, M., Camps-Valls, G., Stevens, B., Jung, M., Denzler, J., Carvalhais, N.: Deep learning and process understanding for data-driven earth system science. Nature 566, 195–204 (2019). https://doi.org/10.1038/s41586-019-0912-1
Article Google Scholar
Rozanski, K., Araguás-Araguás, L., Gonfiantini, R.: Isotopic patterns in modern global precipitation. In: Swart, P.K., Lohmann, K.C., Mckenzie, J., Savin, S. (eds.) Climate Change in Continental Isotopic Records. American Geophysical Union, USA, pp. 1–36. (1993) https://doi.org/10.1029/GM078p0001
Shapiro, S.S., Wilk, M.B.: An analysis of variance test for normality (complete samples). Biometrika 52, 591–611 (1965). https://doi.org/10.1093/biomet/52.3-4.591
Article MathSciNet MATH Google Scholar
Shi, Y., Wang, S., Zhang, M., Argiriou, A.A., Guo, R., Song, Y., Zhu, X.: Isoscape of δ¹⁸O in Precipitation of the Qinghai-Tibet Plateau: Assessment and Improvement. Water 12, 3392 (2020)
Article Google Scholar
Signorell, A. et al.: DescTools: Tools for Descriptive Statistics, vol 28 (2019)
Stahl, M.O., Gehring, J., Jameel, Y.: Isotopic variation in groundwater across the conterminous United States – Insight into hydrologic processes. Hydrol. Process. 34, 3506–3523 (2020). https://doi.org/10.1002/hyp.13832
Article Google Scholar
Szatmári, G., Pásztor, L.: Comparison of various uncertainty modelling approaches based on geostatistics and machine learning algorithms. Geoderma 337, 1329–1340 (2019). https://doi.org/10.1016/j.geoderma.2018.09.008
Article Google Scholar
Szomolányi, O., Clement, A.: Use of random forest for assessing the effect of water chemistry parameters on the biological status of surface waters. Int. J. Geomath. same topical collection IJGE-D-22-00053 (2023)
Terzer-Wassmuth, S., Wassenaar, L.I., Welker, J.M., Araguás-Araguás, L.J.: Improved high-resolution global and regionalized isoscapes of δ¹⁸O, δ²H and d-excess in precipitation. Hydrol. Process. 35, e14254 (2021). https://doi.org/10.1002/hyp.14254
Article Google Scholar
Terzer, S., Wassenaar, L.I., Araguás-Araguás, L.J., Aggarwal, P.K.: Global isoscapes for δ¹⁸O and δ²H in precipitation: improved prediction using regionalized climatic regression models. Hydrol. Earth Syst. Sci. 17, 4713–4728 (2013). https://doi.org/10.5194/hess-17-4713-2013
Article Google Scholar
Wassenaar, L., Terzer-Wassmuth, S., Douence, C.: Progress and challenges in dual- and triple-isotope (δ¹⁸O, δ²H, Δ¹⁷O) analyses of environmental waters: an international assessment of laboratory performance. Rapid Commun. Mass Spectrom. 35, e9193 (2021). https://doi.org/10.1002/rcm.9193
Article Google Scholar
Willmott, C.J., Matsuura, K.: Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Climate Res. 30, 79–82 (2005)
Article Google Scholar
Wright MN, Ziegler A (2015) ranger: A fast implementation of random forests for high dimensional data in C++ and R. arXiv preprint arXiv:150804409
Yoshimura, K.: Stable water isotopes in climatology, meteorology, and hydrology: a review. J Meteorol Soc Jpn Ser II 93, 513–533 (2015). https://doi.org/10.2151/jmsj.2015-036
Article Google Scholar
Zhang H, Nettleton D, Zhu Z (2019) Regression-enhanced random forests. arXiv preprint arXiv:190410416
Zhang, J., Genty, D., Sirieix, C., Michel, S., Minster, B., Régnier, E.: Quantitative assessments of moisture sources and temperature governing rainfall δ18O from 20 years' monitoring records in SW-France: Importance for isotopic-based climate reconstructions. J. Hydrol. 591, (2020). https://doi.org/10.1016/j.jhydrol.2020.125327

Download references

Acknowledgements

On behalf of project 'Limitations of spectral analysis of sedimentary proxy records, tested on simulated time series with timescale error and variable temporal resolution' we thank for the usage of ELKH Cloud (https://science-cloud.hu/) which helped us achieve the results published in this paper. This research has been supported by the National Research, Development and Innovation Office (grant SNN143868). This is contribution no. 85 of the 2ka Palæoclimatology Research Group.

Funding

Open access funding provided by ELKH Research Centre for Astronomy and Earth Sciences.

Author information

Authors and Affiliations

Doctoral School of Environmental Sciences, Eötvös Loránd University, Pázmány Péter stny. 1, Budapest, 1117, Hungary
Dániel Erdélyi
Department of Geology, Eötvös Loránd University, Pázmány Péter stny. 1, Budapest, 1117, Hungary
Dániel Erdélyi
Research Centre for Astronomy and Earth Sciences, Institute for Geological and Geochemical Research, Budaörsi út 45, Budapest, 1112, Hungary
Zoltán Kern & István Gábor Hatvani
CSFK, MTA Centre of Excellence, Konkoly Thege Miklós út 15-17, Budapest, 1121, Hungary
Zoltán Kern & István Gábor Hatvani
Department of Statistics, Corvinus University of Budapest, Fővám tér 8., Budapest, 1093, Hungary
Tamás Nyitrai

Authors

Dániel Erdélyi
View author publications
You can also search for this author in PubMed Google Scholar
Zoltán Kern
View author publications
You can also search for this author in PubMed Google Scholar
Tamás Nyitrai
View author publications
You can also search for this author in PubMed Google Scholar
István Gábor Hatvani
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

ZK and DE conceived and designed the study with contribution from IGH. DE performed the analysis and produced the figures. IGH, ZK and DE wrote the paper, with contributions from TN. The FLAE approach was applied to the sequence of authors; see https://doi.org/10.1371/journal.pbio.0050018. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Zoltán Kern.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 121 kb)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Erdélyi, D., Kern, Z., Nyitrai, T. et al. Predicting the spatial distribution of stable isotopes in precipitation using a machine learning approach: a comparative assessment of random forest variants. Int J Geomath 14, 14 (2023). https://doi.org/10.1007/s13137-023-00224-x

Download citation

Received: 28 November 2022
Accepted: 15 May 2023
Published: 12 June 2023
DOI: https://doi.org/10.1007/s13137-023-00224-x

Predicting the spatial distribution of stable isotopes in precipitation using a machine learning approach: a comparative assessment of random forest variants

Abstract

Similar content being viewed by others

On the potential of mapping sea level anomalies from satellite altimetry with Random Forest Regression

A high-resolution daily gridded meteorological dataset for Serbia made by Random Forest Spatial Interpolation

Bias correction of global ensemble precipitation forecasts by Random Forest method

1 Introduction