Introduction

Small isolated wetlands are important landscape features that provide myriad ecosystem services (Cohen et al. 2016). They are known hotspots for nutrient and chemical transformation (Reddy and DeLaune 2008), sequester disproportionately large quantities of carbon relative to their area (Mitsch and Gosselink 2015; Marton et al. 2015; Craft et al. 2018), and harbor an abundance of threatened and endangered species (Williams and Dodd Jr 1978; Murdock 1995; Smith et al. 2019). However, the key attribute influencing a wetland’s function, water permanence or inundation duration (i.e., hydroperiod), is rarely known or quantified. In this study, we define hydroperiod as the number of consecutive days with ponded surface water. For amphibians, hydroperiod has been shown to be a far stronger determinant of taxa richness than wetland size (Snodgrass et al. 2000; Babbitt et al. 2003; Babbitt 2005), and many amphibian species will only breed in temporary wetlands (Dodd 1992, 1993). Thus, knowledge of hydroperiod is key to protecting critical habitat and preserving the biodiversity of wetland-obligate species. Furthermore, the lack of knowledge about water permanence, or about connections to permanent waterbodies, has led to shifting legal protections for various surface water features, including geographically isolated wetlands (e.g., McLaughlin et al. 2014; Cohen et al. 2016; Rains et al. 2016). Even with these uncertainties, we are unaware of any widespread efforts to better monitor and assess water permanence in small wetlands beyond localized scales (e.g., NCDWR 2018). This lack of hydrologic information can lead to land and species management decisions that overlook a critical piece of information. Hence, tools are needed to help predict water permanence (hydroperiod) in small, isolated wetlands, where data are often lacking and whose small size precludes the use of most currently available, free remotely sensed data.

Past efforts to predict wetland hydrology have been based on three broad categories of techniques: physically based process models, remote sensing approaches, and empirical models. Physically based process models have been used to simulate the complex hydrology of pine flatwoods pre- and post-forest harvest (Sun et al. 1998, 2007) and to more broadly examine the role of forest harvest and management on wetland hydrology in low relief landscapes (Skaggs et al. 2005). Other studies have used these models to investigate potential impacts on wetland hydrology from climate change (Kurki-Fox et al. 2019) and to model hydrologic processes in various types of depressional wetlands (Krasnostein and Oldham 2004; Caldwell et al. 2007; Qi et al. 2019; Cartwright and Wolfe 2021). This type of model can be especially useful when information regarding hydrologic processes is needed in addition to simulations of particular hydrologic outputs. However, one of the challenges with many physically based process models is the relatively high and diverse data requirements, such as detailed soil characteristics and water table elevations over the model domain, which are rarely available.

Remotely sensed data have been used to infer the presence or absence of surface water in wetlands, and time series of these data can then be used to estimate the duration of inundation (e.g., Jones 2019; Wu et al. 2019; Hopkinson et al. 2020; Alonso et al. 2020; Kissel et al. 2020; Londe et al. 2022). One product designed specifically for this purpose is the Dynamic Surface Water Extent (DSWE; Jones 2015; Jones 2019). This dataset is derived from Landsat and therefore has a revisit interval of 8–16 days (assuming minimal cloud cover and that multiple sensors are used). Given the 30-m resolution of Landsat, this dataset is most applicable to larger wetlands (i.e., those covering at least several Landsat pixels). Other studies have coupled multiple sources of remote sensing data to fill the temporal gaps often present with a single satellite (Murray-Hudson et al. 2015; Wu et al. 2019). For example, in the prairie pothole region of the United States, Wu et al. (2019) combined aerial imagery and lidar data to explore wetland inundation dynamics. By combining these datasets, they were able to identify smaller waterbodies than were evident from satellite-derived data alone and to obtain a denser temporal record, which allowed more highly resolved estimates of inundated extent. While these remote sensing approaches are highly viable in large, non-forested or sparsely forested wetland systems, they are often not practical for small, forested wetlands. This is especially true in humid regions with frequent cloud cover, which often limits the number of consecutive satellite overpasses suitable for analysis.

Empirical studies have used field data with statistical models to relate weather, climate, and characteristics of a wetland basin to the duration, depth, or area of wetland inundation. Studies using these techniques have often focused more explicitly on hydroperiod determination than studies using physically based process models or remotely sensed data (e.g., Greenberg et al. 2015; Chandler et al. 2016; Kissel et al. 2020; Londe et al. 2022). For example, at a lowland coastal plain site, Riley et al. (2017) showed that a neighborhood-scale topographic metric was strongly related to wetland hydroperiod class (short, long, permanent), and at a nearby study site, Chandler et al. (2016) used generalized linear mixed models to predict water presence as a function of climatological and meteorological variables and wetland characteristics. Other studies have found climate indices (e.g., Palmer drought severity index, standardized precipitation index [SPI]; Davis et al. 2019; Londe et al. 2022) and soil moisture (Kissel et al. 2020) to be important for predicting wetland inundation. While statistical models have proven very useful for localized applications, predictors derived from field data often reflect site-specific conditions, which may limit the models' transferability to other wetland sites. Collectively, however, the above studies suggest that empirical models should consider both weather forcings and wetland basin characteristics to predict inundation accurately.

The application of machine learning (ML) algorithms (another empirical modeling approach) to hydrologic prediction has increased greatly in the last decade (see Shen et al. 2021; Nearing et al. 2021; Zounemat-Kermani et al. 2021). Many studies that have used ML methods for hydrologic prediction have reported higher accuracy than physically based process and statistical models (e.g., Galeati 1990; Solomatine and Ostfeld 2008; Best et al. 2015; Nearing et al. 2016). Machine learning models, and specifically random forests, have many desirable properties for prediction problems, including the following: they are non-parametric (i.e., they require no assumptions regarding data distributions); they are not negatively affected by correlated predictors; and they can capture interactions between predictors (see Tyralis et al. 2019). Despite the large body of literature reporting accurate hydrologic predictions, we are aware of only a limited number of studies that have employed ML techniques to predict wetland inundation dynamics (Shaeri Karimi et al. 2019; Choi et al. 2020; Cartwright et al. 2021; Solvik et al. 2021). Of these, only Cartwright et al. (2021) focused specifically on small depressional wetlands, and only Choi et al. (2020) attempted prediction at a daily time step. The wetland community could benefit from additional studies that test where ML approaches are viable options for predicting the inundation dynamics of small depressional wetlands.

The goal of this study was to explore a novel application of ML to predict daily inundation and estimate hydroperiods in small depressional wetlands. Specifically, we had three objectives: (1) build random forest models, with predictor variables derived from widely available meteorological and elevation data, to predict daily inundation and estimate median hydroperiods; (2) assess model performance based on the proportion of days correctly classified and the accuracy of median hydroperiods estimated from daily predictions; and (3) evaluate variable importance metrics to determine which predictors were most influential in classification accuracy. This study provides a framework for others to build and evaluate similar models to predict daily inundation where extensive field datasets for predictor variables may not exist.

Methods

Study Area and Hydrologic Data

This study was conducted using data from Saint Marks National Wildlife Refuge (SMNWR) in the Florida panhandle along the Gulf Coast (Fig. 1). Water levels were monitored in 59 wetlands that covered a variety of wetland types (cypress domes, flatwoods, wet prairies, hardwood wetlands, and sinkhole ponds), sizes (0.01 to 1.38 ha), and elevations (0.94 to 4.31 m relative to the North American Vertical Datum of 1988; Supplemental material Table S1). Although some wetlands were at very low elevations, coastal flooding is rare in the study area and has mostly been associated with landfalling tropical systems (Gunzburger et al. 2010). Monitoring periods varied among wetlands, but most had at least five years of record, with a maximum of ten years. Monitoring equipment at each wetland consisted of a total pressure sensor (Hobo U-20, Onset Corporation, Cape Cod, MA) and a staff gage installed in the deepest portion of the wetland’s basin or, in permanent wetlands, as deep as possible and close to vegetation indicating permanent inundation. To compensate for atmospheric pressure, a barometer was installed in a dry well below the water table. This placement was intended to prevent inaccuracies from the large temperature swings that may occur when barometers are exposed to ambient air (McLaughlin and Cohen 2011). All water level data were converted to a binary classification: a mean daily water level ≤ 0.02 m was classified as “dry”; otherwise, the wetland was considered inundated and classified as “wet.” This cutoff represents the average error observed when comparing water levels from the pressure sensor to staff gage readings. Furthermore, at a level of 0.02 m there is little wetted habitat for aquatic and semi-aquatic organisms, in most cases rendering the wetland “functionally dry.”
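The wet/dry rule above can be sketched as follows. This is a minimal illustration, assuming sub-daily sensor readings arrive as hypothetical `(date, level_m)` pairs; the function names are ours and not part of the study's workflow.

```python
from statistics import mean

DRY_THRESHOLD_M = 0.02  # mean daily level (m) at or below this is "dry"


def daily_means(readings):
    """Average sub-daily water levels (m) by date.

    `readings` is an iterable of (date_string, level_m) pairs, a hypothetical
    layout chosen only for this sketch.
    """
    by_day = {}
    for day, level in readings:
        by_day.setdefault(day, []).append(level)
    return {day: mean(levels) for day, levels in sorted(by_day.items())}


def classify_days(readings, threshold=DRY_THRESHOLD_M):
    """Map each date to "dry" (mean level <= threshold) or "wet"."""
    return {
        day: "dry" if level <= threshold else "wet"
        for day, level in daily_means(readings).items()
    }


readings = [
    ("2014-01-01", 0.00), ("2014-01-01", 0.02),  # daily mean 0.01 m -> dry
    ("2014-01-02", 0.10), ("2014-01-02", 0.12),  # daily mean 0.11 m -> wet
]
print(classify_days(readings))  # {'2014-01-01': 'dry', '2014-01-02': 'wet'}
```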

Fig. 1

Location of study area and monitored wetlands

Datasets for Predictor Development

Daily precipitation depth, daily minimum and maximum air temperature, mean daily vapor pressure deficit (VPD), and daily reference evapotranspiration (PET) data were obtained from gridMET (Abatzoglou 2013) using the climateR R package (Johnson 2021). Initial testing indicated that PET was a more important predictor than the variables it is calculated from (air temperature and VPD), so these variables were removed from further consideration. The gridMET dataset provides daily meteorological data on an approximately 4-km grid across the contiguous United States. Given the relatively small extent of the study area, meteorological data were extracted from the single grid cell at the centroid of the study area.

Most of the monitored wetlands have been field mapped (Riley 2016), and those boundaries were used where available. Where boundaries were not field mapped, National Wetlands Inventory (NWI) boundaries were used, and where wetlands were not present in the NWI, boundaries were heads-up digitized from National Agriculture Imagery Program (NAIP) imagery and a lidar-derived digital elevation model (DEM). The dominant vegetation class for each wetland was extracted from the LANDFIRE (2016) dataset. The LANDFIRE dataset has a 30-m resolution and is not intended for high-resolution vegetation mapping, but it can represent the vegetation class surrounding the wetlands. A lidar-derived 1-m DEM (FDEM 2008) was used to derive topographic metrics, including a topographic position index (TPI), the difference between a cell's elevation and the mean elevation of its surrounding neighborhood (Weiss 2001; Jenness et al. 2011; Riley et al. 2017). In our study area, an annulus-shaped neighborhood with inner and outer radii of 20 and 40 cells, respectively (20 and 40 m at the 1-m DEM resolution), was used. All topographic metrics (mean TPI, mean elevation, and elevation range) were summarized within each wetland boundary.
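A minimal sketch of an annulus-neighborhood TPI (cell elevation minus the mean elevation of the annulus) is shown below. It assumes the DEM is a plain list of lists of elevations in meters and that a cell belongs to the annulus when its center distance lies between the inner and outer radii, inclusive. The function names and the edge handling (skipping cells outside the grid) are our assumptions; GIS tools may define the annulus boundary slightly differently.

```python
import math


def annulus_offsets(inner, outer):
    """(row, col) offsets whose distance from the center cell is in [inner, outer]."""
    return [
        (dr, dc)
        for dr in range(-outer, outer + 1)
        for dc in range(-outer, outer + 1)
        if inner <= math.hypot(dr, dc) <= outer
    ]


def tpi(dem, row, col, offsets):
    """Elevation of one cell minus the mean elevation of its annulus.

    Neighbors falling outside the grid are skipped (an edge-handling assumption).
    """
    n_rows, n_cols = len(dem), len(dem[0])
    neighbors = [
        dem[row + dr][col + dc]
        for dr, dc in offsets
        if 0 <= row + dr < n_rows and 0 <= col + dc < n_cols
    ]
    return dem[row][col] - sum(neighbors) / len(neighbors)


def mean_tpi(dem, wetland_cells, inner=20, outer=40):
    """Average TPI over the (row, col) cells inside one wetland boundary.

    With a 1-m DEM, inner=20 and outer=40 cells correspond to 20 and 40 m.
    """
    offsets = annulus_offsets(inner, outer)
    values = [tpi(dem, r, c, offsets) for r, c in wetland_cells]
    return sum(values) / len(values)


# Toy 9 x 9 DEM: flat at 2.0 m with a one-cell depression at 1.0 m (small radii for brevity).
dem = [[2.0] * 9 for _ in range(9)]
dem[4][4] = 1.0
print(tpi(dem, 4, 4, annulus_offsets(1, 3)))               # -1.0
print(mean_tpi(dem, [(4, 4), (0, 0)], inner=1, outer=3))   # -0.5
```

A negative value marks a cell recessed below its surroundings, matching the interpretation of TPI used for wetland depth in this study.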

Conceptualizing Wetland Hydrology for a Data Driven Model

The water budgets of the depressional wetlands in this study are primarily controlled by precipitation as the input and evapotranspiration (ET) as the output. Groundwater exchange certainly occurs, but we lack the data to adequately quantify it; in similar wetlands, McLaughlin and Cohen (2012) found that groundwater exchange frequently reversed direction between the wetlands and the surficial aquifer. Thus, groundwater exchange was omitted, and the water budget is represented simply as ΔStorage = precipitation − potential evapotranspiration.

These differences are summed over various antecedent periods to represent different aspects of storage. Short lags (5–60 days) likely reflect storage conditions in the vadose zone, and longer lags (60–180 days) may represent saturated zone storage conditions. Basin morphology (Fig. 2), along with antecedent storage conditions of the soils, the surficial aquifer, and the wetland basin, determines how a wetland will respond to precipitation and evapotranspiration. Wetlands with small surface areas but deep basins will collect relatively less precipitation but will also be less susceptible to evaporative losses than broad, shallow wetlands. To characterize wetland morphology, we used basin area for the two-dimensional surface area and proxy variables, such as the TPI and elevation range, to represent the z-axis (depth) of wetlands. It is important to note, however, that TPI does not precisely represent the depth of the actual wetland but rather reflects the mean elevation of the wetland relative to the surrounding landscape. For example, a more negative TPI indicates that a wetland is more deeply recessed in the surrounding landscape than a wetland with a less negative TPI. The TPI variable may also reflect the relative likelihood of a wetland intersecting the surficial aquifer and thus provides potential context for groundwater influence. Dominant vegetation type was also included because it could modulate the hydrologic response via interception and transpiration among different types of wetlands (e.g., emergent vs. forested wetlands).
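The antecedent water budget predictors can be sketched as trailing sums of the daily difference between precipitation and PET. In this sketch, the window for a lag of k days is assumed to end on (and include) the prediction day, days without a complete window are left as None, and names other than def60 (e.g., def5, def180) are hypothetical placeholders for whichever lags are listed in Table 1.

```python
def antecedent_deficits(precip_mm, pet_mm, lags=(5, 30, 60, 180)):
    """Trailing sums of daily (precipitation - PET), in mm, for each lag.

    Returns {"def<lag>": [value or None per day]}. The window for day t is
    assumed to cover days t-lag+1 .. t; days without a full window get None.
    """
    diffs = [p - e for p, e in zip(precip_mm, pet_mm)]
    prefix = [0.0]  # prefix[i] holds the sum of diffs[:i]
    for d in diffs:
        prefix.append(prefix[-1] + d)
    return {
        f"def{lag}": [
            prefix[t + 1] - prefix[t + 1 - lag] if t + 1 >= lag else None
            for t in range(len(diffs))
        ]
        for lag in lags
    }


precip = [10.0, 0.0, 0.0, 5.0]
pet = [2.0, 2.0, 2.0, 2.0]
print(antecedent_deficits(precip, pet, lags=(2, 3)))
# {'def2': [None, 6.0, -4.0, 1.0], 'def3': [None, None, 4.0, -1.0]}
```

Prefix sums make every lag an O(1) lookup per day, which keeps multi-year daily records with several lags cheap to process.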

Fig. 2

Hypothetical configurations of wetland size (top, plan view) and basin morphology (bottom, cross-sectional view) that, when combined, alter water storage and duration for a given depth of rainfall. All combinations of the plan and cross-section forms are present in the current study

Model Description and Set-up

We used a random forest algorithm (Breiman 2001), implemented in R statistical software (R-Core Team 2019) with the randomForest package (Liaw and Wiener 2002), to predict daily inundation status (wet or dry) for individual wetlands. The random forest algorithm is an ensemble learning method that uses a collection of randomized decision trees (James et al. 2013). Specifically, the algorithm grows many decision trees from bootstrapped samples of the training data. At each split, a tree considers only a random subset of the predictor variables, which helps to decorrelate the individual trees. For binary classification problems, the predicted class is the most frequent classification (i.e., the majority vote) across all trees. Default hyperparameter values for random forest implementations often perform well without tuning (Kuhn and Johnson 2013); however, several hyperparameters can be tuned, often resulting in improved accuracy.
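To make the bootstrap, random-subset, and majority-vote ideas concrete, the following is a deliberately simplified random forest in pure Python. It is a toy built on stated assumptions, not the randomForest package: each tree is a single split (a stump) chosen by Gini impurity from `mtry` randomly sampled predictors, and all names are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values()) if n else 0.0


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, mtry, rng):
    """Best single split among `mtry` randomly sampled predictors (one node)."""
    best = None  # (weighted impurity, feature, threshold, left label, right label)
    for f in rng.sample(range(len(X[0])), mtry):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr, majority(left), majority(right))
    if best is None:  # sampled predictors were constant: predict the majority class
        return (0, float("inf"), majority(y), majority(y))
    return best[1:]


def fit_forest(X, y, ntree=300, mtry=1, seed=0):
    """Grow `ntree` stumps, each on a bootstrap sample of the rows."""
    rng = random.Random(seed)
    forest = []
    for _ in range(ntree):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], mtry, rng))
    return forest


def votes(forest, x):
    """Count each tree's class vote for one observation."""
    return Counter(left if x[f] <= thr else right for f, thr, left, right in forest)


def predict(forest, x):
    """Majority vote across trees."""
    return votes(forest, x).most_common(1)[0][0]


# Toy data: one predictor, low values dry, high values wet.
X = [[float(i)] for i in range(20)]
y = ["dry"] * 10 + ["wet"] * 10
forest = fit_forest(X, y, ntree=50, mtry=1)
print(predict(forest, [0.0]), predict(forest, [19.0]))  # dry wet
```

Real random forest trees are grown much deeper (down to `nodesize`), but the sampling of rows per tree and of candidate predictors per split works the same way.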

We used a grid search over the hyperparameters mtry (the number of predictor variables randomly sampled as split candidates at each node) and nodesize (the minimum size of terminal nodes), searching values from 2 to 11 and 1 to 10, respectively, to find the best combination. Many combinations resulted in highly accurate predictions and very low out-of-bag (OOB) error rates. Hyperparameter values were selected based on low OOB error rates that would also allow for greater randomization (i.e., smaller mtry) to help ensure decorrelated trees (James et al. 2021). For subsequent analyses of model performance and predictions, the hyperparameters were set to mtry = 5 and nodesize = 2. We also set the number of trees to grow (ntree) to 300.
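The selection rule described above (low OOB error, preferring smaller mtry) could be sketched as below. The `oob_error(mtry, nodesize)` callable and the `tolerance` that defines "low" are hypothetical; the study does not state a numeric tolerance, and in practice each call would fit a forest and read its OOB error.

```python
from itertools import product


def select_hyperparameters(oob_error, mtry_values=range(2, 12),
                           nodesize_values=range(1, 11), tolerance=0.005):
    """Grid-search (mtry, nodesize) and favor randomization among near-best fits.

    Keeps every combination whose OOB error is within `tolerance` of the minimum,
    then picks the smallest mtry, breaking ties by lower error and smaller nodesize.
    """
    grid = {(m, n): oob_error(m, n) for m, n in product(mtry_values, nodesize_values)}
    best = min(grid.values())
    near_best = [(m, err, n) for (m, n), err in grid.items() if err <= best + tolerance]
    m, err, n = min(near_best)
    return {"mtry": m, "nodesize": n, "oob_error": err}


# Made-up error surface for illustration only (not results from this study).
def toy_oob(mtry, nodesize):
    return 0.05 + 0.002 * abs(mtry - 6) + 0.001 * abs(nodesize - 3)


print(select_hyperparameters(toy_oob, tolerance=0.0045))
```

With this toy surface the rule gives up a little OOB error (mtry = 4 instead of the error-minimizing 6) in exchange for more randomization, which is the trade-off the text describes.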

Another consideration in classification problems is the balance of observations among classes. In this case, the data were highly imbalanced (~5:1 wet:dry). When this occurs, models may simply target the majority class, because predicting that class alone results in high accuracy and relatively low misclassification (Chen and Breiman 2004). We used arguments of the randomForest implementation to achieve balance: strata defines the variable over which sampling is stratified (in this case, wet and dry observations), and sampsize specifies the number of samples drawn from each class. The number of observations in the minority (dry) class was used as the sample size for both classes. This downsampled the majority class so that each class contributed an equal number of observations to the sample used to grow each tree. We also used a cutoff value of 0.30 to assign a day as “dry,” meaning that a day was classified as dry if at least 30% of all votes for a given site and day were “dry.” This threshold was determined by stepping the cutoff down from 0.50 (the default) to 0.25 in increments of 0.05 and selecting the value that resulted in the greatest balanced accuracy on the validation set.
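The class balancing and cutoff search can be sketched as follows. Two simplifications are assumptions: `downsample` makes a single draw without replacement, whereas randomForest applies `strata`/`sampsize` when drawing the sample for each tree (with replacement by default); and `p_dry_votes` is a hypothetical stand-in for the proportion of trees voting "dry" on each validation day.

```python
import random


def downsample(rows, labels, seed=0):
    """Return (row, label) pairs with every class cut to the minority-class size."""
    rng = random.Random(seed)
    by_class = {}
    for row, lab in zip(rows, labels):
        by_class.setdefault(lab, []).append(row)
    n_min = min(len(members) for members in by_class.values())
    sample = [(row, lab) for lab, members in by_class.items()
              for row in rng.sample(members, n_min)]
    rng.shuffle(sample)
    return sample


def classify(p_dry, cutoff):
    """Label a day dry when the share of 'dry' votes reaches the cutoff."""
    return "dry" if p_dry >= cutoff else "wet"


def balanced_accuracy(observed, predicted):
    """Mean of the per-class recall for 'wet' and 'dry'."""
    recalls = []
    for cls in ("wet", "dry"):
        idx = [i for i, obs in enumerate(observed) if obs == cls]
        recalls.append(sum(predicted[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def choose_cutoff(p_dry_votes, observed, cutoffs=(0.50, 0.45, 0.40, 0.35, 0.30, 0.25)):
    """Step the cutoff down from 0.50 and keep the first value with the best score."""
    scores = {c: balanced_accuracy(observed, [classify(p, c) for p in p_dry_votes])
              for c in cutoffs}
    return max(scores, key=scores.get), scores


p_dry = [0.90, 0.38, 0.10, 0.20, 0.05]   # hypothetical vote shares
observed = ["dry", "dry", "wet", "wet", "wet"]
best, scores = choose_cutoff(p_dry, observed)
print(best, scores[0.50])  # 0.35 0.75
```

Balanced accuracy weights the rare dry class equally with the common wet class, so a model that labels every day wet scores only 0.5 rather than the ~0.83 plain accuracy implied by a 5:1 imbalance.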

While random forests are robust to multicollinearity, the addition of correlated variables can make any attempt at drawing inference using variable importance metrics more difficult (Toloşi and Lengauer 2011). We used the usdm R-package (Naimi et al. 2014) to examine the variance inflation factors (VIF) of all paired predictor variables and used a VIF value of 5 as a cutoff for variable removal. This procedure resulted in the removal of 11 of the original 22 quantitative predictor variables (see Table 1). All variables removed due to high correlation were different lags of precipitation and potential evapotranspiration, and elevation standard deviation. This left the final models to consider the remaining topographic metrics, dominant vegetation class, ordinal day, and the water budget deficit over different antecedent periods for predicting inundation (Table 1).
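One common way to compute and apply VIFs is sketched below. Each VIF equals 1/(1 − R²) from regressing a predictor on all the others, which is also the corresponding diagonal element of the inverse correlation matrix; the predictor with the largest VIF is dropped until all fall at or below the cutoff of 5. This stepwise rule and the toy data are our assumptions for illustration; the usdm functions may differ in details.

```python
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix of equal-length numeric columns."""
    z = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        z.append([(v - m) / s for v in col])
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(x, y)) / n for y in z] for x in z]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (fine for a few dozen variables)."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]  # zero here means exact collinearity (singular matrix)
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vifs(columns):
    """VIF_j = 1 / (1 - R_j^2), the j-th diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


def vif_step(named_columns, threshold=5.0):
    """Repeatedly drop the predictor with the largest VIF until all are <= threshold."""
    names = list(named_columns)
    while len(names) > 1:
        scores = vifs([named_columns[name] for name in names])
        worst = max(range(len(names)), key=scores.__getitem__)
        if scores[worst] <= threshold:
            break
        names.pop(worst)
    return names


predictors = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "x2": [1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 6.9, 8.2, 8.9, 10.1],  # nearly a copy of x1
    "x3": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3],
}
print(vif_step(predictors))  # x3 plus one of x1/x2
```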

Table 1 Candidate predictor variables considered for use in the random forest and the data source. Variable names in bold were used in the final models

Model Evaluation

It has been shown that error rates estimated from training data are often overly optimistic (James et al. 2021), which suggests that relying on OOB error rates alone could lead to models that perform poorly on unseen data. Therefore, we undertook a rigorous, multi-faceted approach to model evaluation that tested performance both on unseen wetlands and on unseen observations from wetlands in the training set. First, we split the data into training and validation groups using a stratified Monte Carlo cross-validation approach (Kuhn and Johnson 2013), in which all data from six randomly selected wetlands (two each from the permanent, frequently inundated, and infrequently inundated classes) were set aside as the “validation set.” The classification of “frequently” and “infrequently” inundated wetlands was based on the relative frequency of days observed to be inundated: frequently inundated wetlands were non-permanent wetlands whose inundation frequency exceeded the median across all wetlands (79% of days), and infrequently inundated wetlands were those whose inundation frequency was below the median. This allowed an assessment of the effect of inundation frequency on model performance. The remaining data were randomly split into 70/30 train/test sets (see Table S2 for an evaluation of different data splitting strategies). Accuracy metrics on the test set demonstrate the predictive ability of the model on unseen observations from wetlands that had been used in training (similar to OOB error), whereas accuracy metrics on the validation set quantify the accuracy of model predictions on completely unseen wetlands. This evaluation was repeated for 100 iterations to assess the stability of model predictions when different combinations of data were used to train, test, and validate the models. To test for statistical differences between the observed and predicted median hydroperiods within each inundation class, we used the Brown-Mood median test in the coin R package (Hothorn et al. 2006).
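A single iteration of the splitting scheme might look like the sketch below. It assumes hypothetical inputs: a mapping from wetland ID to the fraction of observed days that were wet (with 1.0 treated as permanent) and a list of daily records carrying a `wetland` key. Ties with the median are assigned to the infrequent class, which the text does not specify, and `rng.sample` raises an error if a class has fewer than two wetlands.

```python
import random
from statistics import median


def inundation_class(wet_fraction, median_fraction):
    """Permanent if never observed dry; otherwise frequent/infrequent vs. the median."""
    if wet_fraction == 1.0:
        return "permanent"
    return "frequent" if wet_fraction > median_fraction else "infrequent"


def split_iteration(wet_fraction_by_wetland, daily_rows, rng, n_per_class=2, train_frac=0.7):
    """One Monte Carlo iteration: hold out whole wetlands, then split the rest 70/30."""
    med = median(wet_fraction_by_wetland.values())
    classes = {}
    for wetland, frac in wet_fraction_by_wetland.items():
        classes.setdefault(inundation_class(frac, med), []).append(wetland)
    validation_ids = {w for members in classes.values()
                      for w in rng.sample(sorted(members), n_per_class)}
    validation = [r for r in daily_rows if r["wetland"] in validation_ids]
    remaining = [r for r in daily_rows if r["wetland"] not in validation_ids]
    rng.shuffle(remaining)
    n_train = round(train_frac * len(remaining))
    return remaining[:n_train], remaining[n_train:], validation, validation_ids


# Hypothetical example: 12 wetlands with 10 daily records each.
fractions = {f"w{i}": f for i, f in enumerate(
    [1.0, 1.0, 1.0, 1.0, 0.95, 0.9, 0.85, 0.8, 0.5, 0.4, 0.3, 0.2])}
rows = [{"wetland": w, "day": d} for w in fractions for d in range(10)]
iterations = [split_iteration(fractions, rows, random.Random(seed)) for seed in range(100)]
train, test, validation, held_out = iterations[0]
print(len(train), len(test), len(validation), sorted(held_out))
```

Holding out whole wetlands, rather than random days, is what lets the validation accuracy speak to wetlands the model has never seen.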

Results

Model Performance

From the 100-iteration cross validation routine, OOB error was 0.05–0.06, and the median balanced accuracy was 96% and 83% for the test and validation data, respectively (Fig. 3). Not surprisingly, performance was better on the test set because other data from these wetlands had been “seen” by the models. The relatively good and stable performance on the unseen wetlands indicated that the random forest classifier was a useful tool to explore the predictability of wetland inundation and hydroperiod.

Fig. 3

Boxplots of balanced accuracy across the 100 iterations to assess stability of model performance. Test data refers to unseen (held out) observations from wetlands used in the training dataset and validation refers to observations from wetlands not included in the training dataset. Boxes represent the interquartile range, the horizontal line within the box is the median, vertical lines extend to ± 1.5-times the interquartile range, and dots are outlying data points

Variable Importance and Interpretation

Variable importance was scaled to show the proportional contribution of each predictor variable to overall classification accuracy (Fig. 4). Across most of the model evaluations, the three topographic metrics were consistently among the four most important variables, regardless of which importance metric was used. The water budget deficit at a 60-day lag (def60) was also consistently among the most important predictor variables. The other variables also contributed to classification accuracy, but not as strongly or as consistently as the three topographic metrics and def60.
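The sketch below shows a held-out permutation version of mean decrease in accuracy and one possible scaling to proportions of the total. Both are assumptions: randomForest computes its accuracy-based importance from OOB samples tree by tree, and the exact scaling used for Fig. 4 is not specified here. `toy_predict` is a hypothetical model that only looks at the first predictor.

```python
import random


def permutation_importance(predict, rows, labels, seed=0):
    """Accuracy drop after shuffling each predictor column of a held-out set."""
    rng = random.Random(seed)

    def accuracy(data):
        return sum(predict(r) == y for r, y in zip(data, labels)) / len(labels)

    base = accuracy(rows)
    drops = []
    for f in range(len(rows[0])):
        column = [r[f] for r in rows]
        rng.shuffle(column)
        permuted = [r[:f] + [v] + r[f + 1:] for r, v in zip(rows, column)]
        drops.append(base - accuracy(permuted))
    return drops


def scale_importance(raw):
    """Express importances as proportions of their total (negative values set to 0)."""
    clipped = [max(v, 0.0) for v in raw]
    total = sum(clipped)
    return [v / total for v in clipped] if total else clipped


def toy_predict(row):  # hypothetical model that uses only the first predictor
    return "wet" if row[0] < 0 else "dry"


rows = [[i - 9.5, float(i % 3)] for i in range(20)]
labels = [toy_predict(r) for r in rows]
print(permutation_importance(toy_predict, rows, labels))  # second value is 0.0
print(scale_importance([3.0, 1.0, -1.0]))                 # [0.75, 0.25, 0.0]
```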

Fig. 4

Variable importance according to the mean decrease in Gini impurity (left) and mean decrease in accuracy (right) for a single model iteration. Note that importance was scaled to highlight the relative contribution of each predictor. See Table 1 for variable descriptions

Partial dependence plots of the top four variables, which average the model response over the observed values of all other variables, showed highly non-linear relationships with inundation status (Fig. 5). The relationship with def60 was relatively linear below zero but leveled off rapidly above 100 mm. The relationships were more complex for the topographic predictors, which showed stepped, non-linear, and non-monotonic patterns. For example, the stepped pattern of basin-averaged TPI (tpi_av_m) may indicate threshold values: the proportion of votes for “wet” decreased when tpi_av_m was > −1 m and declined further and sharply as tpi_av_m approached −0.5 m (Fig. 5).
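Partial dependence, as plotted in Fig. 5, can be sketched as follows: fix one predictor at each grid value in every row, then average the model's predicted proportion of "wet" votes. `toy_prob_wet` is a hypothetical stand-in for a fitted forest's vote share, and rows are assumed to be dictionaries keyed by predictor name.

```python
def partial_dependence(prob_wet, rows, feature, grid):
    """Average predicted 'wet' vote share with `feature` fixed at each grid value.

    All other predictors keep their observed values in every row, so the curve
    averages over their joint distribution rather than holding them constant.
    """
    curve = []
    for value in grid:
        modified = [{**row, feature: value} for row in rows]
        curve.append(sum(prob_wet(r) for r in modified) / len(modified))
    return curve


# Hypothetical stand-in for a fitted forest's wet-vote share.
def toy_prob_wet(row):
    if row["tpi_av_m"] < -1.0:
        return 1.0
    return 0.5 if row["def60"] > 0 else 0.0


rows = [{"tpi_av_m": -2.0, "def60": 100.0}, {"tpi_av_m": 0.0, "def60": -50.0}]
print(partial_dependence(toy_prob_wet, rows, "tpi_av_m", [-1.5, -0.5]))  # [1.0, 0.25]
```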

Fig. 5

Partial dependence plots from a single model iteration for the top four most important variables based on mean decrease in accuracy. All variables (x-axes) are in meters except def60 which is in millimeters and the y-axis indicates the proportion of votes for classifying a day as wet

Prediction of Wetland Inundation

The first check of model performance was to compare the number of wet and dry days between observations and predictions. We selected two infrequently inundated wetlands, one frequently inundated wetland, and one permanent wetland to highlight the dynamics reproduced by the model (Fig. 6). The top two plots of Fig. 6 represent infrequently inundated wetlands where predictions were relatively accurate. Wetland hawk demonstrated very accurate predictions of wet and dry periods capturing the general patterns and short-term dynamics (Fig. 6). The model also performed reasonably well on wetland rnd117; however, the short-term dynamics were not as well reproduced as those in hawk. At wetlands wpt79 (frequently inundated) and wpt150 (permanent) the general patterns were well reproduced by the models.

Fig. 6

Daily observed and predicted inundation for select wetlands January 2014 – October 2015. The “cloud” of points results from jittering the data; this was done so that all observations are visible and to avoid obscuring the observed data. Only a single model run is displayed for each wetland

At the inundation class level, the median classification accuracy of infrequently inundated wetlands (0.92) was lower than that of frequently inundated or permanent wetlands (median = 0.98 and 0.99, respectively; Fig. 7). Over-prediction occurred more often in the infrequent class than in the frequent class, but all classes showed a similar occurrence of under-prediction (Fig. 7).
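The ratio summarized in Figs. 7 and 8 reduces to a simple count comparison, sketched here with hypothetical daily "wet"/"dry" label lists for one wetland.

```python
def inundated_day_ratio(observed, predicted):
    """Predicted / observed count of wet days (1 = agreement, <1 under-, >1 over-prediction)."""
    observed_wet = sum(day == "wet" for day in observed)
    predicted_wet = sum(day == "wet" for day in predicted)
    return predicted_wet / observed_wet


obs = ["wet", "wet", "wet", "wet", "dry"]
pred = ["wet", "wet", "wet", "wet", "wet"]
print(inundated_day_ratio(obs, pred))  # 1.25
```

Because the ratio only compares totals, a value of 1 can also hide offsetting errors (equal numbers of false wet and false dry days), which is one reason the day-level balanced accuracy is reported alongside it.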

Fig. 7

Ratio of predicted inundated days to observed inundated days by inundation category. Note, a value of 1 means perfect agreement; below 1 indicates that the models under-predicted and over 1 indicates that the models over-predicted. Boxes represent the interquartile range, the horizontal line within the box is the median, vertical lines extend to ± 1.5-times the interquartile range, and dots are outlying data points

An examination of individual wetlands provided additional insight into the predictability and stability of predictions. For example, wetland wpt57 (‘infrequent’ inundation category) was an exceptional case in which the choice of wetlands included in the training data was critical to accurate prediction (a value of 1 indicates perfect prediction; Fig. 8). At this wetland, five of the eight models from the cross-validation routine predicted the number of inundated days with 81–99% agreement; however, two other models over-predicted the number of days by 62% and 98%. Similar inaccuracies occurred at other wetlands (e.g., fn and rnd52); however, most of the extreme inaccuracies (those > 30%) were due to one or two outlying data points, as most models were relatively precise. Wetlands in the other two inundation classes showed less variation in accuracy in response to changes in the training data, although there were exceptions (Fig. 8).

Fig. 8

Ratio of predicted inundated days to observed inundated days for individual wetlands grouped by inundation frequency category. Boxplots represent the distribution of the ratio from all cross-validation iterations where a wetland was selected (total times selected is indicated by the number below each boxplot). Not all study wetlands were selected an equal number of times in the random sampling; however, an equal number (200) from each inundation class was ensured. The dotted line at y = 1 indicates perfect agreement between observed and predicted. Boxes represent the interquartile range, the horizontal line within the box is the median, vertical lines extend to ± 1.5-times the interquartile range, and dots are outlying data points

Estimating Hydroperiod

Because the ultimate goal of this study was predicting wetland hydroperiod, the duration of continuous inundation, we focused our analysis primarily on the non-permanent wetlands, and in these wetlands we excluded periods of continuous inundation shorter than 30 days. This threshold retains the minimum periods that are meaningful for semi-aquatic biota, such as some rapidly developing anuran species (e.g., Gastrophryne carolinensis; Daszak et al. 2005), and reduces the skew in summary statistics that would result from including runs of only one to a few days of a particular inundation status (wet or dry).
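Hydroperiods as defined here (runs of consecutive wet days, ignoring runs shorter than 30 days) can be extracted with a single run-length pass. This is a sketch assuming a chronologically ordered list of daily "wet"/"dry" labels; it does not treat runs truncated by the start or end of the record any differently, which a real analysis might need to.

```python
from itertools import groupby
from statistics import median


def hydroperiods(daily_status, min_days=30):
    """Lengths of runs of consecutive 'wet' days, keeping runs of at least `min_days`."""
    runs = [sum(1 for _ in group)
            for status, group in groupby(daily_status) if status == "wet"]
    return [n for n in runs if n >= min_days]


record = ["dry"] * 10 + ["wet"] * 45 + ["dry"] * 3 + ["wet"] * 12 + ["dry"] * 5 + ["wet"] * 60
periods = hydroperiods(record)
print(periods, median(periods))  # [45, 60] 52.5

# One misclassified dry day inside a 60-day wet run splits it into 30 + 29 days,
# and the 29-day piece then falls below the 30-day minimum.
print(hydroperiods(["wet"] * 30 + ["dry"] + ["wet"] * 29))  # [30]
```

The second call illustrates why daily predictions are a demanding basis for hydroperiod estimates: a single wrong day changes both the number and the length of hydroperiods.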

Overall, hydroperiod prediction was most accurate for the infrequent class, followed by the frequent class, with permanent wetlands having the greatest discrepancies (Fig. 9). For the infrequently inundated class, the Brown-Mood median test indicated a significant difference between the observed and predicted median hydroperiods (z = 3.325, p < 0.0001, α = 0.05); however, the predicted and observed median hydroperiods differed by only 14 days. The frequently inundated wetlands also showed a significant difference based on the median test (z = 7.2117, p < 0.0001, α = 0.05), and the median predicted hydroperiod was 144 days shorter than the observed. While the permanent wetlands were not a primary focus, it is worth noting that they had the greatest discrepancy of any wetland class (Fig. 9).
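For reference, a two-sample Brown-Mood median test can be sketched as below: count how many values in one group exceed the pooled median, standardize that count by its hypergeometric mean and variance, and take a normal-approximation two-sided p-value. This is our sketch with hypothetical hydroperiod values; the coin package's `median_test` may differ in tie handling, sign convention, and in offering exact or Monte Carlo p-values. The variance is zero (and the function fails) if no value, or every value, lies above the pooled median.

```python
from math import erfc, sqrt
from statistics import median


def brown_mood_z(group_a, group_b):
    """Standardized count of group_a values above the pooled median.

    Under the null, that count is hypergeometric; its mean and variance are used
    to form z (positive when group_a tends to lie above the pooled median).
    """
    pooled = list(group_a) + list(group_b)
    m = median(pooled)
    n_total, n_a = len(pooled), len(group_a)
    n_above = sum(v > m for v in pooled)
    count_a = sum(v > m for v in group_a)
    expected = n_a * n_above / n_total
    variance = (n_a * (n_total - n_a) * n_above * (n_total - n_above)
                / (n_total ** 2 * (n_total - 1)))
    return (count_a - expected) / sqrt(variance)


def two_sided_p(z):
    """Normal-approximation two-sided p-value."""
    return erfc(abs(z) / sqrt(2))


observed_hp = [40, 55, 62, 70]   # hypothetical hydroperiods (days)
predicted_hp = [20, 25, 31, 38]
z = brown_mood_z(observed_hp, predicted_hp)
print(round(z, 4), round(two_sided_p(z), 4))
```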

Fig. 9

Distribution of observed (obs) and predicted (pred) hydroperiods split by the inundation category. Boxes represent the interquartile range, the horizontal line within the box is the median, vertical lines extend to ± 1.5-times the interquartile range, and outliers (observations beyond 1.5-times the interquartile range) were removed to improve visibility of boxes

The distribution of hydroperiods for individual wetlands provided additional insight into where the model performed best. Figure 10 displays the distribution of estimated hydroperiods for individual wetlands across all cross-validation iterations, compared with the observed distribution of hydroperiods. Figure 10 demonstrates that the predictability of hydroperiod was inconsistent across wetlands, even within a given inundation class. For example, wetlands fn and wbf in the infrequently inundated category showed good agreement in median hydroperiod, differing by only 6 and 14 days, respectively. In contrast, the models generally overpredicted hydroperiods in wetland rnd75, with a predicted median hydroperiod 68 days longer than observed. Wetlands in the frequently inundated category showed much greater variability and discrepancy between observed and predicted hydroperiods (Fig. 10). Wetlands jr36 and borrow405 were predicted the most accurately of those in Fig. 10 (right panel), but their predicted median hydroperiods still differed considerably from the observed, by 61 and 101 days, respectively. The remaining wetlands in the frequently inundated category had much greater discrepancies between predicted and observed hydroperiods. In many cases, the models produced rather narrow ranges of hydroperiods that underestimated the high end of the observed distribution (Fig. 10). The models did not show a consistent bias in either inundation class; rather, results across individual wetlands showed both under- and over-prediction.

Fig. 10

Distribution of observed (obs) and predicted (pred) hydroperiods for 11 randomly selected wetlands from each inundation frequency class. Boxes represent the interquartile range, the horizontal line within the box is the median, vertical lines extend to ± 1.5-times the interquartile range, and dots are outlying data points

Discussion

Drivers of Inundation and Hydroperiod

Controls on wetland hydroperiod vary with topographic position within the landscape and hydrogeologic setting. However, the overarching drivers are related to a site's water budget and the ability of its basin to hold water (i.e., its volume, shape, and geological material). As Fig. 4 demonstrates, both water budget and basin morphology were important for accurate prediction of wetland inundation. While some studies highlight that mean decrease in accuracy is a more robust (less biased) variable importance metric (Strobl et al. 2007; Genuer et al. 2010), Boulesteix et al. (2012) indicate that mean decrease in Gini impurity is more appropriate for classification with imbalanced classes. However, in this study both metrics generally agreed and ranked the same four variables as the most important, albeit in a slightly different order.

Many studies spanning a wide range of wetland types and climates have also found basin morphology (topography or depth) to be a key variable related to hydroperiod (Brooks and Hayashi 2002; Pyke 2004; Garmendia and Pedrola-Monfort 2010; Greenberg et al. 2015; Chandler et al. 2016). Thus, this study’s modeling approach could be useful in other areas where topography exerts a strong control on water permanence, such as vernal pools of the northeastern United States or the prairie pothole region of North America. Many studies attempting to predict wetland inundation dynamics have found antecedent precipitation to be important for accurate prediction (e.g., Bartuszevige et al. 2012; Chandler et al. 2016; Solvik et al. 2021; Londe et al. 2022). In this study, however, antecedent water budget deficits (precipitation minus potential evapotranspiration) were more important predictors of inundation than precipitation. The transferability of the 60-day lagged water budget deficit to other regions is less straightforward, and the importance of this variable likely varies with soil texture, water table depth, and regional hydroclimatology. We believe this intermediate antecedent period (60 days) may integrate storage in both the vadose and saturated zones in this study region, making it consistently more important than shorter- or longer-term lags that may reflect only one of these storage components. In regions with greater topographic relief, or with clayey soil horizons that inhibit rapid infiltration, shorter antecedent periods may be more important than observed in this study, because lateral flows following rainfall could be primary drivers of inundation (e.g., Winter 1988, 1999; Bartuszevige et al. 2012).
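The antecedent water budget predictor can be sketched as a rolling sum of daily precipitation minus potential evapotranspiration. The Python function below is a minimal illustration, assuming the 60-day window ends on (and includes) the prediction day and that the daily series has no gaps; both the window alignment and the function name are assumptions for illustration rather than the exact derivation used in this study.

```python
def antecedent_balance(precip, pet, window=60):
    """Cumulative P - PET over the `window` days ending on each day.

    Hypothetical helper. Days without a full window of antecedent data
    return None. Assumes equal-length, gap-free daily series in the same units.
    """
    if len(precip) != len(pet):
        raise ValueError("precip and pet must be the same length")
    daily = [p - e for p, e in zip(precip, pet)]
    out = []
    running = 0.0
    for i, d in enumerate(daily):
        running += d                      # add today's P - PET
        if i >= window:
            running -= daily[i - window]  # drop the day that left the window
        out.append(running if i >= window - 1 else None)
    return out


# Hypothetical daily values (mm); a 3-day window keeps the example readable.
precip = [12.0, 0.0, 0.0, 25.0, 3.0]
pet = [4.0, 4.5, 5.0, 3.5, 4.0]
print(antecedent_balance(precip, pet, window=3))  # [None, None, -1.5, 12.0, 15.5]
```

A running sum keeps the cost linear in record length, which matters when the predictor is derived for many wetlands over multi-year daily records.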

Model Performance

Our median balanced accuracy across models (83%) was higher than that reported by several other studies that predicted inundation and hydroperiod dynamics (80% [Londe et al. 2022], 76% [Greenberg et al. 2015], and 73% [Chandler et al. 2016]). However, it is difficult to compare model accuracy directly between studies because each study’s models were intended for different applications with different temporal resolutions. For example, in this study we made predictions at a daily time step; thus, misclassifying a single day as dry within an inundated period divides one hydroperiod into two, which registers as “poor” model performance. In contrast, many other studies have used infrequent observations of wetlands, either flooded extent or some metric of water depth, to infer hydroperiods, and they have often focused on specific periods relevant to target taxa, most frequently amphibians (e.g., Babbitt 2005; Chandler et al. 2016; Cartwright et al. 2021; Solvik et al. 2021). While infrequent observation is certainly reasonable given the cost and difficulty of continuous (or near-continuous) observation of water presence, it introduces the potential for missing a drying event or even a brief filling and subsequent drying. As Chandler et al. (2016) acknowledged, a single drying event (day) is sufficient to induce mortality of larval salamanders. Furthermore, focusing on a single short period (i.e., the breeding season) may obscure other important aspects of hydroperiod, such as the development of predators, prey, and habitats during the wetted duration before the focal period (Pechmann et al. 1989). Much of the literature rightly focuses on the wetted hydroperiod; however, some amphibian species (e.g., the threatened Ambystoma cingulatum) lay their eggs only in dry pond basins (Anderson and Williamson 1976). Therefore, knowing the drying regime may be just as important for some wetland species, especially given climate change and high uncertainty in future precipitation regimes of the southeastern United States (Carter et al. 2018; LaFontaine et al. 2019).
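Balanced accuracy averages sensitivity (the fraction of inundated days predicted correctly) and specificity (the fraction of dry days predicted correctly), so a model cannot score well simply by predicting the majority class. The Python sketch below uses a hypothetical record to show how plain accuracy and balanced accuracy diverge when one class dominates, as it does in frequently inundated wetlands; the function names are illustrative.

```python
def balanced_accuracy(observed, predicted):
    """Mean of sensitivity (inundated days correct) and specificity (dry days correct).

    Assumes both classes occur in `observed`; otherwise a division by zero occurs.
    """
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o and p)          # inundated, predicted inundated
    tn = sum(1 for o, p in pairs if not o and not p)  # dry, predicted dry
    positives = sum(1 for o, _ in pairs if o)
    negatives = len(pairs) - positives
    return (tp / positives + tn / negatives) / 2


def accuracy(observed, predicted):
    """Fraction of days classified correctly."""
    return sum(1 for o, p in zip(observed, predicted) if o == p) / len(observed)


# Hypothetical record: 90 inundated days and 10 dry days; the model always predicts "wet".
observed = [True] * 90 + [False] * 10
predicted = [True] * 100
print(accuracy(observed, predicted))           # 0.9
print(balanced_accuracy(observed, predicted))  # 0.5
```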

To capture the full suite of wetting and drying periods, we set out to predict hydroperiod continuously at a daily time step, an approach that could be useful for multiple taxa with various phenological and hydrologic requirements. Our results indicated that the models performed quite well, although some wetlands were predicted less well than others (Figs. 8 and 10). The statistically significant differences between predicted and observed hydroperiods (Fig. 9) in wetlands of all inundation classes do not necessarily translate to biological significance. Whether such deviations are biologically meaningful will depend on individual species and their hydrologic requirements.

The large discrepancies we found between observed and predicted hydroperiods in the permanent wetlands were not surprising. For example, if the model misclassified only a single day in the middle of the record of a permanent wetland, the predicted hydroperiods would be roughly half of the observed inundated period. If this occurred twice, the predicted hydroperiods would shrink to roughly one third of the observed period. Thus, viewing the results at this high level reveals where the models performed best but does not tell the full story. This contrasts sharply with the analysis of inundation frequency, in which the permanent inundation class was 99% accurate, and highlights the importance of choosing the most appropriate metric(s) and examining details when interpreting model performance and any management implications. Where inundation frequency was accurate and hydroperiod was not, a small number of misclassified days likely caused the discrepancy, and the model output still contains valuable information. In these cases, however, more human intervention is needed to interpret the results and to define what is meaningful or acceptably accurate. Another factor we did not explicitly consider in this study was the timing of filling and drying events. Timing is critical because, for a hydroperiod to be biologically meaningful, it must align with the phenology of a specific species. Based on the subset of wetlands presented in Fig. 4, the models appear to have matched the timing of filling and drying accurately. Follow-up analyses quantifying model performance with respect to the timing of filling and drying would best be conducted using species-specific examples with biological survey data. Doing so would allow predicted inundation characteristics to be used to evaluate and identify habitats of concern for the long-term conservation of populations.
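The arithmetic of a single misclassified day can be checked directly. This Python sketch uses a hypothetical year-long record for a permanently inundated wetland and counts runs of consecutive inundated days with itertools.groupby; it shows how one or two dry-day errors fragment the hydroperiod while the fraction of inundated days that are predicted correctly stays near 100%.

```python
from itertools import groupby


def hydroperiods(inundated):
    """Lengths of runs of consecutive inundated (True) days."""
    return [sum(1 for _ in run) for wet, run in groupby(inundated) if wet]


observed = [True] * 365                  # hypothetical permanently inundated year
one_error = list(observed)
one_error[182] = False                   # one misclassified day mid-record
two_errors = list(observed)
two_errors[121] = two_errors[243] = False

print(hydroperiods(observed))    # [365]
print(hydroperiods(one_error))   # [182, 182]
print(hydroperiods(two_errors))  # [121, 121, 121]
print(round(sum(one_error) / sum(observed), 3))  # 0.997
```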

Comparisons to Other Modeling Techniques

Unlike physically based process models, ML models are geared toward prediction rather than process understanding, although some insight into controlling variables can be gleaned when model parsimony is rigorously constrained (Fig. 4). If intensive site-level data and the expertise to implement a physically based process model are available, such a model could potentially provide both predictions and process-level insight. However, if the main goal is prediction, ML approaches offer a robust alternative and may require less intensive field data to produce suitable predictions.

Machine learning models can be trained on extensive datasets (where they exist) and applied in other situations or locations where the phenomenon of interest is governed by similar processes, such as where topography is thought to exert a strong control on wetland hydroperiod. Thus, practitioners could apply this modeling framework by deriving the input variables listed in Table 1 and evaluating its accuracy in other study areas. Additionally, as more field data become available, they could be combined with the data used here to retrain the models so that patterns from other regions are also learned, making predictions more generalizable. If other predictor variables are identified as critical in a given region (e.g., soil or bedrock type), the models could be updated to include that information. This modeling approach and its associated predictors performed well in our study area; however, that does not mean it will work everywhere. Therefore, when applying such a framework, it is critical to have some way of validating that the predictions are reasonable, whether through limited monitoring, remote sensing, or correlations with antecedent weather conditions.
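One inexpensive plausibility check mentioned above is to correlate predicted inundation with antecedent weather. A minimal Python sketch, assuming daily predictions coded 1 (inundated) / 0 (dry) and a matching antecedent P − PET series; Pearson correlation with a binary variable is the point-biserial correlation. The data are hypothetical, and a positive coefficient is only a sanity check, not a validation of hydroperiod accuracy.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical antecedent P - PET (mm) on six dates and the model's prediction (1 = inundated).
balance = [-50, -30, -10, 5, 20, 40]
predicted_wet = [0, 0, 0, 1, 1, 1]
r = pearson(balance, predicted_wet)
print(round(r, 2))  # 0.86
```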

A final point worth reiterating is that the present approach simply predicts whether water was present based on a threshold water level. We recognize that this is a rather crude representation of wetland hydrology that neglects the spatial dynamics often critical for providing diverse habitat types, such as open water and marsh within the same basin (Cowardin et al. 1979; Pechmann et al. 1989). However, linking continuous water level measurements to inundated area requires considerable fieldwork to develop stage-area relationships. Furthermore, we believe many wetland water level datasets are waiting to be uncovered that could be integrated into this modeling framework to extend predictions to other areas and refine the present work. Therefore, we chose to focus on a presence/absence prediction rather than a more complex metric such as inundated area or volume. This somewhat naïve view of “wetness” could be too simplistic in wetlands with a high degree of microtopographic relief, where isolated pools become disconnected from a central basin. Even in those cases, a suitable threshold water level could perhaps be chosen to reflect the most important habitat characteristics, still rendering a similar approach useful.
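The presence/absence representation amounts to comparing each day's water level with a threshold. The Python sketch below is illustrative only; it assumes a day counts as inundated when the level is at or above the threshold and leaves missing readings as None, and the function name and levels shown are hypothetical. Raising the threshold shortens or removes inundated runs, which is one way a threshold could be targeted to a particular habitat characteristic.

```python
def presence_absence(water_levels, threshold):
    """Convert daily water levels to inundated (True) / dry (False) flags.

    Assumes level >= threshold means inundated. Missing readings (None)
    stay None rather than being guessed.
    """
    return [None if level is None else level >= threshold for level in water_levels]


# Hypothetical daily water levels (cm) relative to an arbitrary datum.
levels = [-5.0, 0.0, 2.5, None, 10.0, 3.0, -1.0]
print(presence_absence(levels, threshold=0.0))
# [False, True, True, None, True, True, False]
print(presence_absence(levels, threshold=5.0))
# [False, False, False, None, True, False, False]
```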

Critical Need for Monitoring Data

This modeling effort would not have been possible without the extensive dataset, which represents the considerable effort of many people and a substantial investment of resources. We therefore highlight the importance of field data and long-term records. If monitoring networks similar to the one used here, even with far fewer wetlands, could be established (or brought to light where they exist), similar efforts could be conducted over greater spatial scales to build models appropriate for broader areas and wetland types. This could go a long way toward furthering the wetland community’s ability to predict inundation dynamics in depressional wetlands, which could in turn increase our understanding of habitat diversity across broader landscapes. While this study relied on surface water level data, almost any type of observational data (e.g., iButtons [Anderson et al. 2015] or citizen observations [North et al. 2023]) can help further our understanding of wetland hydrology, as long as the methods used and data quality are properly documented.

Conclusions

In this study, we demonstrated the predictability of daily wetland inundation dynamics and hydroperiod using random forests and predictor variables derived in a GIS environment from readily available datasets for the conterminous United States (Table 1). The random forests were trained and evaluated using an extensive dataset of water levels from 59 wetlands to predict whether a wetland was inundated on a given day. The models had low out-of-bag (OOB) error rates (0.05–0.06) and good balanced accuracy (median = 83%), and they predicted median hydroperiod most accurately in infrequently inundated wetlands and least accurately in permanent wetlands. However, a different pattern emerged when considering the total number of inundated days: permanent wetlands were the most accurate (99%) and infrequently inundated wetlands the least (93%). This suggests that the choice of metric can provide different insights into model performance, and considering both metrics together gives a more complete picture of model performance and of where the models may be most useful. Wetland inundation dynamics and hydroperiod are key controls on most wetland functions, and improving our ability to predict these dynamics will benefit researchers and managers alike. Where validation data exist, the approach described here has the potential to expand our understanding of the inundation dynamics of small depressional wetlands, so that we are better prepared for the evolving conservation needs of these critical habitats.