Modeling variability of infiltration tests in ephemeral stream beds as a random function for uncertainty quantification

Infiltration processes are highly variable in space and time, and therefore, building reliable hydrological models without considering the variability is questionable. In this research, we propose a methodology that can systematically handle the variability in the infiltration process. The methodology is based on the theory of random functions in a dimensionless formalism that allows the derivation of a generalized model from the observed infiltration test data. The Monte Carlo technique is utilized to generate hypothetical infiltration tests that carputer the characteristics of the real tests. The methodology is applied to a case study in ephemeral stream beds located in Al Madinah Al Munawarah Province in Saudi Arabia. The measurements are made by the double-ring infiltrometer. Beta distribution fits the dimensionless cumulative infiltration relatively well at a 1% significant level at all times, and therefore, it can be used to model the uncertainty in hydrological modeling. High variability is observed in infiltration tests at the early time (a platykurtic distribution with high dispersion); however, it decreases at the late time (Leptokurtic distribution with low dispersion) since the infiltration reaches a steady infiltration. Some extreme tests show different behavior from the fourteen tests that cannot be captured by the model and therefore need special treatment.


Introduction
In the arid regions, infiltration is a very significant process in the hydrologic cycle, where the rainwater infiltrates downwards into the soil for groundwater replenishment, especially in shallow aquifers. Infiltration processes allow the soil to store the water by replacing the air in the soil voids by infiltrating water to be useful for groundwater recharge and agricultural activities.
Infiltration tests are the method by that water is percolating from the ground surface into the soil to evaluate the hydraulic characteristics of the soil. These tests are playing an important role in the hydrological behavior of surface runoff, soil eroding, and recharging of the groundwater aquifer.
Infiltration rates have a wide range of variation values based upon the type of soil layers; also the rates of infiltration can be varying through a single layer of soil as reported by Parr and Bertrand (1960). The variations in infiltration rates in the single layer of soil are due to the inhomogeneity 1 3 87 Page 2 of 14 of moisture content, the texture of the soil layer itself, and the agricultural environments. Casenave and Valentin (1992) reported that about more than 80% of the infiltration variations are because of the ground surface situations. The infiltration rates depend on geomorphology, ground surface roughness, vegetation cover, and hydraulic soil properties such as porosity and hydraulic conductivity (Freeze and Cherry 1979;Tricker 1981;Dingman 2008). Johnson (1963) and Eijkelkamp (2015) reported that when the infiltration rate gets constant, this is related to the infiltration capacity, according to ASTM (2009) andEijkelkamp (2015), although there is no direct relationship between the infiltration capacity and hydraulic conductivity, the hydraulic conductivity could be estimated via the saturated condition of infiltration rate. Hydraulic conductivity considers permeability, in some publications reporting that the permeability coefficient is the preferred synonym for hydraulic conductivity (Rönnqvist 2018). Rönnqvist (2018) reported that hydraulic conductivity is the capability of the soil to percolate the water via its intersected voids, and the high level of hydraulic conductivity is getting steady state (saturated hydraulic conductivity). Hydraulic conductivity is the most important factor in geotechnical science, particularly in the engineering of dams, because of its importance to study the materials of the dam area and the drainage characteristics (Hayden 2010). The amount of percolating water from the ground surface into the soil is defined as the infiltration rate (water volume/area/ time). Chow et al. (1988) reported that the hydraulic conductivity of unsaturated soils is about half that of saturated, and Hayden (2010) said that the hydraulic conductivity in unsaturated soils is more about 70% than in saturated soils. Eijkelkamp (2015) stated that the infiltration rate decreases with time from the initially high rate to the steady-state rate (infiltration capacity), decreasing from the initial rate to the steady state due to the matrix absorption of the dry shallow soils.
Batlle-Aguilar and Cook (2012) achieved an infiltration experiment at the stream reach scale to assess the infiltration rate underneath an ephemeral, losing stream during streamflow events, and they conclude that the transient infiltration rates, meaningfully above steady-state infiltration rates, can continue for some days after the flow begins. The antecedent moisture content considers a controlling factor of infiltration rate and groundwater recharge. In arid regions, it is predictable that flow events of 10-15 days are necessary for recharging the shallow aquifer.
From the aforementioned review of the infiltration process and its characteristics, it is concluded that the infiltration process varies spatially due to soil heterogeneity and temporally due to the degree of soil saturation over time. Building reliable hydrological models without considering spatial and temporal variabilities is questionable. To the best of the authors' knowledge, there are no specific studies that modeled the variability in the infiltration processes in the hydrological literature. Therefore, in this paper, we propose a methodology to model the variability of the infiltration tests from measurements in ephemeral stream beds in the arid region using the theory of random functions. The Monte Carlo technique is utilized to simulate the field infiltration and generate equally probable realizations of the infiltration tests for quantification of the uncertainty due to this variability which can later be introduced in hydrological modeling to account for the uncertainty of the outputs (e.g., hydrographs, recharge, etc.). The methodology is applied to a case study in ephemeral stream beds located in Al Madinah Al Munawarah Province in Saudi Arabia.

Description of the study area
The study area which is called Al-Madinah Al-Munawarah Province considers one of the most important Provinces in the Kingdom of Saudi Arabia (KSA), from a historical and religious point of view. It is situated in the Western Central portion of KSA; it is restricted between latitudes 22.48° and 27.48° N and longitudes 37.45° and 42.12° E with an area of about 176,716 km 2 and a perimeter of about 2900 km ) Fig. 1).
The study area is characterized by an arid to hyper-arid climate of high temperature and less rainfall with average precipitation ranging from 40 to 80 mm (60 mm), per year ( Fig. 2a, b).
Topographically, the study area is characterized by a wide range of elevations which is ranging from 0 m above mean sea level at the Red Sea Coast to more than 2000 m at Hijaz Mountainous area (Fig. 3a).
Geologically, Al-Madinah Al-Munawarah Province occupies a considerable portion of the Arabian Shield and is composed of a wide range of geological periods extending from Precambrian to the recent times. The main geologic formations are extending from fractured igneous and metamorphic rock (basements), sedimentary, Wadi deposits, and lava flows (Harrat) as shown in (Fig. 3b).

Data collection from infiltration tests
The double-ring infiltrometers are well known and commonly implement to measure the infiltration rate as reported by (Bouwer 1986;Chow et al. 1988;Eijkelkamp 2015). This device is consisting of two open coaxial cylindrical outer and inner rings, the outer is a large diameter (50 cm), and the inner ring is less diameter (25 cm). These double rings should be inserted into the ground at a vertical distance of about 10 cm using a special hammer to keep the soil without disorders and instabilities. A constant water head should be kept inside the double rings, and the water inside the external cylinder is very important to prevent the horizontal leakage through the internal ring permitting only the vertical flow from the internal ring. The measured infiltrated water by ruler scale will continue with recording the time until the steady state of infiltration is achieved (Fig. 4). Figure 5 shows  the locations of the infiltration tests which are located in ephemeral stream beds of the wadis in the study area that contributes to groundwater recharge during the passage of the flood in such stream (transmission losses, Elfeki et al. 2014a, b). Figure 6 shows the measurements of the development of the infiltration depth and infiltration rates in time. The figure shows high variability of the infiltration rates and depths which depends on spatial location and soil characteristics.

Methodology
In the following subsections, the methodology is explained in detail and the formulas and models used to get the results are provided.

Data transformation into dimensionless curves and percentile calculations
Since the infiltration tests in Fig. 7a have different total infiltration depth and different total elapsed time for each test, it is essential to transfer the data into dimensionless curves to be able to display them on the same scale, both in the total duration of the test and in the total infiltration depth, so that a universal model of the variability can be developed. The dimensionless curves have many applications in hydrology, e.g., for temporal distribution curves of rainfall storms (Huff 1967(Huff , 1990Bonta 2004;Elfeki et al. 2014a, b;Ewea et al. 2016) and dam reservoir characteristic curves (Mohammadzadeh-Habili et al. 2009;Kamis et al. 2020).
The following steps are the procedure to develop such curves.

Derivation of dimensionless curves: The infiltration
test data of the cumulative infiltration of the 14 tests presented in Fig. 7a are transformed into dimensionless curves by dividing the cumulative infiltration depth at any time, F, by the total infiltration depth at the end of the test, F max . With the same procedure, the elapsed time, t, from the beginning of the test is divided by the  can be performed at each t/t max = 0.05, 0.10,…, and 0.95. Figure 6d shows the interpolated (F/F max ) at equal intervals of (t/t max ) from all the tests. 3. Percentile calculations: The Pth percentile (0 < P < 1) of the data is calculated by sorting the list of n values from the least to the greatest. From the ordered values, one should get the value that corresponds to the P% of the data. This is done through the use of the percentile function in Excel (PERCENTILE.INC). This procedure is applied at each dimensionless time, t/t max , and the results are displayed in Fig. 7e.

Theory of random functions
The random function uses the concept of probability theory to model the variability in a phenomenon that cannot be described deterministically (Pugachev 1965). Therefore, the notion of the probability density function of a random variable is utilized.
In the current study, since the infiltration tests show variability between each test depending on the soil characteristics, therefore, we propose the use of this theory to model such variability. Since the infiltration test data are transformed into dimensionless curves that vary between 0 and 1 so, it is convenient to use the beta distribution (that is also defined over 0 and 1) to model the tests. Statistical tests will be performed to check its applicability.
The beta probability density function, p(x), is given by, where x is the random variable, , β are the shape and scale parameters, respectively, of the distribution, and Γ() is the gamma function, which is expressed as, The parameters , β can be estimated from the mean, , and the variance, 2 , of the data as,

The goodness-of-fit test
The Kolmogorov-Smirnov (K-S) test (Hamed and Rao 1999) is used to check if the beta distribution would fit the data based on the following hypotheses: H 0 x Follows a beta distribution.
H 1 x does not follow a beta distribution where H 0 is called the null hypothesis, H 1 is the alternative hypothesis, and x is the variable under study (i.e., percentiles 0.10, 0.20, 0.30, …, and 0.90). The K-S test looks at the max difference between the theoretical cumulative beta distribution and the empirical cumulative distribution function from the data in the form, where P x i , is the prescribed cumulative beta distribution function with parameters (shape and scale) and P n x i is the empirical cumulative distribution function obtained from the data with sample size n.
The null hypothesis can be rejected at a certain significant level ε when the calculated value of the K-S test is greater than the tabulated value (three levels ε = 1%, 5%, and 10% have been tested).

Monte Carlo technique
The Monte Carlo technique is an approach to studying stochastic processes. It depends on random numbers and probability distribution to assess the variability in the model predictions. The idea behind the method is on generating multiple realizations from the developed model based on the probability distribution of the data as input to the model. The generated output realizations are evaluated to quantify both the uncertainty in model predictions and to relate the model output to the input variables that give rise to this uncertainty. The Monte Carlo method involves the following three steps: (1) generation of random numbers between 0 and 1 using the "RAND" function in Excel. Then these random numbers are used to generate the random variable of the beta distribution function using "BETA.INV" in excel.
(2) The generated values in the above step are introduced to generate the model output, (F/F max ), at each time (t/t max ). Many output realizations are obtained by repeating steps 1-to 3 by changing the random numbers obtained in the first step. (3) The dimensionless curves have to transfer back to absolute curves by multiplying infiltration, F, by F max , and the absolute time by t max for each test. (4) The final step is collecting all output realizations and presenting the actual test and the generated realizations in a graphical form. Also, the ensemble mean is calculated, and the rootmean-square error is estimated between the ensemble mean and the data of the actual tests to evaluate the error in the generated realizations.

Results and discussions
In this section, the results and discussions are presented in subsections. In summary, we discussed the variation of the statistical parameters the dimensionless cumulative infiltration (F/F max ) versus dimensionless time (t/t max ), percentiles (P) of the dimensionless cumulative infiltration (F/F max ), fitting beta distribution to dimensionless cumulative infiltration (F/F max ), simulation results of the infiltration tests (cumulative infiltration, and simulation results of the infiltration tests (infiltration rates). Figure 8 shows the variation of the statistical parameters (mean, standard deviation, SD, and coefficient of variation, cv) of the dimensionless cumulative infiltration (F/F max ) versus the dimensionless time (t/t max ). The mean of (F/F max ) shows a monotonic growth in time (t/t max ) which is typical for cumulative infiltration. The SD of F/F max = 0 at t/t max = 0 (beginning of the test) and at t/t max = 1 (end of the test) while it reaches its maximum value at the early time due to the high variability at the initial infiltration which decays at the late time when reaching to the steady infiltration capacity. The cv = SD/mean of F/F max shows high values at early times while it decays to zero at the late time (t/t max ). This is a consequence of the mean and the SD curves. Figure 9 shows the percentiles of the (F/F max ) versus the time (t/t max ). The highest P = 0.9 is the highest graph, and it goes down to the lowest graph at P = 0.1. These bounded curves correspond to the bound of the variability in the data at 90% confidence as the upper limit and 10% confidence as the lower limit. In between these limits, other percentiles are available from P = 0.2 up to P = 0.8. It is obvious from the graph that the percentiles from P = 0.1 up to P = 0.8 are close to each other while the curve at P = 0.9 is far apart, due to the extreme infiltration tests (Tests 5 and 9 in Fig. 7b) that show different behavior from other tests. The waterfront progressed quickly in the early times, whereas progressed slowly in later times. This behavior could be explained by the dominant mechanisms (gravity and capillarity) both acting downward, letting the water infiltrate quickly into the soil at the early time, while the gravity is dominating at the late time when the infiltrated water increases, and if there is a   Table 1 shows the results of the K-S test of fitting the beta distribution to the dimensionless infiltration (F/F max ). The table shows that beta distribution cannot be rejected at 1% significant level (99% confidence level) at t/t max = 0.1, 0.2, 0.3, …, and 0.9, respectively, since K-S values at t/t max = 0.1, 0.2, 0.3, …, and 0.9 are less than the tabulated K-S value at 1% significant level (99% confidence level) and also cannot be rejected at 95% confidence level for t/t max = 0.3, 0.4, 0.5, …, and 0.9. However, it is rejected at a 90% confidence level in all t/t max . Therefore, beta distribution can be used to generate values for the dimensionless infiltration (F/F max ) at each t/t max = 0.1, 0.2, 0.3,…, and 0.9, respectively, based on its mean and variance according to Fig. 8. Figure 10a shows the fitting of beta distribution to the data (F/F max ) at t/t max = 0.1, 0.2, 0.3…, and 0.9, respectively. Figure 10b shows the probability density function (pdf) of the beta distribution. Both Figs. 10a, b show that at the early time the distribution has a high variance while going to a low variance at the late time. Also, Fig. 10b shows the pdf goes from platykurtic at the early time to leptokurtic at the late time. This is due to the high variability in the initial infiltration at the early time while it reduces to low variability in the late time going to steady infiltration capacity. Figure 11 shows ten equally probable realizations out of 100 realizations generated by the model for the cumulative infiltration depth and plotted against the actual field experiment. Also, the upper (95% confidence) and lower (5% confidence) limits are plotted. One may see the essential features of the realizations are similar to the field tests except for Tests 4, 5, and 9. These tests show extreme behavior that is not captured by the model. The reasons are explained by Niyazi et al. (2022). This behavior could be explained by the dominant mechanisms (gravity and capillarity) both acting downward, letting the water infiltrate quickly into the soil at the early time, while the gravity is dominating at the late time when the infiltrated water increases, and if there is a water table near the ground's surface, the waterfront stopped to advance down when touching the water table. Except for the extreme tests, the overall tests sometimes show overestimations (e.g., Tests 2 and 3), others show underestimation (e.g., Tests 5, 10), and others show a good match (e.g., Tests 1,11,and 14). This shows the unbiasedness of the model results. The figure also shows that five tests are falling within the upper and lower 95% confidence levels (Test 6,8,10,11,and 14); other tests (9 tests) are out of bounds. This is an indication of the high variability of these tests that cannot all be captured by the model and needs more sophisticated pdf models. Table 2 shows summary statistics of the model performance criteria in terms of RMSE. The minimum RMSE for F equals 0.39 cm (Test 6) while the maximum RMSE = 23.11 cm (Test 5). The cv = 1.22 shows high variability between the tests which is confirmed by the simulations in Fig. 11. The table also shows the average of the infiltration rates and depths for each test and the corresponding average of the upper and lower bounds of both  infiltration rates and depths, respectively. For the average infiltration depths, the maximum value is 104.1 cm for Test 7, and this value is out of the bounds (116.83-121.49 cm), i.e., overestimated by the model. This test shows low infiltration in the beginning and changes to high infiltration at a late time. The reason is the presence of a silty low permeable layer at the surface that goes to the sandy layer deeper. Also, Tests 3 and 4 show similar behavior. The minimum value of the average infiltration depth is in Test 9 (3.54 cm), and this value is out of the bounds (2.37-2.46 cm); however, this test is underestimated by the model. The reason, in this case, is that the infiltration is very high in the beginning, and it goes very slowly at a late time, and this is due to either the waterfront facing a low permeable rock or infiltration reaching the water table. Test 5 shows also similar behavior.

Simulation results of the infiltration tests (infiltration rates)
The infiltration rates are estimated from the cumulative infiltration depths by numerical differentiation as, where f (t) is the infiltration rate at the time, t, ΔF(t) is the difference between two successive infiltration depths, and Δt is the time difference between two successive infiltration depths. Figure 12 shows the infiltration rates of the actual tests and ten equally probable realizations of the infiltration rates estimated by Eq. (6) from the cumulative infiltration depths in Fig. 11. In the figure, the upper (95% confidence) and lower (5% confidence) limits are plotted. The overall simulations show relatively similar shapes. An in-depth investigation of the figures shows that the simulations of the tests are located above and below the field data; however, this is not observed in Tests 4, 5, and 9. As explained earlier, Tests 4, 5, and 9 cannot be captured by the model since they have extreme behavior as explained earlier in Fig. 11. The figure also shows that the majority of the realizations are falling within the 95% bounds.
(6) f (t) = ΔF(t) Δt Fig. 11 Simulation results of the infiltration tests (cumulative infiltration, F vs. time, t): the actual test (solid black line), ten generated realizations (dots), and the upper (95% confidence) and lower (5% confidence) bounds ◂  Table 2 shows the summary statistics of the simulation results. The minimum RMSE for f equals 0.006 cm/min (Test 8) while the maximum RMSE = 0.261 cm/min (Test 5). The cv = 1.21 shows high variability between the tests which is confirmed by the simulations in Fig. 12. Table 2 also shows the average of the infiltration rates and the total infiltration depth from the data and the corresponding average of the upper and lower bounds of the infiltration rates and the total infiltration depths from the simulations. For the average infiltration rates from the data, the maximum value is 0.477 cm/min for Test 7, this value is within the bounds (0.31-0.65 cm/min) from the simulations; however, the cumulative depth is overestimated by the model, and the reason is of course due to the high rate of infiltration and the accumulation process of the infiltration rates that lead to overestimation. The minimum value of the average infiltration rate is observed in Test 9 (0.011 cm/min), and this value is within the bounds (0.007-0.015 cm/min); however, the infiltration depth was underestimated by the model. The reason for this is the lower rate of infiltration at this test. Also, it should be noted that the transformation from infiltration depth to the infiltration rates using the numerical derivative (Eq. 6) removes the accumulation in the cumulative depth, and therefore, it falls within the bounds of the infiltration rates.

Conclusions
High variability is observed for the infiltration depth and consequently at infiltration rates at the early time of the experiment (cv > 0.7); however, the variability decreases at the late time of the experiment since the infiltration reaches a steady infiltration at the late time. This accounts for the variability in the results of the hydrological modeling of the arid basins. Beta distribution fits the dimensionless cumulative infiltration (F/F max ) relatively well at a 1% significant level at all times; therefore, it can be used in the uncertainty analysis in the hydrological modeling inputs of the infiltration test data. The beta distribution goes from platykurtic at the early time to leptokurtic at the late time. This is due to the high variability in the initial infiltration at the early time while it reduces to low variability in the late time going to steady infiltration capacity. The percentiles from P = 0.1 up to P = 0.8 are close to each other while the curve at P = 0.9 is far apart, due to the extreme infiltration tests (Tests 5 and 9) that show different behavior from other tests. The simulation results showed that the minimum RMSE for f equals 0.006 cm/min (Test 8) while the maximum RMSE = 0.261 cm/min (Test 5). Also, for F, the minimum RMSE equals 0.39 cm (Test 6) while the maximum RMSE = 23.11 cm (Test 5). Therefore, it is recommended to treat these extreme tests separately since they cannot be captured by the model.
The methodology presented herein for modeling the variability in the infiltration tests is of great importance to be applied in regions where there are ungauged basins. So that it provides an alternative way to generate/simulate infiltration tests in places that has no such infiltration test data, and it can also account for the uncertainty through the application of the Monte Carlo method during the hydrological modeling of the basins.
Such a study is recommended at numerous drainage basins in arid regions to improve water management and precisely assess the components of water balance, which in turn can assist planners and decision-makers in the sustainable management of water resources.