The model developed in Sect. 2 has been applied to two different data sets, one from Bracknell in England and the other from Bochum in Germany. The Bracknell data set was collected over a period of 15 years in the form of rainfall bucket tip times, whereas the Bochum data set consists of five-minute rainfall depths collected over a period of 69 years. We explored two versions of the model proposed in Sect. 2. The first assumes that each rain pulse terminates after a fixed duration d; the second relaxes this assumption by allowing the pulse duration d to vary, treating it as an additional parameter to be estimated.
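Because the Bracknell record consists of bucket tip times while the model is fitted to rainfall totals over fixed intervals, the tip times must first be converted into interval depths. A minimal sketch of that conversion follows, assuming each tip is credited in full to the interval in which it occurs; the function name `tips_to_depths` and the 0.2 mm tip size are hypothetical, not values taken from the study.

```python
import numpy as np

def tips_to_depths(tip_times_min, n_intervals, tip_mm=0.2, interval_min=5.0):
    """Convert bucket-tip times into rainfall depths over fixed intervals.

    tip_times_min : tip times in minutes from the start of the record
    n_intervals   : number of intervals in the output series
    tip_mm        : depth represented by one tip (hypothetical 0.2 mm default)
    interval_min  : interval length in minutes
    """
    t = np.asarray(tip_times_min, dtype=float)
    # Index of the half-open interval [k*dt, (k+1)*dt) containing each tip.
    idx = np.floor(t / interval_min).astype(int)
    # Tips outside the record are ignored.
    idx = idx[(idx >= 0) & (idx < n_intervals)]
    # Count tips per interval and convert counts to depths.
    counts = np.bincount(idx, minlength=n_intervals)
    return counts * tip_mm

depths = tips_to_depths([0.5, 1.0, 4.9, 5.0, 12.0], n_intervals=3)
```

Because the intervals are half-open, the tip at exactly 5.0 minutes is assigned to the second interval.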
Analysis of Bracknell Data
We begin our analysis with the fixed pulse duration model applied to the Bracknell data. Previous studies suggest that \(d=1\) is sufficient to capture the properties of rainfall well, and we use this value in our analysis. In addition, we take the initial pulse depths X at the pulse origins to be independent exponential random variables with parameter \(\theta _1\) in State 1 and \(\theta _2\) in State 2. The model then has seven parameters per month, which we estimate by the method of moments using the objective function given in Eq. (15). The estimates of the model parameters for the Bracknell data are given in Table A1 (see Supplementary Material for Table A1). The time-scales used in fitting were \(h=20\) minutes for the mean and \(h=10, 30, 60\) minutes for the standard deviation and lag-1 autocorrelation. The estimates show that State 2 has high arrival rates of rainfall bursts (\(\phi _2\)) and short sojourn times (\(1/\mu\)), whereas State 1 has low arrival rates (\(\phi _1\)) and long sojourn times (\(1/\lambda\)).
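Simulated series from a fitted model of this kind are what the simulation bands below are built from. The following is a minimal simulation sketch, assuming the structure implied by the parameters above: a two-state Markov chain that leaves State 1 at rate \(\lambda\) and State 2 at rate \(\mu\), pulse origins arriving as a Poisson process with rate \(\phi _i\) in state i, initial depth X exponential with mean \(1/\theta _i\), and an intensity that decays as \(X e^{-\beta u}\) for \(0 \le u < d\). The decay form, the starting state and the function names are assumptions, and the parameter values are illustrative, not the estimates in Table A1.

```python
import numpy as np

def simulate_pulses(lam, mu, phi1, phi2, theta1, theta2, t_end, rng):
    """Pulse origins and initial depths on [0, t_end) hours (hypothetical sketch)."""
    origins, x0 = [], []
    t, state = 0.0, 1  # assumption: the record starts in State 1
    while t < t_end:
        leave = lam if state == 1 else mu            # rate of leaving the current state
        t_next = min(t + rng.exponential(1.0 / leave), t_end)
        phi = phi1 if state == 1 else phi2           # pulse arrival rate in this state
        theta = theta1 if state == 1 else theta2
        n = rng.poisson(phi * (t_next - t))          # number of pulses in this sojourn
        origins.append(rng.uniform(t, t_next, n))
        x0.append(rng.exponential(1.0 / theta, n))   # mean initial depth 1/theta
        t, state = t_next, 3 - state                 # switch 1 <-> 2
    return np.concatenate(origins), np.concatenate(x0)

def aggregate_pulses(origins, x0, beta, d, h, t_end):
    """Rainfall totals in consecutive intervals of length h (t_end a multiple of h)."""
    n_bins = int(round(t_end / h))
    totals = np.zeros(n_bins)
    for s, x in zip(origins, x0):
        first = min(int(s // h), n_bins - 1)
        last = min(int((s + d) // h), n_bins - 1)
        for k in range(first, last + 1):
            lo, hi = max(k * h, s), min((k + 1) * h, s + d)
            if hi > lo:
                # integral of x * exp(-beta * (t - s)) over [lo, hi)
                totals[k] += x / beta * (np.exp(-beta * (lo - s)) - np.exp(-beta * (hi - s)))
    return totals

# Illustrative parameter values only (rates per hour).
rng = np.random.default_rng(42)
origins, x0 = simulate_pulses(lam=0.02, mu=0.5, phi1=0.05, phi2=3.0,
                              theta1=2.0, theta2=1.0, t_end=24 * 30, rng=rng)
hourly = aggregate_pulses(origins, x0, beta=6.0, d=1.0, h=1.0, t_end=24 * 30)
```

Each pulse contributes \((X/\beta)(1-e^{-\beta d})\) in total, spread over the intervals it overlaps; the fitting itself uses the theoretical moments rather than simulation.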
The performance of our model is assessed by comparing the theoretical properties calculated from the estimated parameters (fitted values) with the corresponding empirical values. The comparison was made at both sub-hourly and sub-daily time-scales, including those not used in fitting. In addition, simulation bands based on 1000 series simulated from the fitted model were calculated and displayed alongside the observed (empirical) and fitted (theoretical) values.
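A simulation band of this kind can be sketched as pointwise quantiles of a statistic computed from each simulated series. The quantile levels used in the study are not stated here, so the 2.5% and 97.5% levels below are an assumption (the minimum and maximum are another common choice), and the gamma-distributed `toy_simulate` is only a stand-in for the fitted model.

```python
import numpy as np

def simulation_band(simulate, statistic, n_sims=1000, probs=(0.025, 0.975), seed=0):
    """Pointwise lower/upper quantiles of a statistic over simulated series.

    simulate  : function rng -> simulated data set
    statistic : function data set -> array of statistics (e.g. one per month)
    """
    rng = np.random.default_rng(seed)
    values = np.array([statistic(simulate(rng)) for _ in range(n_sims)])
    lower, upper = np.quantile(values, probs, axis=0)
    return lower, upper

def toy_simulate(rng):
    # Stand-in for the fitted model: 12 months x 500 hourly totals.
    return rng.gamma(shape=0.3, scale=2.0, size=(12, 500))

def monthly_sd(data):
    # One standard deviation per month.
    return data.std(axis=1, ddof=1)

lower, upper = simulation_band(toy_simulate, monthly_sd, n_sims=200)
```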
In all the plots in this section, the black line represents the empirical values, the blue line shows the fitted values of our proposed state-dependent exponentially decaying initial pulse model M2, and the red lines show the simulation bands. We compare the results of the proposed model (M2) with those of the model with a common initial pulse depth distribution in both states (M1), whose fitted values are shown as brown dashed lines for comparison. Figure A1 shows that the empirical and fitted means of the aggregated rainfall at \(h=1\) hour are in excellent agreement, so the mean rainfall has been reproduced well by the fitted model (see Supplementary Material for Fig. A1). The same holds at all other time-scales, since the mean is simply scaled by a factor of h.
The empirical and fitted values of the standard deviation of the accumulated rainfall at several time-scales (\(h=1/6, 1, 6\) hours) are displayed, along with simulation bands, in the left-hand panels of Fig. A2 (see Supplementary Material for Fig. A2). Here again, the observed and fitted curves are in excellent agreement at all time-scales, and the observed values align more closely with the fitted values of our proposed model M2 than with those of the reference model M1, whose fitted value for May lies outside the simulation bands. The empirical and fitted values of the lag-1 autocorrelation of the accumulated rainfall are displayed in the right-hand panels of Fig. A2, along with simulation bands. The observed and fitted curves are in excellent agreement at finer time-scales. Although there are some differences between them at coarser time-scales, both lie well within the simulation bands. Here again, model M2 provides the better fit.
The observed and fitted values of the coefficient of variation of the aggregated rainfall, shown in the left-hand panels of Fig. A3, are in good agreement at all time-scales, including those not used in fitting (see Supplementary Material for Fig. A3). Once again, the proposed model M2 aligns more closely with the empirical values than model M1. The right-hand panels of Fig. A3 display the observed proportion of dry periods together with simulation bands from the fitted model M2 at time-scales \(h=1/12, 1/6, 1/3\) hours. The model reproduces these values reasonably well and captures their pattern across the year at finer time-scales, but less so at coarser time-scales, where it tends to overestimate the proportion of dry periods. However, these statistics are not used in fitting and are hard to reproduce at all values of h, as they depend strongly on the scale of measurement.
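The empirical statistics compared in these figures can be computed from a five-minute depth series by aggregating it into non-overlapping h-hour totals. A minimal sketch follows, assuming the input is a contiguous series for one calendar month and that a dry period is an interval with zero total (the study's dry threshold is not stated); the function names are hypothetical.

```python
import numpy as np

def aggregate_series(depths_5min, h_hours):
    """Sum a 5-minute depth series into non-overlapping totals over h hours."""
    y = np.asarray(depths_5min, dtype=float)
    k = int(round(h_hours * 12))   # 5-minute steps per interval
    n = len(y) // k                # drop an incomplete final interval
    return y[: n * k].reshape(n, k).sum(axis=1)

def summary_stats(depths_5min, h_hours, dry_threshold=0.0):
    """Mean, standard deviation, lag-1 autocorrelation, CV and proportion dry."""
    y = aggregate_series(depths_5min, h_hours)
    mean, sd = y.mean(), y.std(ddof=1)
    yc = y - mean
    lag1 = np.sum(yc[:-1] * yc[1:]) / np.sum(yc ** 2)   # sample lag-1 autocorrelation
    return {
        "mean": mean,
        "sd": sd,
        "lag1_autocorr": lag1,
        "cv": sd / mean,
        "prop_dry": np.mean(y <= dry_threshold),         # assumed dry threshold
    }

stats = summary_stats([0, 1, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0], h_hours=1 / 6)
```

In practice the statistics would be computed month by month, without aggregating across gaps in the record.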
To compare the performance of the two models (M2 and M1) numerically, we calculated the root mean square error (RMSE) of each of the three statistics used in fitting. For each month separately, the mean square error of a statistic is the squared difference between its empirical and fitted values, averaged over all eleven time-scales considered in our analysis, from \(h=1/12\) to \(h=24\) hours; the RMSE is its square root. Smaller RMSE values indicate closer agreement between the observed and fitted values, and hence a better fit. Table A2 shows the RMSE values of the three statistics, mean, standard deviation and autocorrelation (see Supplementary Material for Table A2). The RMSE values of model M2 are mostly smaller than those of M1, which provides evidence that M2 outperforms M1.
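The RMSE calculation described above reduces to a few lines when applied to one statistic at a time. The array layout (months by time-scales) and the function name are illustrative assumptions.

```python
import numpy as np

def rmse_by_month(empirical, fitted):
    """RMSE over time-scales, separately for each month.

    empirical, fitted : arrays of shape (n_months, n_timescales)
    """
    e = np.asarray(empirical, dtype=float)
    f = np.asarray(fitted, dtype=float)
    # Mean of squared differences over time-scales, then square root.
    return np.sqrt(np.mean((e - f) ** 2, axis=-1))

rmse = rmse_by_month([[1.0, 2.0], [0.0, 0.0]], [[1.0, 4.0], [3.0, 4.0]])
```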
Analysis of Bochum Data with Fixed Pulse Duration Model
Here, we use our state-dependent initial pulse model with fixed pulse duration to analyse the Bochum rainfall data over the 69-year period. We start by taking the pulse duration as \(d=1\) and assume that the initial pulse depths follow exponential distributions with means \(1/\theta _1\) and \(1/\theta _2\) in States 1 and 2, respectively. The model was fitted separately for each month using the weighted objective function given in Eq. (16), and the parameter estimates are given in Table A3 (see Supplementary Material for Table A3). The weight applied to each statistic in the objective function was the reciprocal of the variance of the yearly values of that statistic at each time-scale over the 69 years. The time-scales used in fitting were \(h=60\) minutes for the mean, \(h=5, 20, 60\) minutes for both the standard deviation and the autocorrelation, and \(h=12\) hours for the autocorrelation.
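The reciprocal-variance weighting can be sketched as follows, assuming Eq. (16) has the common weighted least-squares form \(\sum _i w_i (\tau _i - T_i)^2\), where \(\tau _i\) is an empirical statistic, \(T_i\) its theoretical counterpart from the model and \(w_i\) the reciprocal of the variance of the yearly values of statistic i. The exact form of Eq. (16) (for example, whether it uses relative differences) and the function names are assumptions.

```python
import numpy as np

def reciprocal_variance_weights(yearly_stats):
    """Weights 1/var for each statistic.

    yearly_stats : array (n_years, n_stats); row j holds the fitting statistics
                   (e.g. mean, sd, lag-1 autocorrelation at each time-scale)
                   computed from year j alone
    """
    return 1.0 / np.var(np.asarray(yearly_stats, dtype=float), axis=0, ddof=1)

def weighted_objective(empirical, theoretical, weights):
    """Weighted sum of squared differences between empirical and model statistics."""
    diff = np.asarray(empirical, dtype=float) - np.asarray(theoretical, dtype=float)
    return float(np.sum(np.asarray(weights) * diff ** 2))

w = reciprocal_variance_weights([[1.0, 10.0], [3.0, 14.0]])
value = weighted_objective([2.0, 12.0], [1.0, 14.0], w)
```

In a full fit, `weighted_objective` would be minimised numerically over the model parameters for each month, with the theoretical statistics computed from Eqs. (11) to (13).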
The estimates show that the overall pattern of rainfall characteristics is similar to that of the Bracknell data, suggesting that the two regions have similar rainfall patterns. Compared with Bracknell, the Bochum rainfall bursts have slightly smaller arrival rates (\(\phi _1, \; \phi _2\)) but longer sojourn times (\(1/\lambda , \;1/\mu\)) in both states. In addition, the mean initial pulse depths (\(1/\theta _1\), \(1/\theta _2\)) are larger in both states for the Bochum data. This suggests that Bochum experiences fewer rainfall bursts than Bracknell, but with larger initial pulse depths. Another point worth noting is that the estimates of the parameter \(\beta\) for the Bochum data are smaller than those for Bracknell, which suggests that the Bochum rain pulses take longer to deposit their rain. Nevertheless, the estimates of \(\beta\) show that, in each of the 12 months, the rain pulses deposit 95% of their rain within 30 minutes and 99% within 50 minutes of their origin.
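The deposition times quoted above follow from \(\beta\) directly, assuming the pulse intensity decays as \(X e^{-\beta u}\) for \(0 \le u < d\) hours after the pulse origin. The fraction of a pulse's rain deposited by time \(t \le d\) is then \(F(t) = (1 - e^{-\beta t})/(1 - e^{-\beta d})\), so the time by which a fraction p has fallen is \(t_p = -\ln \{1 - p(1 - e^{-\beta d})\}/\beta\), which reduces to \(-\ln (1-p)/\beta\) for an untruncated pulse. The function name is hypothetical and the decay form is an assumption.

```python
import math

def time_to_deposit(p, beta, d=math.inf):
    """Hours after the pulse origin by which a fraction p of the pulse's rain has fallen.

    Assumes intensity X * exp(-beta * u) on 0 <= u < d; d = inf gives an untruncated pulse.
    """
    # Fraction of the untruncated total X / beta that falls within [0, d).
    total = 1.0 - math.exp(-beta * d)
    return -math.log(1.0 - p * total) / beta

# beta = 6 per hour is an illustrative value, not an estimate from Table A3.
t95 = 60 * time_to_deposit(0.95, beta=6.0, d=1.0)  # minutes
t99 = 60 * time_to_deposit(0.99, beta=6.0, d=1.0)  # minutes
```

With \(\beta = 6\) per hour and \(d = 1\) hour this gives about 29.5 and 43.9 minutes, so under this assumption the quoted 30- and 50-minute figures correspond to \(\beta\) values of roughly 6 per hour or larger.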
The observed and fitted mean rainfall at time-scale \(h=1\) hour are displayed in Fig. A4 (see Supplementary Material for Fig. A4). The fitted and empirical values agree closely at \(h=1\) hour and also at all other time-scales. The left-hand panels of Fig. 2 show the empirical and fitted values of the standard deviation of the accumulated rainfall at time-scales \(h=1/6, 1, 6\) hours along with their simulation bands. Here again, the observed and fitted curves are in near-perfect agreement at all time-scales, including those not used in fitting. The same holds for the empirical and fitted values of the lag-1 autocorrelation of the accumulated rainfall, displayed in the right-hand panels of Fig. 2.
The left-hand panels of Fig. 3 show the empirical values of the skewness coefficient for the accumulated rainfall at time-scales \(h=1/6, 1, 6\) hours along with the simulation bands from the fitted model. The fitted model clearly underestimates the skewness at sub-hourly time-scales but does reasonably well at coarser time-scales. The right-hand panels of Fig. 3 display the empirical values of the proportion of dry periods together with a simulation band from the fitted model M2 at time-scales \(h=1/12, 1/6, 1/3\) hours. The model appears to reproduce the proportion of dry periods reasonably well and capture its pattern across the year quite well at these sub-hourly time-scales, but not at coarser time-scales.
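The skewness coefficient can be computed from the aggregated series in the same way as the other statistics. This sketch uses the moment estimator \(m_3/m_2^{3/2}\) with no small-sample correction; whether the study applies a bias correction is not stated here, and the function name is hypothetical.

```python
import numpy as np

def skewness_coefficient(depths_5min, h_hours):
    """Moment skewness m3 / m2**1.5 of rainfall totals over h-hour intervals."""
    y = np.asarray(depths_5min, dtype=float)
    k = int(round(h_hours * 12))   # 5-minute steps per interval
    n = len(y) // k                # drop an incomplete final interval
    totals = y[: n * k].reshape(n, k).sum(axis=1)
    dev = totals - totals.mean()
    return float(np.mean(dev ** 3) / np.mean(dev ** 2) ** 1.5)

g = skewness_coefficient([0, 0, 0, 3], h_hours=1 / 12)
```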
Analysis of Bochum Data with Variable Pulse Duration Model
In this section, we extend our model to allow the pulse lifetime d to vary rather than fixing it in advance. This can be done in different ways. One approach is to take the pulse lifetime as a random variable with a specified distribution; another, which we employ in this paper, is to treat d as a model parameter and estimate it along with the other parameters. When d is treated as a parameter, the expressions for the mean, variance and autocovariance given in Eqs. (11), (12) and (13) remain valid, and we treat them as functions of one additional parameter. The eight model parameters \(\lambda ,\mu ,\phi _1,\phi _2,\beta ,\theta _1,\theta _2 \; \text {and}\; d\) are estimated using the weighted objective function (16), with the mean, standard deviation and lag-1 autocorrelation as fitting statistics, over the same combination of time-scales as that used for the fixed d model in Sect. 5.2. The estimated model parameters are given in Table 1.
Table 1 Parameter estimates for the state-dependent initial pulse depth model with variable pulse lifetime for Bochum data
The parameter estimates have similar patterns to those of the earlier model with fixed d, and the mean sojourn times \(1/\mu\) in State 2 are shorter in summer months than in winter months. The values of the overall mean initial pulse depth \(\hat{\mu }_X\) are again larger for summer months than for winter months, indicating a higher initial rainfall intensity of the pulses. The estimates \(\hat{\beta }\) are similar to those of the fixed d model used earlier. The estimates of d indicate that the pulse lifetime for Bochum lies between 0.58 and 0.89 hours (roughly 35 to 53 minutes), depending on the month.
Figure 4 displays the observed and fitted means of the aggregated rainfall at \(h=1\) hour; they are in near-perfect agreement, showing that the mean rainfall has been reproduced very well by the fitted model. The brown dashed line in Figs. 4 and 5 shows the fitted values of the model described in the next subsection; it is included here for comparison and discussed in Sect. 5.4. The empirical and fitted values of the standard deviation of the accumulated rainfall are given in the left-hand panels of Fig. 5 at sub-hourly and coarser time-scales, along with simulation bands. Here again, the empirical values and the fitted values of our proposed model M2 are in excellent agreement at all time-scales, including those not used in fitting. The simulation bands suggest that the sampling distribution of the standard deviation is skewed at sub-hourly time-scales in the summer months but becomes less skewed at coarser time-scales. The observed and fitted values of the lag-1 autocorrelation of the aggregated rainfall for the state-dependent initial pulse model M2, shown in the right-hand panels of Fig. 5, are in very good agreement at all time-scales. Hence, the fitted model M2 reproduces the autocorrelations well.
The empirical values of the skewness coefficient of the accumulated rainfall are given in the left-hand panels of Fig. 6 at sub-hourly and coarser time-scales, along with simulation bands. Our model greatly underestimates the skewness at sub-hourly time-scales but does reasonably well at coarser time-scales. The observed proportions of dry periods are displayed in the right-hand panels of Fig. 6, together with simulation bands from the fitted model at sub-hourly time-scales. The model reproduces these reasonably well and captures their pattern across the year at \(h=1/12,\; 1/6\) hours, but not at other time-scales; in general, it overestimates the proportion of dry periods at coarser time-scales. These are, however, minor discrepancies, given that these statistics are not used in fitting, depend strongly on the scale of measurement and are affected by the occasional arrival of rain pulses in State 1.
To study how well our model captures extreme rainfall, we compare the annual maxima of the observed rainfall data with those generated by the proposed model. Figure 7 shows the ordered empirical annual maximum rainfall (red solid lines) against the reduced Gumbel variate for \(h=1/12, 1, 24\) hours, along with vertical interval plots showing the variability of the simulated ordered maxima from the fitted model. The mean of the 100 simulated values of each ordered maximum is marked by a triangle in the interval plots. The corresponding return periods are indicated along the bottom of each plot, just above the x-axis. At the five-minute (\(h=1/12\)) time-scale, the model underestimates the extremes. As reported in previously published studies (see, for example, Cowpertwait et al. 2007), estimating extreme values at sub-hourly time-scales is a common problem for most stochastic point process models of rainfall, and our results reveal the same. Despite the underestimation at the sub-hourly level, our model reproduces extremes well at the hourly time-scale, which is a notable improvement over earlier results (Ramesh et al. 2017), and the same holds at the daily time-scale.
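The Gumbel plot can be sketched as follows, assuming the Weibull plotting position \(p_i = i/(n+1)\) for the i-th smallest of n annual maxima (the study may use another, such as Gringorten's) and treating the interval plots as the range of each simulated ordered maximum. The reduced variate is \(y = -\ln (-\ln p)\), and a return period of T years corresponds to \(p = 1 - 1/T\). The function names are hypothetical, and the Gumbel-distributed maxima in the usage lines are a stand-in for maxima simulated from the fitted model.

```python
import numpy as np

def gumbel_reduced_variates(n):
    """Reduced Gumbel variates for n ordered maxima (Weibull plotting positions)."""
    p = np.arange(1, n + 1) / (n + 1.0)  # non-exceedance probabilities
    return -np.log(-np.log(p))

def reduced_variate_for_return_period(T):
    """Reduced Gumbel variate matching a return period of T years."""
    return -np.log(-np.log(1.0 - 1.0 / np.asarray(T, dtype=float)))

def simulated_order_summary(sim_maxima):
    """Mean, min and max of each ordered maximum across simulations.

    sim_maxima : array of shape (n_sims, n_years) of simulated annual maxima
    """
    ordered = np.sort(np.asarray(sim_maxima, dtype=float), axis=1)
    return ordered.mean(axis=0), ordered.min(axis=0), ordered.max(axis=0)

rng = np.random.default_rng(7)
y = gumbel_reduced_variates(69)
mean_sim, lo_sim, hi_sim = simulated_order_summary(rng.gumbel(10.0, 3.0, size=(100, 69)))
ticks = reduced_variate_for_return_period([2, 5, 10, 50])
```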
Model Comparison
The variable pulse duration model described in Sect. 5.3 provided the best results for the Bochum data. To assess its performance further, we compare it with one of the existing doubly stochastic point process models for rainfall. The Bracknell data analysis in Sect. 5.1 compared the state-dependent initial pulse depth model M2 with the common initial pulse depth model M1. As that comparison showed only a modest improvement of M2 over M1, we now compare the state-dependent initial depth exponential pulse model M2 with the doubly stochastic rectangular pulse model (M0) described in Ramesh (1998), with both models fitted to the Bochum data.
Figure 4 displays the empirical mean rainfall together with the fitted values from the doubly stochastic rectangular pulse model (M0) and the state-dependent exponential pulse model (M2). The broken brown line in Figs. 4 and 5 shows the fitted values of the rectangular pulse model; the other lines are as described in Sect. 5.3. Figure 4 shows that the mean rainfall is reproduced better by our new model M2, especially in the summer months.
The left-hand panels of Fig. 5 compare the fitted values of the standard deviation of the accumulated rainfall from the two models with the empirical values at sub-hourly and coarser time-scales. The observed and fitted curves are in excellent agreement for the state-dependent exponential pulse model, which clearly outperforms the rectangular pulse model at all time-scales in reproducing the standard deviation of the rainfall. The observed and fitted values of the lag-1 autocorrelation of the aggregated rainfall for the two models are compared in the right-hand panels of Fig. 5. They show that the rectangular pulse model greatly overestimates the autocorrelation at sub-hourly time-scales, whereas the state-dependent exponential pulse model provides a near-perfect fit at these time-scales. The rectangular pulse model improves at the hourly time-scale, though it remains worse than the exponential pulse model, and it deteriorates again at coarser time-scales.
Here again, to compare the performance of the proposed state-dependent initial pulse depth model M2 with that of the rectangular pulse model M0 numerically, we calculated the root mean square error (RMSE) of the three statistics used in fitting: for each statistic, the square root of the mean, over all eleven time-scales considered in our analysis, of the squared difference between its empirical and fitted values. Smaller RMSE values mean closer agreement between the observed and fitted values. Table 2 shows the RMSE values of the three statistics, mean, standard deviation and autocorrelation, for the two models M0 and M2 applied to the Bochum data. The RMSE values of M2 are smaller than those of M0 in almost every case, providing evidence that M2 outperforms M0.
Table 2 Root mean square error of the three statistics used in fitting for the models M0 and M2