1 Introduction

An important challenge we face in environmental or ecological impact studies is to provide fast and realistic simulations of atmospheric variables such as rainfall at various temporal scales. Stochastic point process models provide a basis for generating synthetic rainfall input to hydrological models where the observed data at the required temporal scale are not available. They also enable us to assess the risk associated with hydrological systems. There has been a number of stochastic point process models developed by many authors over the years. Among these, the models based on Poisson cluster processes (Rodriguez-Iturbe et al. 1987; Cowpertwait 1994; Onof and Wheater 1994; Wheater et al. 2005) utilizing either the Neyman-Scott or Bartlett-Lewis processes have received much attention, since their model structure reflects well the climatological features of the rainfall generating mechanism. A good review of developments in modelling rainfall using Poisson cluster processes is provided by Onof et al. (2000). In addition, rainfall models based on Markov processes have also made a reasonable contribution to help tackle this challenge. See for example, Smith and Karr (1983), Bardossy and Plate (1991), Ramesh (1998), Onof et al. (2002) and Ramesh and Onof (2014) amongst others.

Much of the work on this topic, however, has concentrated on modelling rainfall data recorded at hourly or longer aggregation levels. Stochastic models for fine-scale rainfall are equally important, because in some hydrological applications there is a need to reproduce rainfall time series at fine temporal resolutions. For example, sub-hourly rainfall is required as input to urban drainage models and for small rural catchment studies. In addition, the study of climate change impacts on hydrology and water management initiatives requires the availability of data at fine temporal resolutions, which is usually not available from general circulation model (GCM) simulations. The best available approach to generating such rainfall currently lies in the combination of an hourly stochastic rainfall simulator, together with a disaggregator making use of downscaling techniques. There has been some recent work on modelling fine-scale rainfall using point process models. Rather than attempting to reproduce actual rainfall records at a fine-scale, using downscaling techniques or by other methods, these stochastic point process models aim to generate synthetic precipitation time series directly from the proposed stochastic model. One good example of this is provided by the work of Cowpertwait et al. (2007, 2011) who developed a Bartlett-Lewis pulse model to study fine-scale rainfall structure. Their model particularly enables the reproduction of the fine time-scale properties of rainfall. A class of doubly stochastic Poisson processes was employed by Ramesh et al. (2012, 2013) and Thayakaran and Ramesh (2013) to study fine-scale rainfall intensity using rainfall bucket tip times data. They utilised maximum likelihood methods to estimate parameters of their models.

Our main objective in this paper is to develop a simple stochastic point process model capable of reproducing fine-scale structure of the rainfall process. The other objective is to provide a fast and efficient way of generating synthetic fine-scale rainfall input to hydrological models directly from one stochastic model. In this regard, and to take this fine-scale rainfall modelling work further, we develop a simple point process model based on a doubly stochastic Poisson process, following the Poisson cluster pulse model approach of Cowpertwait et al. (2007). Our preliminary work on this (Ramesh and Thayakaran 2012), analysing properties of rainfall time series at sub-hourly resolutions, produced encouraging initial results. In this paper, we extend this work further and accommodate third-order moments in estimation. Mathematical expressions for the moment properties of the accumulated rainfall in disjoint intervals are derived. The proposed stochastic model is fitted to 69 years of 5-min rainfall time series from Germany. The results of the analysis show that the proposed model is capable of reproducing rainfall properties well at various sub-hourly resolutions. Furthermore, the analysis incorporating third-order moments produced better results than the one that used only up to second-order moments. Unlike Cowpertwait et al. (2007), who used superposition of two Bartlett-Lewis pulse models to account for different storm types, we use one simple model to reproduce sub-hourly rainfall structure. The novel contribution of this study is the derivation of the third-order moment properties of the proposed model, as well as their incorporation in estimation, to reproduce fine-scale structure of rainfall more accurately. The proposed model provides a solid framework to generate synthetic fine-scale rainfall input to hydrological models directly from one stochastic point process model. In addition, the availability of a new stochastic model for the generation of fine-scale rainfall, at various sub-hourly scales, provides scientists with a useful tool for environmental or ecological impact studies.

The following section provides a background to this work, illustrates the model framework and then focuses on deriving moment properties of various component processes, such as the cell and pulse arrival processes. Properties of the aggregated rainfall sequence are studied and mathematical expressions for the third-order moment and the coefficient of skewness are derived. Section 3 presents the results of data analysis using 5-min rainfall aggregations and compares the results of two different analyses that used second and third-order moments in estimation. Extremes of the historical data are compared with the simulated extremes at various resolutions. The proportion of dry intervals is also studied. A simulation study is carried out to evaluate the performance of the estimation method. Conclusions and possible further work are summarised in Sect. 4.

2 Model framework and moment properties

2.1 Background

Doubly stochastic Poisson processes (DSPP) provide a flexible set of point process models for fine-scale rainfall. Ramesh et al. (2012, 2013) utilised this class of processes and developed stochastic models, for a single-site and multiple sites respectively. These models were used to analyse data collected in the form of rainfall bucket tip-time series. One of the advantages of these models, over most other point process models for rainfall, is that it is possible to write down their likelihood function which allows us to estimate the model parameters using maximum likelihood methods. However, the rainfall bucket tip-time series is not usually available for a long period of time. Most of the longer series of rainfall data are available in accumulated form, hourly or sub-hourly, rather than in tip-time series format. Moreover, the above DSPP models cannot be fitted directly to data collected in aggregated form, such as hourly rainfall, using the maximum likelihood method. Therefore it is useful to look for alternative models, based on doubly stochastic processes, that can be used to model sub-hourly data collected in aggregated form. Motivated by the performance of the above class of doubly stochastic models, we seek to develop models with the same structure that can be used to analyse accumulated rainfall sequences at fine time scales.

2.2 Model formulation

The point process model we propose here is constructed from a special class of DSPP where the arrival rate of the point process is governed by a finite-state irreducible Markov chain. See for example, Ramesh (1995, 1998) and Davison and Ramesh (1996). Suppose that the rain cells, at time \(T_i\) say, arrive according to a two-state DSPP where the arrival rate is switching between the high intensity (\(\phi _2\)) and low intensity (\(\phi _1\)) states at random times controlled by the underlying Markov chain. The transition rates of the Markov chain are \(\lambda\) (for \(1 \rightarrow 2\)) and \(\mu\) (for \(2 \rightarrow 1\)) respectively. Therefore, the parameters of the cell arrival process M(t) are specified by the arrival rate matrix \({\varvec{L}}\) of the cell occurrences and the infinitesimal generator \({\varvec{Q}}\) of the underlying Markov chain, where

$${\varvec{L}} =\begin{bmatrix}\phi _{1}&0 \\ 0&\phi _{2} \end{bmatrix}\quad {\text{and}} \quad {\varvec{Q}} =\begin{bmatrix} -\lambda&\lambda \\ \mu&-\mu \end{bmatrix}.$$

Each rain cell generated by the process has a random lifetime of length L which is taken to be exponentially distributed with parameter \(\eta\) and independent of the lifetime of other cells. A cell originating at time \(T_i\) will be active for a period of \(L_i\) and terminates at time \(T_i + L_i\). When the cell at \(T_i\) is active instantaneous random pulses of rainfall occur, during \([T_i, T_i + L_i)\), at times \(T_{ij}\) according to a Poisson process at rate \(\xi\). This process of instantaneous pulse arrival terminates with the cell lifetime. Therefore, each cell of the point process generates a series of pulses during its lifetime, and associated with each pulse is a random rainfall depth, \(X_{ij}\). As a result, the process \(\{T_{ij}, X_{ij}\}\) takes the form of a marked point process Cox and Isham (1980). In our derivation of model properties later on in Sect. 2.3, we treat the pulses in distinct cells as independent but allow those within a single cell to be dependent. We shall refer to this model as a doubly stochastic pulse (DSP) model. A diagram showing the structure of this process is given in Fig. 1. It is clear from this that the formulation of our model is very similar to that of Cowpertwait et al. (2007), but the difference lies in the mechanism for the cell arrivals. Since this process operates in continuous time, we integrate the DSP process to obtain rainfall depths over discrete disjoint time intervals and use their moment properties for model fitting and assessment.

Fig. 1
figure 1

Schematic description of the DSP model: a the cell arrival process based on a two-state DSPP. b The cell lifetimes of the two cells at time \(T_i\) and \(T_{i + 1}\). c The pulse process in the cell that originates at time \(T_i\) and terminates at time \(T_{i}+L_i\)

2.3 Moment properties of the pulse process

It follows from the structure of the point process that the moment properties of the DSP process are functions of those of the cell arrival process. Therefore, we shall first review the properties of the cell arrival process before moving onto derive the properties of the pulse process. The second-order moment properties of the cell arrival process M(t) can be obtained as functions of the model parameters and these are given below (see for example, Ramesh 1998). The stationary distribution \({\varvec{\pi }} =(\pi _{1},\pi _{2})\) of the cell arrival process M(t) is obtained by solving \({\varvec{\pi Q}}= {\mathbf{0}}\), where \({\mathbf{0}}=(0,0),\) and is given by \({\varvec{\pi }} =\left( \mu /(\lambda +\mu ),\lambda /(\lambda +\mu )\right) \!.\) Let \({\mathbf{1}}\) be a column vector of ones, then the mean arrival rate of the cell process is given by

$$\begin{aligned} {\text{E}}(M(t))=m= {\varvec{\pi L 1}} = \frac{\lambda \phi _{2} + \mu \phi _{1}}{\lambda +\mu }. \end{aligned}$$

The covariance density of the cell arrival process M(t) can be obtained as, for \(t>0,\)

$$c^{ M}(t)=\frac{\lambda \mu }{(\lambda +\mu )^2}(\phi _{1}-\phi _{2})^2 e^{-(\lambda +\mu )t} =A e^{-(\lambda +\mu )t}$$
(1)

which shows that its covariance decays exponentially with time.

We shall now study the moment properties of the pulse arrival process and focus our attention on deriving an expression for its covariance density. These moment properties are required to derive the statistical properties of the aggregated rainfall process later in Sect. 2.4. The lifetimes \(L_{i}\) of the rain cells, under the DSP model framework, are assumed to be exponentially distributed with parameter \(\eta\) and hence we have \({\text{E}}(L_{i})= 1/\eta .\) Let us take N(t) as the counting process of pulse occurrences from all cells generated by the process M(t). An active cell generates a series of instantaneous pulses at Poisson rate \(\xi\) during its lifetime and therefore the mean number of pulses per cell is \(\xi /\eta\). As noted earlier, the arrival rate of the cell process is m and hence the mean arrival rate of the pulse process is given by \({\text{E}}(N(t))=m\xi /\eta\).

It is well known that the covariance density of a point process can be expressed as a function of its product density. As shown by Cox and Isham (1980), the product density of the point process N(t) at distinct time points \(t_{1}\),\(t_{2}\),\(\ldots\),\(t_{k}\) for k = 1,2,3,\(\ldots\) can be written as

$$p_{k}(t_{1},\ldots ,t_{k})dt_{1}\ldots dt_{k}={\text{P}}\left\{dN(t_{1})=dN(t_{2})=\cdots =dN(t_{k})=1\right\}.$$

We consider two distinct pulses at time t and \(t+u\) (\(u>0\)) which may come from the same cell or different cells generated by the process M(t). For pulses that come from the same cell, the contribution to the product density \(p_{2}(t,t+u)\) is given by \(p_{2}(t,t+u)={\text{E}}(N(t))\xi e^{-\eta u}=(m\xi ^2/\eta ) e^{-\eta u}\). For two pulses at time t and \(t+u\) that come from different cells, with their cell origins at \(t-v\) and \(t+u-w\) respectively, the contribution to the product density \(p_{2}(t,t+u)\) of the pulse process N(t) is given by

$$p_{2}(t,t+u)=\xi ^2 \int _{v=0}^\infty e^{-\eta v} \int _{w=0}^u e^{-\eta w}\, p_{2}^M(u+v-w) dwdv.$$
(2)

From Eq. (1), the product density of the cell arrival process M(t) becomes

$$p_{2}^M(u+v-w)=c^{M}(u+v-w)+m^2=A e^{-(\lambda +\mu )(u+v-w)} +m^2.$$
(3)

Substituting Eq. (3) in Eq. (2) and completing the integral shows that the contribution to the product density by two pulses that come from different cells is

$$p_{2}(t,t+u)=\left( \frac{m\xi }{\eta }\right) ^2 (1-e^{-\eta u})+\xi ^2 A \frac{\left[ e^{-(\lambda +\mu )u}-e^{-\eta u}\right] }{\eta ^2-(\lambda +\mu )^2}.$$
(4)

The covariance density (Cox and Isham 1980) of the pulse process N(t) for \(u\ge 0\) can be written in terms of its product density as

$$c(u)={\text{Cov}}\{N(t),N(t+u)\} = \left( \frac{m\xi }{\eta }\right) \delta (u) +p_{2}(t,t+u)- \left( \frac{m\xi }{\eta }\right) ^2,$$

where \(\delta (\cdot )\) is the Dirac delta function. Substitution from Eq. (4) and rearranging the terms in the above expression gives, after some algebra, the covariance density of this DSP process N(t) for \(u \ge 0\) as

$$c(u)=\left( \frac{m \xi }{\eta }\right) \delta (u)+ A_{1} e^{-(\lambda +\mu )u}+\left[ B_{2}-B_{1}\right] e^{-\eta u},$$
(5)

where \(A_{1}=\xi ^2 A/(\eta ^2- (\lambda +\mu )^2)\), \(B_{1}=\left( \xi m/\eta \right) ^2+\xi ^2 A/\left( \eta ^2- (\lambda +\mu )^2\right)\) and \(B_{2}=\left( \xi ^2 m/\eta \right)\). In the above expression, \(A_{1}\) and \(B_{1}\) correspond to the contribution from pulses generated by different cells whereas \(B_{2}\) corresponds to the contribution from different pulses within the same cell, where the depths of these pulses may be dependent.

2.4 Moment properties of the aggregated rainfall

In most applications, the rainfall data are usually available in aggregated form in equally spaced time intervals of fixed length. The DSP process we have developed, however, evolves in continuous time. We now, therefore, derive mathematical expressions for the moment properties of the aggregated rainfall arising from the DSP process. These expressions are useful to describe the properties of the accumulated rainfall and can be used for model fitting and assessment.

Let \(Y_{i}^{(h)}\) be the total amount of rainfall in disjoint time intervals of fixed length h, for \(i=1,2,\ldots\). We can express this as

$$Y_{i}^{(h)}=\int _{(i-1)h}^{ih} X(t)dN(t),$$

where X(t) is the depth of a pulse at time t. Without assuming any distribution for the pulse depth, let \({\text{E}}[X(t)]=\mu _{X}\) be the mean depth of the pulses. Then the mean of the aggregated rainfall in the intervals can be written as

$${\text{E}}\left[ Y_{i}^{(h)}\right] =\int _{(i-1)h}^{ih}{\text{E}}( X(t))dN(t)=\left( \frac{m\xi }{\eta }\right) \mu _{X} h.$$
(6)

Using the well known Campbell’s theorem from Daley and Vere-Jones (2007), and utilizing the covariance density of the pulse arrival process given in Eq. (5), we can now work out the variance and autocovariance function of the aggregated rainfall process. Following this, we have

$$\begin{aligned} {\text{Var}}\left[ Y_{i}^{(h)}\right]= & {} \int _0^h \int _0^h {\text{Cov}}\left[ X(s)dN(s),X(t)dN(t)\right] \\= & {} \int _0^h {\text{E}}(X^2)E\left[ dN(t)\right] \\+2 &\int _0^h \int _s^h {\text{E}} \left[ X(s)X(t)\right] {\text{Cov}}\left[ dN(s),dN(t)\right] , \end{aligned}$$

where the double integral is separated into two parts to account for the cases \(s=t\) and \(s\ne t\). In addition, we need to distinguish whether the pulses at times t and s belong to the same cell or come from different cells. For contributions to \({\text{Cov}}\left[ dN(s),dN(t)\right]\), when pulses come from different cells, the multiplier \({\text{E}}\left[ X(s)X(t)\right]\) in the above expression becomes \(\mu _{X}^2\). We shall write this multiplier as \({\text{E}}\left[ X_{ij}X_{ik}\right]\) when pulses come from the same cell. This will allow us to accommodate some within-cell depth dependence. However, it is assumed that any two pulses within a cell, regardless of their location within the cell, have the same expected product of depths. Under this setting, the variance function becomes

$$\begin{aligned} {\text{Var}}\left[ Y_{i}^{(h)}\right]&= {} {\text{E}}(X^2)\left( \frac{m\xi }{\eta }\right) h \nonumber \\&\quad + \, 2 \, \mu _{X}^2 A_{1}\psi _{1}(\lambda +\mu )+2\left[ {\text{E}}\left[ X_{ij}X_{ik}\right] B_{2} -B_{1}\mu _{X}^2\right] \psi _{1}(\eta ), \end{aligned}$$
(7)

where \(\psi _{1}(\lambda +\mu )=\left[ (\lambda +\mu )h-1+e^{-(\lambda +\mu )h}\right] /(\lambda +\mu )^2\) and \(\psi _{1}(\eta )=\left[ \eta h-1+e^{-\eta h}\right] /\eta ^2\).

Following a similar approach we can derive the autocovariance function for the aggregated rainfall in two distinct intervals. Again by distinguishing the contributions from pulses within the same cell, we derived an expression given as, for \(k \ge 1\),

$$\begin{aligned} {\text{Cov}}\left[ Y_{i}^{(h)},Y_{i+k}^{(h)}\right]= & {} \int _{kh}^{(k+1)h} \int _0^h {\text{Cov}}\left[ X(s)dN(s),X(t)dN(t)\right] \nonumber \\= & {} \int _{kh}^{(k+1)h} \int _0^h {\text{E}}\left[ X(s)X(t)\right] {\text{Cov}}\left[ dN(s),dN(t)\right] \nonumber \\= \,& {} \mu _{X}^2 A_{1}\psi _{2}(\lambda +\mu )+\left[ {\text{E}}\left[ X_{ij}X_{ik}\right] B_{2} -B_{1}\mu _{X}^2\right] \psi _{2}(\eta ), \end{aligned}$$
(8)

where \(\psi _{2}(\lambda +\mu )= e^{-(\lambda +\mu )(k-1)h}\left[ 1-e^{-(\lambda +\mu )h} \right] ^2/(\lambda +\mu )^2\) and \(\psi _{2}(\eta )= e^{-\eta (k-1)h}\left[ 1-e^{-\eta h} \right] ^2/\eta ^2.\) When considering the special case where all pulse depths are independent \({\text{E}}\left( X_{ij}X_{ik}\right)\) can be replaced by \(\mu _{X}^2\) in Eqs. (7) and (8).

Although the second-order properties capture the characteristics of the process well in most point process applications, higher moments may provide improved results in terms of reproduction of the properties of interest. To this end, following Cowpertwait et al. (2007), we shall incorporate the third-order moment in our analysis. The derivation of the third moment \({\text{E}}\left[ \left( Y^{(h)}\right) ^3\right]\) follows a similar approach to that of the second moment, but becomes more complex algebraically. Therefore only an outline of the derivation is given along with the final expression in Appendix A. The third moment about the mean \(\mu ^{(h)}_{3}\) is

$$\begin{aligned} \mu ^{(h)}_{3} = & \, {\text{E}}\left[ \left( Y_{i}^{(h)}-{\text{E}}\left[ Y_{i}^{(h)}\right] \right) ^3\right] \\= & \, {\text{E}}\left[ \left[ Y_{i}^{(h)}\right] ^3 \right] -3{\text{E}}\left[ Y_{i}^{(h)}\right] {\text{Var}}\left[ Y_{i}^{(h)}\right] -\left[ {\text{E}}\left[ Y_{i}^{(h)}\right] \right] ^3. \end{aligned}$$

From this the coefficient of skewness of the aggregated rainfall process is shown to be

$$\kappa ^{(h)}=\frac{{\text{E}}\left[ \left( Y_{i}^{(h)}-{\text{E}}\left[ Y_{i}^{(h)}\right] \right) ^3\right] }{ {\text{E}}\left[ \left( Y_{i}^{(h)}-{\text{E}}\left[ Y_{i}^{(h)}\right] \right) ^2\right] ^{3/2}} = \frac{\mu ^{(h)}_{3}}{\left[ {\text{Var}}\left[ Y_{i}^{(h)}\right] \right] ^{3/2}}.$$
(9)

3 Model fitting and assessment

We shall explore the application of the proposed DSP model in the analysis of fine-scale rainfall data and assess how well it reproduces the properties of the rainfall over a range of sub-hourly resolutions. We aim to do this using 69 years (1931–1999) of 5-min rainfall data from Bochum in Germany.

In this analysis, we consider the special case where the pulse depths \(X_{ij}\) are independent random variables that follow an exponential distribution with parameter \(\theta\). Therefore, we have \(\mu _X = {\text{E}}\left[ X(t)\right] =1/\theta\). The data, recorded at the 5-min aggregation level, allow us to fit the model over a range of sub-hourly accumulations. We shall make use of the mathematical expressions for the moment properties of the accumulated rainfall in our model fitting. There are 7 parameters in our model and we estimate 6 of them by the method of moments approach using the observed and theoretical values of these properties. The parameter \(\mu _{X} =1/\theta\) is estimated separately for each month from the sample mean of rainfall depth by using the equation, which follows from Eq. (6),

$$\mu _X=\left( \frac{ \eta }{m \xi }\right) \bar{x },$$
(10)

where \(\bar{x}\) is the estimated average of hourly rainfall for each month.

Although we employed the method of moments to estimate the other parameters, which essentially equates the sample moments to theoretical moments from the model, the estimation can be done in different ways. In this application, we constructed an objective function as the sum of squares of differences between the sample and theoretical values of the moment properties at different aggregation levels and then minimized it using standard optimisation routines. This is carried out separately for each month by considering the data for a month as realisations of a stationary point process. Essentially the process of model fitting involves calculating the empirical mean, variance, correlation, coefficient of variation and skewness from the observed data at each aggregation level and matching these with the corresponding theoretical values, calculated using Eqs. (69), for a given set of parameter values. The role of the objective function and the optimiser employed was to find the best possible match using a minimum error criterion. We used the statistical software environment R for the optimisation and for the simulation of the process (R Development Core Team 2011). A number of options were available for the objective function depending on the application, but we used a form of weighted sum of squares. We employed the routines “optim” and “nlminb” in R for parameter estimation in our analysis. The following subsections describe two different methods used to estimate the first 6 parameters of the model and discuss the results produced. Once these parameter estimates were determined, Eq. (10) was used to estimate the final parameter \(\mu _{X}\).

3.1 Estimation using second-order moments

The first 6 parameters of the model \(\lambda\), \(\mu\), \(\phi _{1}\) , \(\phi _{2 }\), \(\eta\) and \(\xi\) were estimated using the following dimensionless functions; the coefficient of variation \(\nu (h)\) and the autocorrelation \(\rho (h)\) at lag 1 of the aggregated rainfall process. Explicitly, these are

$$\nu (h)=\frac{ {\text{E}}\left[ \left( Y_{i}^{(h)}-{\text{E}}\left[ Y_{i}^{(h)}\right] \right) ^2\right] ^{1/2}}{{\text{E}}(Y_{i}^{(h)})}, \quad \rho (h)={\text{Corr}}\left[ Y_{i}^{(h)},Y_{i+1}^{(h)}\right] .$$
(11)

We need at least 6 sample properties of the aggregated process to fit the model and we employed 8 properties in our estimation. They are \(\nu (h)\) and \(\rho (h)\) at h = 5, 20, 30 and 60 min aggregation levels. The estimates of the functions from the empirical data, denoted by \(\hat{\nu }(h)\) and \(\hat{\rho }(h)\) (for h= 1/12, 1/3, 1/2 and 1 h), were calculated for each month using 69 years of 5-min rainfall series accumulated at appropriate time scales. Parameter estimates can be obtained by using an objective function constructed from the sum of squares of differences between the sample values and their corresponding theoretical values of the proposed model. The estimated values of parameters {\(\hat{\lambda }\), \(\hat{\mu }\), \(\hat{\phi _{1}}\), \(\hat{\phi _{2}}\), \(\hat{\eta }\) and \(\hat{\xi }\)} for each month were obtained by minimising the weighted sum of squares of dimensionless functions, as given below in Eq. (12), using standard optimisation routines:

$$\sum _{h=\frac{1}{12},\frac{1}{3},\frac{1}{2},1} \left[ \frac{1}{{{\text{Var}}({\hat{\nu}} (h))}}\left( \hat{\nu }(h)-{\nu (h)} \right) ^2 + \frac{1}{{{\text{Var}}({\hat{\rho}} (h))}}\left( \hat{\rho }(h)-{\rho (h)} \right) ^2 \right].$$
(12)

In the above expression, \(\nu (h)\) and \(\rho (h)\) are theoretical values as given by Eqs. (611) whereas \(\hat{\nu }(h)\) and \(\hat{\rho }(h)\) are calculated for each month from the 69 years of data. In addition, following Jesus and Chandler (2011), the weights in the objective function were taken as the reciprocal of the variance of the empirical values of the functions calculated separately for each of the 69 years. We also experimented with other objective functions but this was found to give better results. The above function was minimised separately for each month to obtain estimates of the model parameters. We used the simplex algorithm of Nelder and Mead (1965) for the optimisation, since it does not require the calculation of derivatives.

The estimated parameters were then used to calculate the fitted values of the various theoretical properties. The observed and fitted values of the mean, standard deviation, coefficient of variation and lag 1 autocorrelation of the aggregated rainfall are displayed in Figs. 2, 3, 4 and 5. In almost all cases near perfect fits, and in some cases exact, were obtained for all properties with the exception of the lag 1 autocorrelation. The mean gave an excellent fit at all time scales. The standard deviation and coefficient of variation showed near perfect fits for most months, at all time scales, with small deviations from the perfect fit during the summer months. The autocorrelation was reproduced well at smaller aggregation levels, however there appeared to be a slight underestimation at larger aggregations for all months. One point to note here is that h = 10 min aggregation was not used in the fitting but the model has certainly reproduced all the properties well for this time-scale. This reveals that the model is capable of producing estimates of quantities not used in the fitting which adds strength to this DSP modelling framework.

Fig. 2
figure 2

Observed and fitted values of the mean of the aggregated rainfall for the DSP model at h = 5, 10, 20, 30, 60 min aggregations, using second-order moments in estimation

Fig. 3
figure 3

Observed and fitted values of the standard deviation of the aggregated rainfall for the DSP model at h = 5, 10, 20, 30, 60 min aggregations, using second-order moments in estimation

Fig. 4
figure 4

Observed and fitted values of the coefficient of variation of the aggregated rainfall for the DSP model at h = 5, 10, 20, 30, 60 min aggregations, using second-order moments in estimation

Fig. 5
figure 5

Observed and fitted values of the autocorrelation (lag one) of the aggregated rainfall for the DSP model at h = 5, 10, 20, 30 min aggregations, using second-order moments in estimation

3.2 Estimation incorporating third-order moments

We now analyse results which incorporated third-order moments in the fitting, since this has been found useful in modelling rainfall (Cowpertwait et al. 2007). With the coefficient of skewness \(\kappa ^{(h)}\) given in Eq. (9), we now have three model functions to utilise in estimation. Under the model framework, we have six parameters to estimate using the moment method and we consider nine properties to include the coefficient of skewness. These are the coefficient of variation \(\nu (h)\), the autocorrelation \(\rho (h)\) at lag 1 and the coefficient of skewness \(\kappa (h)\), all at three different aggregation levels of \(h=5\), 10 and 20 min. The following objective function, which incorporates third-order moments in parameter estimation, was used in our analysis:

$$\sum _{h=\frac{1}{12},\frac{1}{6},\frac{1}{3}} \left[ \frac{1}{{{\text{Var}}({\hat{\nu}} (h))}}\left( \hat{\nu }(h)-{\nu (h)} \right) ^2 + \frac{1}{{{\text{Var}}({\hat{\rho}} (h))}}\left( \hat{\rho }(h)-{\rho (h)} \right) ^2 + \frac{1}{{{\text{Var}}({\hat{\kappa}} (h))}}\left( {\hat{\kappa}}(h)-{\kappa (h)} \right) ^2 \right].$$

This objective function was minimised, separately for each month, to obtain estimates of the model parameters. These are given in Table 1. Estimated parameters showed some variation across months. Values of \(\hat{\mu }\) were larger for summer months showing smaller mean sojourn times (\(1/\mu\)) in the higher rainfall intensity state. The estimates \(\hat{\phi }_2\) were also higher, in general, for the summer months and showed that the cell arrival rates in state 2 vary from about 2.4–7.8 cells per hour. The pulse arrival rate \(\hat{\xi }\) also showed variation across months from about 223–290 pulses per hour throughout the year. The mean duration of cell lifetime (\(1/\eta\)) fell between 8 and 24 min. It was noticeable that the cell durations were shorter in summer months (8–10 min) when compared with other months. This is consistent with the nature of the summer rainfall, as they are mostly generated from thunder storms of high intensity and shorter duration. The estimates also showed that the mean depth of the pulses (\(\mu _x\)) tend to be larger in summer months with the highest value in July.

Table 1 Parameter estimates for the DSP model incorporating third-order moments in estimation

Figures 6, 7, 8 and 9 show the corresponding results when the third-order moments are incorporated into the parameter estimation process. These clearly show an improvement over the earlier results of the method that used the moments up to the second-order. Although the same model is used here, the parameter estimates are obviously different from the earlier values due to the fact that the coefficient of skewness was used in estimation for this second method. In most cases a near perfect fit was obtained and in some cases an exact fit was obtained when the third-order moments were incorporated. The mean, standard deviation and coefficient of variation have all been reproduced remarkably well at all aggregation levels, including those that were not used in fitting (h = 30, h = 60 min). Comparing these with earlier results reveals that incorporating third-order moments in estimation has certainly produced much better results for most of the properties.

The mean appears to be estimated equally well by both methods. The standard deviation and coefficient of variation were clearly in better agreement with their empirical counterparts at all aggregations when third-order moments are used. This is visible in the plots, particularly for the summer months where the first method consistently showed a slight underestimation. The autocorrelation at lag one, however, was an exception. Although it was estimated well at small aggregation levels, the fit got worse at higher levels of aggregation, especially at those that were not used in fitting, showing consistent underestimation. Nevertheless, the observed and fitted values of the coefficient of skewness were in very good agreement in Fig. 10. Here again the results showed a near perfect fit at all aggregation levels. In addition to this, as we will see in Sect. 3.3, incorporation of third-order moments in estimation reproduced the extremes better than the earlier method.

Fig. 6
figure 6

Observed and fitted values of the mean of the aggregated rainfall for the DSP model at \(h = 5\), 10, 20, 30, 60 min aggregations, using third-order moments in estimation

Fig. 7
figure 7

Observed and fitted values of the standard deviation of the aggregated rainfall for the DSP model at \(h = 5\), 10, 20, 30, 60 min aggregations, using third-order moments in estimation

Fig. 8
figure 8

Observed and fitted values of the coefficient of variation of the aggregated rainfall for the DSP model at \(h = 5\), 10, 20, 30, 60 min aggregations, using third-order moments in estimation

Fig. 9
figure 9

Observed and fitted values of the autocorrelation (lag one) of the aggregated rainfall for the DSP model at \(h = 5\), 10, 20, 30 min aggregations, using third-order moments in estimation

Fig. 10
figure 10

Observed and fitted values of the coefficient of skewness of the aggregated rainfall for the DSP model at \(h=5\), 10, 20, 30, 60 min aggregations, using third-order moments in estimation

3.3 Extremes and proportion of dry periods

In many hydrological applications, more emphasis is placed upon a stochastic model’s ability to reproduce the properties of extreme rainfall rather than the usual moment properties. One good example arises in urban drainage modelling within the context of flood estimation. One can find many other examples of this in analyses involving environmental data, see for example Ramesh and Davison (2002), Davison and Ramesh (2000) and Leiva et al. (2016). In addition, knowledge of the extremes enables us to assess the risk associated with hydrological systems. In view of this, we shall evaluate the performance of our proposed model in capturing the properties of extreme rainfall. In this regard, we compare the extreme values of the 69 years of observed rainfall data with those generated by the proposed DSP model.

The annual maxima of the empirical data from the observed 69 year long historical record were extracted, ordered and plotted against the corresponding Gumbel reduced variates at each aggregation level. Fifty copies of the 5-min rainfall series, each 69 years long, were then simulated from the fitted model. Each copy of the simulated data was subsequently aggregated to generate 1, 12 and 24 h rainfall series. The annual maxima of each of the 50 simulated series, at each aggregation level, were extracted and ordered to make up the interval plots against the corresponding Gumbel reduced variates. These were superimposed on the corresponding Gumbel reduced variate plots of the empirical data for comparison.

Figure 11 shows the results for h = 5 min (top panel) and h = 60 min (bottom panel) aggregations whereas the results of h = 12, 24 h are displayed in Fig. 12. The red solid circles show the mean of the maxima from the 50 simulations and the blue squares connected by the solid line show the empirical annual maximum values. At h = 5 min aggregation level the model vastly underestimates the extremes. When h = 60 min there was evidence of over-estimation at the lower end and underestimation at the upper end of the reduced variates. Nevertheless, there was evidence of substantial improvement at larger values of h. Clearly there was very good agreement at h = 12, 24 h aggregations, as all of the empirical values fell within the range of the simulated values. Furthermore, the mean of the simulated annual maxima fell reasonably close to the empirical annual maxima, at h = 12 and 24, for much of the period. It is encouraging to note that this was the case for the values corresponding to the return periods from about 5–100 years. Hence, the proposed model, although underestimating the extremes at sub-hourly aggregation, appears to capture the extremes well at larger aggregations. The estimation of extreme values at smaller time-scales is a common problem for most stochastic models for rainfall, and our results concur with the findings of previous published studies, see for example Cowpertwait et al. (2007) or Verhoest et al. (1997).

Fig. 11
figure 11

Ordered annual maxima of the 5-min (top panel) and 1-h (bottom panel) aggregated rainfall plotted against the reduced Gumbel variate. Empirical annual maximum values are shown as blue squares connected by a solid line. Interval plots based on annual maxima of 50 simulations, each 69 years long, are also shown. The red solid circles are the mean of the 50 simulated maxima

Fig. 12
figure 12

Ordered annual maxima of the 12-h (top panel) and 24-h (bottom panel) aggregated rainfall plotted against the reduced Gumbel variate. Empirical annual maximum values are shown as blue squares connected by a solid line. Interval plots based on annual maxima of 50 simulations, each 69 years long, are also shown. The red solid circles are the mean of the 50 simulated maxima

In order to asses the performance of the two estimation methods in reproducing extremes, we compared the ordered empirical annual maxima with those generated by the parameter estimates of the two methods. Table 2 shows the means of the ordered annual maxima from 50 simulations for the two methods, together with the ordered empirical annual maxima, at 12- and 24-h aggregation levels. These are compared for a range of reduced Gumbel variates covering the values that correspond to the return period from 5 years to 100 years. Note that the interval plots based on the simulations, using the method which incorporates third-order moments in estimation (Method 2), are shown in Fig. 12 for the whole period. Results presented in Table 2 show that Method 2 performed much better than the method that used second-order moments in estimation (Method 1), at both aggregation levels. The above result and Fig. 12 clearly show that Method 2 outperformed Method 1 and reproduced extremes well at higher aggregations.

Table 2 Ordered empirical annual maxima for the 12- and 24-h aggregated rainfall and the corresponding ordered reduced Gumbel variates

Another property of interest to hydrologists is the proportion of intervals with little or no rain. This will help to quantify the proportion of dry periods. Very often the gauge rainfall data are recorded in a rounded form (to the nearest 0.1 mm) and hence, following Cowpertwait et al. (2007), we calculate the proportion of rainfall below a small threshold \(\left( \hat{p} \{Y_i^{h}<\delta \} {\text{ for }} \delta >0 \right)\) instead of the actual proportion of dry intervals (\(\delta =0\)). Therefore by choosing smaller values of \(\delta\) we can provide approximate estimates of the proportion of dry intervals.

The proportion of intervals below a given threshold were calculated for each month, using 50 samples of 69 years of simulated 5-min data, for each of the 5-min, 1 and 24 h aggregations (\(h=1/12, 1, 24\)). The mean of the 50 values was calculated, at each aggregation, for each calendar month. The observed proportions for the historical data and the average of the simulated values, from the fitted model, for different thresholds are given in Table 3. In order to find a good estimate of the proportion of dry periods at \(h=1/12\) aggregation level, a small threshold of \(\delta =0.05\) mm was used. For the hourly rainfall, two threshold values of \(\delta =0.05, 0.1\) mm were used. Higher threshold values (\(\delta =0.5, 2\) mm) were used for the daily rainfall, as occasional light rain during the day may cause discrepancies between the observed and simulated proportions.

The numerical results presented in Table 3 show that, in general, the model reproduces the observed proportions well at 5-min aggregation, although it tends to over-estimate slightly. However, the differences we observed are between 0.01 for February and 0.022 for June. At the hourly aggregation level, the model tended to consistently underestimate the proportions each month when \(\delta =0.05\) mm. However, when the threshold was increased to 0.1 mm the observed and simulated proportions were in good agreement, except for October, with the summer month June again showing the largest difference of 0.027. Finally, at a daily level, there was consistent over-estimation when the threshold was set at 0.5 mm. When the threshold increased to 2 mm the observed and simulated proportions became very close. This indicates that the frequency of very light rain events, over a period of a day, was greater in the observed data. The differences tended to be larger in the summer months and smaller in the winter months, with June once again showing the highest difference of 0.026 and February showing the smallest difference of 0.012.

Table 3 Proportion of rainfall below defined thresholds at different time scales. Here \(\hat{p}_O\) and \(\hat{p}_S\) represent the estimates of the proportions for the observed and simulated series

3.4 Simulation study

Since the likelihood function of the model we proposed for the accumulated rainfall data was not available, we employed the moment method to estimate the parameters in our analysis. The method of moments simply equates sample moments from the observed data to the theoretical moments of the model being fitted to obtain estimates of the parameters. It is important to note that moment estimators, unlike maximum likelihood estimators, do not necessarily have the usual large sample properties leading to asymptotic results. Nevertheless, we carried out a simulation study to evaluate the statistical performance of the estimation method we employed in our analysis.

The observed data was a 69-year long accumulated rainfall series in 5-min intervals and our stochastic point process model was fitted separately for each month. A month with 31 days had 616,032 observations, although most of them were zero, especially for a summer month. The simulation study was carried out for a typical summer month, August, with simulated rainfall data over a period of length \(n_l=20, 40, 60, 69\) years, as described in the algorithm given below.

  • Simulate one hundred (\(n = 100\)) sample series from the fitted values of the parameters (\(\theta\)) for the month.

  • Aggregate the simulated data in \(h = 5, 10, 20\) min intervals.

  • Calculate their sample statistics: coefficient of variation \(\nu (h)\), lag 1 autocorrelation \(\rho (h)\) and coefficient of skewness \(\kappa (h)\) for each value of h.

  • Compute the moment estimates (\(\hat{\theta }\)) for the simulated samples, using the objective function used in Sect. 3.2.

  • Compute the mean, bias and mean squared error of the moment estimates, separately for each of the parameters, using the expressions

    $${\text{Mean}}=\frac{1}{n}\sum _{i=1}^{n} \hat{\theta }, \quad {\text{Bias}}=\frac{1}{n}\sum _{i=1}^{n} (\hat{\theta } -\theta ), \quad{\text{MSE}}=\frac{1}{n}\sum _{i=1}^{n} (\hat{\theta } -\theta )^2,$$

    where n = 100 and \(\theta = \lambda\), \(\mu\), \(\phi _{1}\), \(\phi _{2 }\), \(\eta\) and \(\xi\) are the fitted parameter values of the empirical rainfall data for the month under study and \(\hat{\theta }\) is the corresponding estimate for the simulated data.

The above steps were repeated with \(n_l = 20, 40, 60, 69\) years of data to produce a table of average, bias, root mean squared error (\(\sqrt{\text {MSE}}\)) for each of the 6 parameters. The results are presented in Table 4 where the fitted parameter values for August are noted as the true values used in the simulation.

Table 4 Simulation study results: values of the mean, bias and root mean squared error (\(\sqrt{\text {MSE}}\)) for the moment estimates of the parameters based on 100 simulated series of length \(n_l\) years for the month of August. True values are the fitted parameter values for the observed rainfall data for August

The means of the simulated estimates showed that these are generally close to the corresponding true values. The means of \(\hat{\mu }\) and \(\hat{\xi }\) were converging towards their true values as \(n_l\) became large, whereas the means of \(\hat{\phi }_2\) and \(\hat{\eta }\) did not seem to exhibit this property. It is worth pointing out that the parameters \(\phi _2\) and \(\eta\) are positively correlated and when \(\phi _2\) increased \(\eta\) also increased. The reason for this is that much of the rain cells come from the high intensity state 2 and, as we are equating the rainfall moments in our estimation, increased arrival rate \(\phi _2\) of rain cells results in smaller cell lifetimes of \(1/\eta\), and vice-versa. Means of \(\hat{\lambda }\) and \(\hat{\phi }_1\) do not seem to show much variation.

The bias of \(\hat{\mu }\) and \(\hat{\xi }\) became smaller when \(n_l\) became larger but both \(\hat{\phi }_2\) and \(\hat{\eta }\) seemed to show a slight negative bias. As we can see from the values, \(\hat{\lambda }\) and \(\hat{\phi }_1\) both show very small negative bias and again do not show much variation with \(n_l\). The root mean squared errors seemed to stay more or less at the same level for most of the parameters, as \(n_l\) increased, except for \(\hat{\phi }_2\) and \(\hat{\eta }\) which showed a slight increase. One possible reason for that might be that the moment estimates may be slightly biased because of the serial correlation in the data, as noted by Cowpertwait et al. (2007), especially at sub-hourly aggregation levels which show high autocorrelation. Ideally we would have liked the bias and MSE of the estimators to converge to zero asymptotically for all parameters, as would those of the maximum likelihood estimators which have good large sample statistical properties. However, the means of the simulated estimates were closer to their true values and the relative sizes of the bias and root mean squared errors seemed to be reasonably small for the application.

4 Conclusions and future work

We have developed a doubly stochastic point process model, using instantaneous pulses, to study the fine-scale structure of sub-hourly rainfall time series. Second and third-order moment properties of the aggregated rainfall for the proposed DSP model were derived. The model was used to analyse 69 years of 5-min rainfall data. The empirical properties of the rainfall accumulations were shown to be in very good agreement with the fitted theoretical values over a range of sub-hourly time scales, including those that were not used in fitting. Although the use of second-order moments in estimation produced very good results, incorporation of third-order moments showed a clear improvement in fitting. Overall, the results of our analysis suggest that the proposed stochastic model is capable of reproducing the fine-scale structure of the rainfall process, and hence could be a useful tool in environmental or ecological impact studies.

The simulated extreme values at daily and 12-hourly aggregations are in very good agreement with their empirical counterparts. However, although the model reproduces the moment properties well, it underestimates the extremes at fine time-scale. The results from the analysis of the proportion of dry periods, using intervals with rainfall depths below appropriate threshold levels, show that the model generally reproduces the observed proportions well. The simulation study shows that the estimation method used is capable of reproducing the estimates closer to the true values, although it may not have the desired large sample properties which maximum likelihood estimators exhibit. Overall, the above analyses indicate that the proposed modelling approach is able to fit data over a range of sub-hourly time scales and reproduce most of the properties well. It has potential application in many areas, as it provides a fast and efficient way of generating synthetic fine-scale rainfall input to hydrological models directly from one stochastic model.

Despite this, there is potential to develop the model further to employ a 3-state doubly stochastic model for cell arrivals and also to explore its capability to handle aggregations at higher levels. The 3-state model, along with further developments to study other hydrological properties of interest, may be a fruitful area for future work. Although our model based on one doubly stochastic process has performed well, another possibility is to consider superposing two doubly stochastic pulse processes, as in Cowpertwait et al. (2007), to better account for different types of precipitation, such as convective and stratiform.