1 Introduction

The patterns of event duration times are of primary interest in many research studies. Closely following up each study individual who may experience the event of interest is usually unrealistic. Although the occurrence of the event may be reported, its start time is often unavailable, especially for so-called “silent” event occurrence (e.g. Balasubramanian and Lagakos 2003).

For example, it is important for the prediction of future fire growth and the allocation of suppression resources to understand the distribution of the duration from the fire start time to the time when the work of suppressing the fire begins, i.e. the initial attack time. A wildland fire is usually reported to the fire management agency by look-out towers or people in the area, and the fire manager then dispatches fire-fighting resources (e.g. Martell 2007; Morin 2014). The exact time when the fire starts is often unknown; what is recorded instead is the time when the fire is reported. As another example, many HIV/AIDS studies are concerned with the duration of HIV infection from the time of infection to the onset of an AIDS event; see, for example, Degruttola et al. (1991) and Doksum and Normand (1995). Often the infection is detected considerably later than it takes place, and thus the exact HIV infection time is usually unavailable. Time to COVID-19 infection is a more recent example of this phenomenon.

A practical approach to handling data with missing time origins ignores the reporting delay and performs inference on the duration distribution using the observed portion of the event duration, that is, the duration between the report time and event termination. This naive approach yields evidently biased inference when the time gap between the event onset and its report is nonignorable. Another commonly used approach views the observation on the time origin as subject to interval-censoring: the lower limit of the interval is the length of time that the event has been observed (which we refer to as \(L^*\)), and the upper limit is the sum of the observed duration and the longest possible reporting delay (which we refer to as \(R_{max}\)). Turnbull’s nonparametric maximum likelihood estimator (NPMLE; Turnbull 1976) can then be employed to estimate the distribution of the actual duration (which we refer to as L) using such manufactured interval-censored data. The resulting inference can be unsatisfactory, especially when the longest expected reporting delay is large relative to the observed portion of the event duration. In addition, the interval-censoring is likely informative in many situations, which invalidates Turnbull’s estimator. For example, fires can occur at varying distances from fire management resources, so the reporting delay S and the observed duration \(L^*\) vary together, and thus \(L^*\) and S may not be independent. That is, the interval \([L^*, L^*+R_{max}]\) provides information on L beyond the fact that \(L \in [L^*, L^*+R_{max}]\). This paper considers estimation of the event duration distribution with the aforementioned type of event time data under a first-hitting-time model, using the longitudinal measures associated with the event.

Many studies have readily available longitudinal measures associated with the event of interest. In reliability, for example, Lu and Meeker (1993) use degradation data to estimate the distribution of a failure time, taking the failure time to be the time when the degradation path hits a critical level. The concept of first-hitting-time has been widely applied. Various models have been used to formulate longitudinal measures, such as a Gamma process (e.g. Lawless and Crowder 2004), a Wiener process (e.g. Doksum and Hoyland 1992; Horrocks and Thompson 2004; Lee and Whitmore 2006; Pennell et al. 2009; Choi et al. 2014; Mulatya et al. 2016), and an inverse Gaussian process (e.g. Peng 2015).

We aim to make inference on the population of all reported fires. Most first-hitting-time models formulate the event time via a hypothetical underlying process; a notable exception is Mulatya et al. (2016), whose modelling is similar to ours. Using Brownian motion with random drift, we link the readily available longitudinal measures to the recorded times to locate the missing time origin. The dependence between the reporting delay and the observed portion of the duration is handled by conditioning on the random drift. We adapt the empirical distribution function, which requires independent and identically distributed (iid) observations, into an intuitive and easy-to-implement estimator of the distribution based on event durations with missing time origins. Conventional smoothing techniques, such as kernel methods, can be applied straightforwardly to smooth the proposed estimator. A collection of wildland fire records from Alberta, Canada is used to motivate and present the proposed approach. However, potential applications of our approach are broad and not limited to wildland fire management studies.

The rest of the paper is organized as follows. Section 2 introduces a model for longitudinal measures of fire burnt area over time and proposes an estimator of the duration distribution, aided by the longitudinal model, using duration times with missing origins. Smoothed versions of the proposed estimator are straightforward to evaluate. We present procedures for estimating the parameters that are involved in the longitudinal model and required by the estimator of the duration distribution. We then derive the asymptotic properties of the distribution estimator and its variance estimation. Section 3 reports an analysis of wildland fire records with the proposed approach, and Sect. 4 presents simulation studies conducted to examine the finite-sample performance of the proposed estimator regarding consistency, efficiency, and robustness. We also compare the performance of the proposed approach with that of the naive approach and of the Turnbull estimator. Some final remarks are given in Sect. 5.

2 Estimation of duration distribution in the presence of missing time origin

2.1 Notation and model

We formulate the aforementioned statistical problem in terms of wildland fire management. Following Parks (1964), Fig. 1 provides a description of the development of a hypothetical wildland fire via the progression of its burnt area over time. The solid curve in the figure represents the burnt area over time of a fire that is subject to suppression after detection. The time point when suppression begins is referred to as the time of initial attack. The dashed curve shows the expected trajectory of the fire’s burnt area had it continued to burn without any suppression or intervention. After ignition, the burnt area grows nonlinearly in time, and can be well approximated as exponential initially. Prior to initial attack, the dashed curve and the solid curve coincide. The times \(T_{S}, T_{R}\), and \(T_{F}\) in Fig. 1 are the calendar times when a fire starts, is reported, and receives initial attack, respectively.

Fig. 1 Hypothetical description of a wildfire progression through fire management phases following Parks (1964). The solid curve represents the burnt area of a fire that is reported and then dispatched with suppression resources. The dashed curve represents the burnt area had the fire continued to burn without any suppression

The duration time of interest, denoted by L, is the elapsed time from the fire’s start to the time of initial attack, i.e., \(L=T_{F}-T_{S}\). We are concerned with situations where the true fire start time \(T_{S}\) is unavailable, and thus the time origin of the event duration is missing. Denote the unobservable reporting delay by \(S=T_R-T_S\). The observed portion of the duration is \(L^*=T_F-T_R=L-S\), the period between report time and initial attack time. Let A(u) be the burnt area at time u, where we assume there is no burnt area at the start time, i.e. \(A(u)=0\) when \(u=0\). Let \(B=A(S)\) and \(D=A(L)-A(S)\) denote the burnt area at the report time and the increase in area by the initial attack time, respectively.

Consider a collection of n independent wildland fires. We assume that the natural logarithm of fire i’s burnt area is \(A_i(u)=g_i(u)+\sigma _i W_i(u)\) for \(i=1,\cdots , n\), where \(g_i(u)\) is a nondecreasing function with \(g_i(0)=0\) and \(W_i(\cdot )\) is the standard Wiener process. As a fire usually grows unhindered until initial attack, we suppose \(\sigma _i\equiv \sigma \) and use a linear approximation to \(g_i(u)\) with random drift \(\nu _i=\nu e^{\delta _i}\), where the constant \(\nu \) is positive, and \(\delta _i\) is independent of \(W_i(\cdot )\) and follows a distribution \(\phi (\cdot ;\sigma _r)\) with \(E[\delta _i]=0\) and \(Var[\delta _i]=\sigma ^2_r\). This results in the model considered in this paper:

$$\begin{aligned} A_{i}(u)=\nu _i u+\sigma W_{i}(u), ~~ u\ge 0. \end{aligned}$$
(1)

The random drift \(\nu _i\) characterizes the heterogeneity in fire growth among the individual fires. Note that \(\nu _i\) reduces to a constant drift when \(\sigma _r=0\). In the rest of this paper, we assume \(\delta _i\sim N(0,\sigma ^2_r)\) with \(\sigma _r\ge 0\).

Under the Wiener process model (1) for the burnt area, the reporting delay \(S_i\) (the time from fire start to report) can be viewed as a first-hitting-time: it is the time when fire i’s burnt area first reaches the threshold \(B_i\), the burnt area at the report time: \(S_{i} =\sup \{u: u>0, A_{i}(u)< B_{i}\}\), which equals \(\inf \{u: u>0, A_{i}(u)> B_{i}\}\) almost surely. According to Chhikara and Folks (1989), given the threshold \(B_i\), the first-hitting-time \(S_{i}\) follows the inverse Gaussian distribution, with the cumulative distribution function (CDF)

$$\begin{aligned} G(x|B_i,\nu _i,\sigma ) = \varPhi \left( \sqrt{\frac{B^2_i}{\sigma ^2 x}}\left[ \frac{x\nu _i}{B_i}-1\right] \right) +\exp \left( 2B_i\nu _i/\sigma ^2\right) \varPhi \left( -\sqrt{\frac{B^2_i}{\sigma ^2x}} \left[ \frac{x\nu _i}{B_i}+1\right] \right) , \end{aligned}$$
(2)

where \(\varPhi (\cdot )\) is the CDF of the standard normal distribution.
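To fix ideas, the following is a minimal numerical sketch of evaluating the first-hitting-time CDF (2). It assumes NumPy and SciPy are available; the function name and the log-scale handling of the second term are our own choices, not from the paper.

```python
import numpy as np
from scipy.stats import norm

def fht_cdf(x, b, nu, sigma):
    """CDF (2) of the first time a Wiener process with drift nu and
    diffusion sigma hits the threshold b, evaluated at x (vectorized)."""
    x, b, nu = [np.atleast_1d(np.asarray(a, dtype=float))
                for a in np.broadcast_arrays(x, b, nu)]
    out = np.zeros(x.shape)
    pos = x > 0                                  # G(x) = 0 for x <= 0
    xp, bp, nup = x[pos], b[pos], nu[pos]
    z = bp / (sigma * np.sqrt(xp))               # sqrt(B^2 / (sigma^2 x))
    term1 = norm.cdf(z * (xp * nup / bp - 1.0))
    # exp(2 B nu / sigma^2) can overflow, so keep the second term on the
    # log scale and exponentiate only after adding the small normal tail
    log_term2 = 2.0 * bp * nup / sigma**2 + norm.logcdf(-z * (xp * nup / bp + 1.0))
    out[pos] = term1 + np.exp(log_term2)
    return out
```

Denote the observed data by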

$$\begin{aligned} \text{ Observed-data }=\varvec{\bigcup }_{i=1}^{n}{\mathbf {O}}_i=\big \{(T_{Ri},T_{Fi}, B_i, D_i):i=1,2,\cdots ,n\big \}. \end{aligned}$$
(3)

This paper focuses on estimation of \(F(\cdot )\), the CDF of the event duration, using the observed data (3) under the following assumptions.

  • Assumption (A1). \(\{(T_{Ri},T_{Fi}, B_i, D_i,L_i):i=1,2,\cdots ,n\}\) is a collection of iid realizations of \(\big (T_{R},T_{F}, B, D,L\big )\), where \(L\sim F(\cdot )\).

  • Assumption (A2). \(L_i^*=T_{Fi}-T_{Ri}=L_i-S_i\) and \(S_i=T_{Ri}-T_{Si}\) are independent conditional on \(\nu _i\) for \(i=1,\ldots ,n\).

The assumptions may hold to a reasonable level of approximation in many practical situations. In our wildland fire application, Assumption (A2) states that the reporting delay (\(S_i\)) of fire i and the time from report to initial attack (\(L^*_i\)) are independent conditional on the fire spread rate \(\nu _i\). This is plausible since the fire agency often assesses the spread rate of a reported fire and then arranges the initial attack accordingly. That is, \(L^*_i\) likely depends on \(S_i\) solely through \(\nu _i\).

2.2 Procedures for estimating \(F(\cdot )\)

2.2.1 Review of the existing approaches

If all duration times \(L_i\) for \(i=1,\ldots , n\) were observed, the empirical distribution function, the nonparametric maximum likelihood estimator (NPMLE) based on the iid observations, could be applied to estimate the duration distribution: \({F}_n(t)=\sum _{i=1}^{n}I(L_i\le t)\big /n\), where \(I({\mathscr {E}})\) is the indicator function of event \({\mathscr {E}}\). Since a fire is usually reported after a delay, only \(L^*_i\), a portion of \(L_i=S_i+L^*_i\), is recorded. The aforementioned naive estimator is \(F^*_n(t)=\sum _{i=1}^{n}I(L^*_i\le t)\big /n\). It is clearly biased when \(P(S_i>0)\ne 0\).

Observe that \(L_i=L^*_i+S_i \in [L^{*}_{i}, L^{*}_{i}+R_{max}]\) with \(R_{max}\) the longest possible reporting delay as discussed in the existing literature. The current observations might then be cast as interval-censored event times. Turnbull’s self-consistent estimator (Turnbull 1976) can then be used to estimate the distribution with the interval-censored observations. Let \(0=\tau _0<\tau _1<\cdots <\tau _Q\) be the ordered unique values of \(\big \{\{L^*_{i}\}_{i=1}^{n},\{L^{*}_{i}+R_{max}\}_{i=1}^{n}\big \}\), and define \(\alpha _{iq}=I\big \{(\tau _{q-1},\tau _{q}]\subseteq (L^*_{i},L^{*}_{i}+R_{max}]\big \}\) and \(p_q=F(\tau _{q})-F(\tau _{q-1})\). Following Klein and Moeschberger (2003), the Turnbull estimator is the solution to the equations

$$\begin{aligned} p_q = \frac{1}{n}\sum _{i=1}^{n}\text{ E }\Big \{I\big (L_i\in (\tau _{q-1},\tau _q]\big ) \Big |L_{i} \in (L^{*}_{i}, L^{*}_{i}+R_{max}]\Big \} = \frac{1}{n}\sum _{i=1}^{n}\frac{\alpha _{iq}{p}_q}{\sum _{k=1}^{Q}\alpha _{ik}{p}_k} \end{aligned}$$
(4)

for \(q=1,\ldots , Q\). However, the Turnbull estimator may not perform well in the situations of particular interest in this paper. Note that the Turnbull estimator is not uniquely defined over the whole positive real line but only up to an equivalence class of distributions that may differ over gaps, i.e. the innermost intervals. Since \(R_{max}\) is large relative to \(L_i^*\) in our application, the data form a relatively small number of innermost intervals and thus often give a rather uninformative estimate. Moreover, the interval-censoring mechanism in wildland fire studies may be informative, since the observed \(L_i^{*}\) often depends on the reporting delay \(S_{i}\) through the fire spread rate \(\nu _i\). These considerations motivate us to propose an alternative estimator for the duration distribution \(F(\cdot )\) using the available observations on the burnt-area process, which contain information related to the reporting delay.
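For concreteness, a bare-bones implementation of the self-consistency iteration (4) might look as follows; it is a sketch under the setup above (a common \(R_{max}\) for all fires), with names of our own choosing.

```python
import numpy as np

def turnbull(l_star, r_max, tol=1e-8, max_iter=5000):
    """Self-consistent estimator (4) for the intervals (L*_i, L*_i + R_max]."""
    left = np.asarray(l_star, dtype=float)
    right = left + r_max
    tau = np.unique(np.concatenate(([0.0], left, right)))   # tau_0 < ... < tau_Q
    # alpha[i, q] = 1 iff (tau_{q-1}, tau_q] is inside (L*_i, L*_i + R_max]
    alpha = (tau[None, :-1] >= left[:, None]) & (tau[None, 1:] <= right[:, None])
    p = np.full(len(tau) - 1, 1.0 / (len(tau) - 1))         # initial masses p_q
    for _ in range(max_iter):
        num = alpha * p                                     # n x Q array
        p_new = (num / num.sum(axis=1, keepdims=True)).mean(axis=0)
        if np.max(np.abs(p_new - p)) < tol:
            break
        p = p_new
    return tau[1:], np.cumsum(p)                            # F at tau_1, ..., tau_Q
```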

2.2.2 Proposed estimator of \(F(\cdot )\)

By Model (1) and Assumption (A2), note that \(\text{ E }\big \{I(L_i\le t)|\text{ Observed-data }\big \} =\text{ P }\big (S_{i}\le t-L^*_i|{\mathbf {O}}_i\big )\) can be expressed as \( \int _{-\infty }^{\infty } G(t-L^*_i|B_i,\nu e^{\delta },\sigma ) {\phi }(\delta |{\mathbf {O}}_i;\nu ,\sigma , \sigma _r) \mathrm {d}\delta ,\) where \({\phi }(\cdot |{\mathbf {O}}_i;\nu ,\sigma ,\sigma _r)\) is the conditional distribution of \(\delta _i\) given the observed data associated with fire i as specified in (3). The consideration above suggests the following estimator, provided that the parameters \(\nu ,\sigma , \sigma _r\) are known:

$$\begin{aligned} {\tilde{F}}_{n}(t;\nu ,\sigma ,\sigma _r) =\frac{1}{n}\sum _{i=1}^{n}\int _{-\infty }^{\infty } G(t-L^*_i;B_i,\nu e^{\delta },\sigma ) {\phi }(\delta |{\mathbf {O}}_i;\nu ,\sigma , \sigma _{r}) \mathrm {d}\delta . \end{aligned}$$
(5)

We propose to replace parameters in (5) with their consistent estimators based on the available data. This results in a feasible distribution estimator,

$$\begin{aligned} {\tilde{F}}_{n}(t;{\hat{\nu }}, {\hat{\sigma }},{\hat{\sigma }}_r) =\frac{1}{n}\sum _{i=1}^{n} \int _{-\infty }^{\infty } {G(t-L^{*}_{i}|B_i,{\hat{\nu }}e^{\delta },{\hat{\sigma }})}{ \phi }(\delta |{\mathbf {O}}_i;{\hat{\nu }},{\hat{\sigma }},{\hat{\sigma }}_{r}) \mathrm {d}\delta , \end{aligned}$$
(6)

abbreviated by \({\hat{F}}_n(t)\) in the rest of this paper. In Sect. 2.3, we present procedures for consistently estimating parameters \(\nu ,\sigma , \sigma _r\). To compute (6) numerically, one may approximate \({\hat{F}}_n(t)\) with

$$\begin{aligned} \frac{1}{nJ}\sum _{j=1}^{J}\sum _{i=1}^{n}G(t-L^{*}_{i}| B_i,{\hat{\nu }}e^{\delta _i^{(j)}},{\hat{\sigma }}), \end{aligned}$$
(7)

where \(\delta _i^{(1)}, \cdots , \delta _i^{(J)}\) are sampled independently from the estimated conditional distribution \({\phi }(\cdot |{\mathbf {O}}_i;{\hat{\nu }},{\hat{\sigma }},{\hat{\sigma }}_{r})\) for \(i=1,\ldots ,n\).
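In code, the approximation (7) is a plain average of the hitting-time CDF over fires and retained draws. The sketch below reuses the illustrative `fht_cdf` function from Sect. 2.1 and assumes `delta` stores the draws \(\delta _i^{(j)}\) in an n-by-J array.

```python
import numpy as np

def F_hat(t, l_star, b, delta, nu_hat, sigma_hat):
    """Monte Carlo approximation (7) of the plug-in estimator (6).
    l_star, b: length-n data vectors; delta: (n, J) posterior draws."""
    g = fht_cdf((t - l_star)[:, None],          # t - L*_i, broadcast over draws
                b[:, None],                     # threshold B_i
                nu_hat * np.exp(delta),         # drift nu_hat * exp(delta_i^(j))
                sigma_hat)
    return g.mean()                             # average over both i and j
```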

The proposed estimator in (5) is adapted from the empirical distribution function. Analogously, we can obtain a smoothed estimator of \(F(\cdot )\) by adapting the kernel distribution estimator that would be used if all durations were observed. Recall that a kernel estimator of \(F(\cdot )\) with iid observed durations is \(F_{n,h}(t)=\sum _{i=1}^{n}K(\frac{t-L_i}{h})\big /n\), where \(K(t)=\int _{-\infty }^{t}k(u)\mathrm {d}u\) with \(k(\cdot )\) a kernel function and h the bandwidth (e.g., Rosenblatt 1956). Its projection onto the available data space, \(\sum _{i=1}^{n}\text{ E }\big \{K\big (\frac{t-(L^*_i+S_{i})}{h}\big ) \big |{\mathbf {O}}_i\big \}\big /n\), yields the following estimator with smooth realizations, denoted by \({\hat{F}}_{n,h}(t)\):

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^{n}\int _{-\infty }^{\infty } \int _{0}^{\infty } K\left( \frac{t-(L^*_i+s)}{h}\right) {\phi }(\delta |{\mathbf {O}}_i;{\hat{\nu }},{\hat{\sigma }}, {\hat{\sigma }}_{r})\mathrm {d}G(s|B_i,{\hat{\nu }}e^{\delta },{\hat{\sigma }})\mathrm {d}\delta . \end{aligned}$$
(8)
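A Monte Carlo sketch of (8) follows, again reusing the retained draws: for each \(\delta _i^{(j)}\) we sample a reporting delay from the fitted inverse Gaussian distribution (NumPy's `wald` generator, parametrized by mean \(B_i/({\hat{\nu }}e^{\delta })\) and shape \(B_i^2/{\hat{\sigma }}^2\)) and average the kernel term; using the standard normal CDF as K is our illustrative choice.

```python
import numpy as np
from scipy.stats import norm

def F_hat_smooth(t, l_star, b, delta, nu_hat, sigma_hat, h, seed=None):
    """Monte Carlo version of the smoothed estimator (8) with bandwidth h."""
    rng = np.random.default_rng(seed)
    drift = nu_hat * np.exp(delta)                        # (n, J) drifts
    # first-passage times are inverse Gaussian: mean B/drift, shape B^2/sigma^2
    s = rng.wald(b[:, None] / drift, (b[:, None] / sigma_hat) ** 2)
    return norm.cdf((t - (l_star[:, None] + s)) / h).mean()
```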

In the special case where no random effect is involved and \(S_i\) is assumed to be uniformly distributed over \([0,R_{max}]\), the estimator \({\hat{F}}_{n,h}(t)\) reduces to the one discussed in Braun et al. (2005). Since the choice of the bandwidth h is still under investigation, we focus on the estimator \({\hat{F}}_{n}(t)\) given in (6) for the rest of the paper.

2.3 Procedures for estimating parameters in Model (1): \(\varvec{\theta }=(\nu ,\sigma ,\sigma _r)\)

The log-likelihood function based on the available data is

$$\begin{aligned} \log L_{obs}(\varvec{\theta }\big |\text{ Observed-data}) = \sum _{i=1}^n \log L_{obs}(\varvec{\theta }; {\mathbf {O}}_i), \end{aligned}$$
(9)

where the contribution from fire i, \(\log L_{obs}(\varvec{\theta }; {\mathbf {O}}_i)\), is \(\log \int _{0}^{\infty }\int _{-\infty }^{\infty } \big \{L_{obs,i|S,\delta }\big \}\mathrm {d}[S,\delta ]\) with \(L_{obs,i|S_i,\delta _i}= [D_i|L^*_i,\delta _i][B_i|S_{i},\delta _i]\). Here \([D_i|L^*_i,\delta _i]\) and \([B_i|S_{i},\delta _i]\) are the conditional distributions of \(D_{i}\) given \((L^*_i,\delta _i)\) and of \(B_{i}\) given \((S_{i},\delta _i)\), respectively. Under Model (1), both are normal: \(N(\nu e^{\delta _i} L^*_i, \sigma ^2 L^*_i)\) and \(N(\nu e^{\delta _i} S_{i}, \sigma ^2 S_{i})\), respectively.

We estimate \(\varvec{\theta }\) by maximizing \(\log L_{obs}(\varvec{\theta }\big |\text{ Observed-data})\). Denote the resulting estimator by \(\hat{\varvec{\theta }}_{n}\). One can use \(\hat{\varvec{\theta }}_{n}\) together with the collection of \({\underline{\delta }}^{(1)},\cdots , {\underline{\delta }}^{(J)}\) in the last iteration to compute (7) and (8) and to obtain \({\hat{F}}_n(\cdot )\) and \({\hat{F}}_{n,h}(\cdot )\), respectively. Here \({\underline{\delta }}^{(j)}\) for \(j=1,\ldots ,J\) are the n-dimensional vectors with the i-th components \(\delta _i^{(j)}\).

We apply the MCEM algorithm (Wei and Tanner 1990) to compute the MLE, and present details in Algorithm A below. The log-likelihood function of \(\varvec{\theta }\) based on the observed data (3) together with \({\underline{S}},{\underline{\delta }}\) is \(l_F(\varvec{\theta }|\text {Observed-data},{\underline{S}}, {\underline{\delta }}) = l_{F_1}(\nu ,\sigma |{\underline{S}},{\underline{\delta }}) + l_{F_2}(\varvec{\theta }; {\underline{S}},{\underline{\delta }})\), where

$$\begin{aligned} l_{F_1}(\nu ,\sigma |{\underline{S}},{\underline{\delta }}) =-n\log \sigma ^2-\sum _{i=1}^{n}\frac{\big (D_i-\nu e^{\delta _i} L^*_i\big )^{2}}{2\sigma ^2 L^*_{i}}-\sum _{i=1}^{n}\frac{\big (B_i-\nu e^{\delta _i}S_i\big )^{2}}{2\sigma ^2 S_{i}} \end{aligned}$$

and \(l_{F_2}(\varvec{\theta }; {\underline{S}},{\underline{\delta }}) =\sum _{i=1}^{n}\log [S_i|\delta _i]+\sum _{i=1}^n\log \phi (\delta _i;\sigma _r)\).

Algorithm A For \(m=0,1,2,\cdots ,\) denote the estimate from the mth iteration by \({\varvec{\theta }}^{(m)} = ({\nu }^{(m)},{\sigma }^{(m)},{\sigma _r}^{(m)})\).

  • E-step. Approximate \(Q(\varvec{\theta },\varvec{\theta }^{(m)}) =\text{ E }\{l_F(\varvec{\theta }|\text {Observed-data},{\underline{S}}, {\underline{\delta }})|{\mathbf {O}},{\varvec{\theta }}^{(m)}\}\) as

    $$\begin{aligned} \frac{1}{J}\sum _{j=1}^{J}l_F(\varvec{\theta }|\text {Observed-data}, {\underline{S}}^{(j)}, {\underline{\delta }}^{(j)})&=\frac{1}{J}\sum _{j=1}^{J}l_{F_1}(\nu ,\sigma |{\underline{S}}^{(j)}, {\underline{\delta }}^{(j)})\nonumber \\&\qquad + \frac{1}{J}\sum _{j=1}^{J} l_{F_2}(\varvec{\theta }; {\underline{S}}^{(j)},{\underline{\delta }}^{(j)}), \end{aligned}$$
    (10)

    where for \(j=1,2,\cdots ,J\), \((S_{i}^{(j)}, \delta _i^{(j)})\) is generated from the conditional distribution given the observed data with the current parameter estimate \({\varvec{\theta }}^{(m)}\),

    $$\begin{aligned}{}[S,\delta |{\mathbf {O}}_i;\varvec{\theta }^{(m)}]=\frac{ L_{obs,i|S,\delta } ({\nu }^{(m)},{\sigma }^{(m)};S,\delta ) [S,\delta ;\varvec{\theta }^{(m)}]}{\int _{0}^{\infty }\int _{-\infty }^{\infty } L_{obs,i|S,\delta } ({\nu }^{(m)},{\sigma }^{(m)};S,\delta ) \mathrm {d}[S,\delta ;\varvec{\theta }^{(m)}]}. \end{aligned}$$
    (11)
  • M-step. Maximize (10) with respect to \(\varvec{\theta }\) to obtain \({\varvec{\theta }}^{(m+1)}\).

Repeat Steps E and M until \(||{\varvec{\theta }}^{(m+1)}-{\varvec{\theta }}^{(m)}||<\epsilon \) for a pre-specified tolerance \(\epsilon \). The limit of the sequence \(\{\varvec{\theta }^{(m)}: m=1,2,\ldots \}\) is the MLE \(\hat{\varvec{\theta }}_{n}\).

We follow the Metropolis–Hastings algorithm (Metropolis et al. 1953; Hastings 1970) to generate \((S_{i}^{(j)}, \delta _i^{(j)})\) from the conditional distribution (11). The details are provided in Sect. S1.1 of the Supplementary Material. One should also note that \([S_i|\delta _i]\) in \(l_{F_2}(\varvec{\theta }; {\underline{S}},{\underline{\delta }})\) equals \([S_i|B_i,\delta _i][B_i|\delta _i]\big /[B_i|S_i,\delta _i]\), which carries little additional information on the parameters \(\nu ,\sigma \). To ease the computational burden, one may replace (10) with the following to update \({\varvec{\theta }}^{(m)}\):

$$\begin{aligned} {\tilde{Q}}(\varvec{\theta }, \varvec{\theta }^{(m)})=\frac{1}{J}\sum _{j=1}^{J}l_{F_1}(\nu ,\sigma |{\underline{S}}^{(j)}, {\underline{\delta }}^{(j)}) + \frac{1}{J}\sum _{j=1}^{J}\sum _{i=1}^n\log \phi (\delta ^{(j)}_i;\sigma _r). \end{aligned}$$
(12)

The maximizing procedure based on (12) leads to a variant of Algorithm A and results in \(\tilde{\varvec{\theta }}_{n}\), a close approximation to the MLE \(\hat{\varvec{\theta }}_{n}\). For the numerical studies presented in this paper, we choose \(J=200\), with which the algorithm converges.
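The sketch below outlines this variant of Algorithm A, assuming \(\delta _i\sim N(0,\sigma ^2_r)\). The E-step sampler (Sect. S1.1 of the Supplementary Material) is abstracted as a user-supplied `sample_posterior` function, and all names are illustrative rather than taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize

def neg_q_tilde(theta, S, delta, l_star, b, d):
    """Negative of the Monte Carlo objective (12); S, delta are (n, J) draws."""
    nu, sigma, sigma_r = theta
    n = len(l_star)
    mu = nu * np.exp(delta)                                   # (n, J) drifts
    lf1 = (-n * np.log(sigma**2)                              # l_{F_1} per draw j
           - ((d[:, None] - mu * l_star[:, None])**2
              / (2 * sigma**2 * l_star[:, None])).sum(axis=0)
           - ((b[:, None] - mu * S)**2 / (2 * sigma**2 * S)).sum(axis=0))
    log_phi = (-0.5 * np.log(2 * np.pi * sigma_r**2)          # N(0, sigma_r^2) density
               - delta**2 / (2 * sigma_r**2)).sum(axis=0)
    return -(lf1.mean() + log_phi.mean())                     # average over J draws

def mcem(theta0, l_star, b, d, sample_posterior, J=200, tol=1e-4, max_iter=100):
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        S, delta = sample_posterior(theta, l_star, b, d, J)   # E-step (MH sampler)
        theta_new = minimize(neg_q_tilde, theta,              # M-step
                             args=(S, delta, l_star, b, d),
                             bounds=[(1e-6, None)] * 3).x     # keep parameters > 0
        if np.linalg.norm(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta
```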

We may obtain another estimator by maximizing the conditional likelihood function using only the observations on D and \(L^*\). The log conditional likelihood function is \(\log L^{c}_{obs}(\varvec{\theta }\big |\text{ Observed-data}) =\sum _{i=1}^n \log \int _{-\infty }^{\infty }[D_i|L^*_i,\delta ]\phi (\delta ;\sigma _r)d\delta \), and can be written as

$$\begin{aligned} \sum _{i=1}^{n}\log \left( \int _{-\infty }^{\infty } \left[ \frac{1}{\sqrt{2\pi \sigma ^2L^*_i}}\exp \left\{ -\frac{(D_i-\nu e^{\delta }L^*_i)^{2}}{2\sigma ^2 L^*_{i}}\right\} \right] \phi (\delta ;\sigma _r)\mathrm {d}\delta \right) . \end{aligned}$$
(13)

The estimator obtained by maximizing (13) is likely less efficient but easier to implement. We describe the procedure for obtaining the maximizer of (13) in Sect. S1.2 of the Supplementary Material. We refer to this second algorithm as Algorithm B and denote the estimators obtained from the two algorithms by \(\hat{\varvec{\theta }}_{n_A}\) and \(\hat{\varvec{\theta }}_{n_B}\) for the remainder of the paper. The estimate obtained from Algorithm B may be used as the initial estimate \(\varvec{\theta }^{(0)}\) for Algorithm A.
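As an illustration of Algorithm B's core computation, the log conditional likelihood (13) can be evaluated with the integral over \(\delta \sim N(0,\sigma ^2_r)\) approximated by Gauss–Hermite quadrature (one of the numerical alternatives mentioned in Sect. 5) and passed to a general-purpose optimizer. This is a sketch, not the exact procedure of the Supplementary Material.

```python
import numpy as np
from scipy.optimize import minimize

def neg_cond_loglik(theta, l_star, d, n_nodes=30):
    """Negative log conditional likelihood (13) via Gauss-Hermite quadrature."""
    nu, sigma, sigma_r = theta
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    delta = np.sqrt(2.0) * sigma_r * x                  # change of variables
    mu = nu * np.exp(delta)[None, :] * l_star[:, None]  # E[D_i | delta_k], (n, K)
    var = sigma**2 * l_star[:, None]
    dens = np.exp(-(d[:, None] - mu)**2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    # integral over delta ~ N(0, sigma_r^2) = sum_k w_k f(sqrt(2) sigma_r x_k) / sqrt(pi)
    return -np.log((dens * w).sum(axis=1) / np.sqrt(np.pi)).sum()

# e.g. theta_B = minimize(neg_cond_loglik, x0=[1.0, 1.0, 0.5],
#                         args=(l_star, d), bounds=[(1e-6, None)] * 3).x
```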

2.4 Asymptotic properties of \({\hat{F}}_n(t)\) and variance estimation

The proposed estimator \({\hat{F}}_n(t)\) using \(\hat{\varvec{\theta }}_{n_A}\) from Algorithm A in Sect. 2.3 has the following asymptotic properties.

Theorem 1

Under Assumptions (A1) and (A2) and Conditions (C1)-(C5) for the log-likelihood function in (9), the estimator \({\hat{F}}_n(t)\) has the following properties:

  (i) Strong Consistency. \(\sup _{t\in [0,\tau ]}|{\hat{F}}_n(t)-F(t)|\overset{p}{\rightarrow }0\) as \(n\rightarrow \infty \).

  (ii) Weak Convergence. For \(t\in [0,\tau ]\), as \(n\rightarrow \infty \), \(\sqrt{n}({\hat{F}}_n(t)-F(t))\) converges weakly in \(\ell ^{\infty }([0,\tau ])\) to a tight, mean-zero Gaussian process \({\mathcal {G}}\) with covariance \(Cov( {\mathcal {G}}(t), {\mathcal {G}}(s))\) given by

$$\begin{aligned} {\left\{ \begin{array}{ll} \int _{0}^{\infty }\int _{0}^{\infty } M(t,l^*, b;\varvec{\theta }_0)M(s,l^*, b;\varvec{\theta }_0)h(l^*,b)\,\mathrm {d}l^*\,\mathrm {d}b-F(t)F(s), & t\ne s, \\ \int _{0}^{\infty }\int _{0}^{\infty } M(t,l^*, b;\varvec{\theta }_0)^2 h(l^*,b)\,\mathrm {d}l^*\,\mathrm {d}b-F^2(t) & \\ \quad +\text{ E}_{\varvec{\theta }_0}[{\partial M(t,L^*_i, B_i;\varvec{\theta }_0)}\big /{\partial \varvec{\theta }}]'\,\varPi ^{-1}(\varvec{\theta }_0)\,\text{ E}_{\varvec{\theta }_0}[{\partial M(t,L^*_i, B_i;\varvec{\theta }_0)}\big /{\partial \varvec{\theta }}], & t=s, \end{array}\right. } \end{aligned}$$
    (14)

    where \(\varvec{\theta }_0\) is the true parameter, \(\varPi (\varvec{\theta }_0) =\text{ E }\big \{-\partial ^2 \log L_{obs}(\varvec{\theta };{\mathbf {O}}_i)\big /\partial \varvec{\theta }\varvec{\theta }^{'}\big \}\) is the same as \(\varSigma (\varvec{\theta }_0) =\text{ Var }\big \{\partial \log L_{obs}(\varvec{\theta };{\mathbf {O}}_i)\big /\partial \varvec{\theta }\big \}\) with \(\log L_{obs}(\varvec{\theta }; {\mathbf {O}}_i)\) the contribution from individual i to the log-likelihood function in (9), \(M(t,L^*_i, B_i;\varvec{\theta }) =\int _{-\infty }^{\infty } G(t-L^*_i|B_i,\nu e^{\delta },\sigma ) \phi (\delta |{\mathbf {O}}_i;\nu ,\sigma ,\sigma _{r}) \mathrm {d}\delta \), and \(h(l^*,b)\) is the joint probability density function of \(L^*_i\) and \(B_i\).

A proof of Theorem 1 is outlined in the Appendix. A consistent estimator of the covariance function in (14) results from substituting its unknown elements with the estimators given below.

Note that \(\int _{0}^{\infty }\int _{0}^{\infty } M(t,l^*, b;\varvec{\theta }_0)^2 h(l^*,b)dl^*db\) can be approximated by \(n^{-1}\sum _{i=1}^{n}\big [\sum _{k=1}^{K}G(t-L^*_i|B_i,{\hat{\nu }}_n e^{\delta _i^{(k)}},{\hat{\sigma }}_n)\big /K\big ]^2\) with \(\delta _i^{(1)},\cdots , \delta _i^{(K)}\) obtained from the last iteration of Algorithm A in Sect. 2.3. We may similarly approximate \(\text{ E}_{\varvec{\theta }_0}\big [\partial M(t,L^*_i, B_i; \varvec{\theta })\big /\partial \varvec{\theta }\big ]\). Further, note that \({\widehat{\varPi }}_n(\varvec{\theta }_0)= -n^{-1}{\partial ^2\log L_{obs}(\varvec{\theta };\text {Observed-data})}/{\partial \varvec{\theta }\varvec{\theta }^{'}}\) converges in probability to \(\varPi (\varvec{\theta }_0)= \varSigma (\varvec{\theta }_0)\), and so does \({\widehat{\varSigma }}_n(\varvec{\theta }_0)=n^{-1}\text{ Var}_{\varvec{\theta }_0}\big ( {\partial \log L_{obs}(\varvec{\theta };\text {Observed-data})} \big /{\partial \varvec{\theta }}\big )\). Thus, either \({\widehat{\varPi }}_n(\hat{\varvec{\theta }}_{n_A})\) or \({\widehat{\varSigma }}_n(\hat{\varvec{\theta }}_{n_A})\) can be used to estimate \(\varPi (\varvec{\theta }_0)=\varSigma (\varvec{\theta }_0)\).
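A compact sketch of the resulting plug-in variance at a fixed t is given below. It treats the retained draws as fixed when differentiating, takes the gradient of M by finite differences, and assumes the inverse of \({\widehat{\varPi }}_n\) (or \({\widehat{\varSigma }}_n\)) has already been computed from the fitted likelihood; all names are illustrative.

```python
import numpy as np
from scipy.optimize import approx_fprime

def var_hat(t, M_fn, theta_hat, Pi_hat_inv):
    """Estimated Cov(G(t), G(t)) in (14). M_fn(t, theta) returns the length-n
    vector of Monte Carlo approximations to M(t, L*_i, B_i; theta)."""
    M = M_fn(t, theta_hat)
    iid_part = (M**2).mean() - M.mean()**2          # first term of (14) at t = s
    grad = approx_fprime(theta_hat,                 # E[dM/dtheta] by finite differences
                         lambda th: M_fn(t, th).mean(), 1e-5)
    return iid_part + grad @ Pi_hat_inv @ grad      # adds the parameter-uncertainty term
```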

2.5 Construction of confidence bands for \(F(\cdot )\)

Based on Theorem 1, the process \(\sqrt{n}({\hat{F}}_{n}(t)-F(t))\big /\sqrt{\text{ var }(t)}\) converges weakly to a mean-zero Gaussian process with unit variance, where \(\text{ var }(t)\) denotes \(\text{ Cov }({\mathcal {G}}(t), {\mathcal {G}}(s))\) given in (14) with \(t=s\). Denote by \(\widehat{\text{ var }}(t)\) the consistent estimator of \(\text{ var }(t)\) obtained as described in Sect. 2.4. We employ the resampling approach of Hu and Lagakos (1999) and Zhao et al. (2008) to construct the following confidence band (CB) for the distribution \(F(\cdot )\).

The (\(1-\alpha \)) confidence band for \(F(\cdot )\) is

$$\begin{aligned} \Big \{q(\cdot ):\text { for all }t\in [0,\tau ], q(t)\in \Big [{\hat{F}}_{n}(t)-c_\alpha \sqrt{\frac{\widehat{\text{ var }}(t)}{n}},{\hat{F}}_{n}(t)+c_\alpha \sqrt{\frac{\widehat{\text{ var }}(t)}{n}}\Big ]\Big \}, \end{aligned}$$

where the critical value \(c_\alpha \) is determined by the resampling scheme as follows. For \(t\in [0,\tau ]\), define

$$\begin{aligned} C_{n}(t)=\sqrt{\frac{{1}}{n \widehat{\text{ var }}(t)}}\sum _{i=1}^{n}\left[ \int _{-\infty }^{\infty } G(t-L^*_i|B_i,{\hat{\nu }} e^{\delta },{\hat{\sigma }}) \phi (\delta |{\mathbf {O}}_i;{\hat{\nu }},{\hat{\sigma }},{\hat{\sigma }}_r) \mathrm {d}\delta -{\hat{F}}_n(t)\right] Z_i, \end{aligned}$$

where \(Z_1,\cdots , Z_n\) are iid \(N(0,1)\) random variables independent of the data. We compute \(c_\alpha \) as follows (a code sketch follows the steps):

  Step (i). Generate M sets of independent realizations of \((Z_1,\cdots , Z_n)\) and, with each set, compute \({C}^{(m)}_{n}(\cdot )\) for \(m=1,\cdots ,M\).

  Step (ii). Choose \(c_\alpha \) as the \(100(1-\alpha )\%\) quantile of \(\sup _{t\in [0,\tau ]}|{C}^{(1)}_{n}(t)|,\ldots , \sup _{t\in [0,\tau ]}|{C}^{(M)}_{n}(t)|\).
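Putting the two steps together, a minimal sketch of the multiplier resampling is given below; `M_i` holds the per-fire integrals appearing in \(C_n(t)\) evaluated on a grid of t values (an n-by-T array), and `var_grid` the estimated variances on the same grid. Both names are illustrative.

```python
import numpy as np

def critical_value(M_i, var_grid, alpha=0.05, M_rep=1000, seed=None):
    """Critical value c_alpha from M_rep multiplier-bootstrap replicates."""
    rng = np.random.default_rng(seed)
    n, _ = M_i.shape
    resid = M_i - M_i.mean(axis=0)                   # bracketed term in C_n(t)
    sups = np.empty(M_rep)
    for m in range(M_rep):
        Z = rng.standard_normal(n)                   # multipliers Z_1, ..., Z_n
        C = (resid * Z[:, None]).sum(axis=0) / np.sqrt(n * var_grid)
        sups[m] = np.abs(C).max()                    # sup over the t grid
    return np.quantile(sups, 1.0 - alpha)
```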

3 Analysis of Alberta Forest fire data

We now apply the proposed approach to analyze the wildland fire data that motivated this research. Alberta Agriculture and Forestry collected records of 603 lightning-caused fires that occurred in 10 wildland fire management areas of Alberta, Canada during the fire season from May to August in 2004. Each fire record contains the fire progression information: the times and the fire burnt area at the time of report and at the time of initial attack. As expected, the records do not include the exact fire start times.

Figure 2 shows the burnt areas at the report times and at the initial attack times for the different regions. The 10 Alberta wildland fire management areas are classified into two groups: north and south. The north region includes Fort McMurray, High Level, Lac La Biche, Peace River, and Slave Lake; the south region includes Calgary, Edson, Grande Prairie, Rocky Mountain House, and Whitecourt. Table S1 in Sect. S2 of the Supplementary Material summarizes the burnt areas for the two regions at the report times, at the initial attack times, and at the times when the fires were extinguished. Fires in the north region tend to have larger burnt areas at the times of report and initial attack. The distributions of the burnt area are skewed, so we use the transformed version \(\log _{10} (\text {burnt area}+1)\) in the analysis.

Fig. 2 Burnt area in each management region

The time of initial attack is when the first fire-fighting resource arrives at a wildland fire to prevent the fire from spreading, and to extinguish it if possible. It is believed that fires with a delayed initial attack may require a more substantial suppression effort.

Using the proposed approach, we estimate the distribution of the duration from the start of a fire to its initial attack. We consider two cases of Model (1): (i) \(\sigma _r=0\) (i.e., \(\nu _i=\nu \) for \(i=1,2,\cdots ,n\)), and (ii) \(\sigma _r\ge 0\). Table 1 presents the parameter estimates and the corresponding standard error estimates obtained by Algorithms A and B of Sect. 2.3. The standard errors are estimated using both the inverse Fisher information matrix and the sandwich variance estimator. We also provide computing times for each algorithm in Table 1. Algorithm B is computationally faster than Algorithm A, but it yields a less efficient estimator, as its estimated standard errors are larger than those of Algorithm A. The estimates of \(\sigma _r\) for the model with random drift are quite large, indicating considerable variation among the fires. This could be because fire spread rates depend on location and local weather.

Table 1 Estimates of parameters in model (1) with Alberta Wildfire Data

We estimate the distribution of the duration by substituting the estimated model parameters into (7), and obtain the smoothed estimator based on (8). For comparison, we also evaluate the empirical distribution function based on the observed event durations (the naive estimator) and the Turnbull estimator, viewing the fire data as interval-censored with \(L_{i}\in [L^{*}_{i}, L^{*}_{i}+R_{max}]\). We set \(R_{max}= 6, 12\), or 48 hours for illustration; in fact, \(R_{max}\) could be up to 2 weeks (Wotton and Martell 2005). Figure 3 presents the estimated distributions of the times to initial attack with Algorithms A and B, together with approximate \(95\%\) pointwise confidence intervals (CIs) calculated using the estimated asymptotic variance given in (14).

Fig. 3 Estimated distributions for times to initial attack with Alberta Data

Figure 3 shows that the naive and Turnbull estimates differ from the proposed estimates, and that the Turnbull estimates deviate more from the proposed estimates as \(R_{max}\) increases. This is because a larger \(R_{max}\) leads to a wider interval \((L^*_i,L^*_i+R_{max})\) for \(L_i\); as a result, there are fewer disjoint innermost intervals within which the survivor function estimated by Turnbull's method can jump. Comparing the estimates from the two algorithms, we see that Algorithm A produces a more efficient estimator. We also evaluate the kernel-smoothed estimator (8) presented in Sect. 2.2. The smoothed distribution estimates and their corresponding \(95\%\) CIs/CBs are in close agreement with the unsmoothed estimates.

Figure 4 presents scatterplots of the final burnt area versus the estimated duration times. The estimated duration is calculated as \({\tilde{L}}_i=L^*_i+{\tilde{S}}_{i}\), where \({\tilde{S}}_{i}\) is generated from the posterior distribution of the reporting delay \([S|\delta , {\mathbf {O}}_i;\varvec{\theta }^{(m)}]\) at the last iteration of the MCEM procedure in Algorithm A, for \(i=1,\cdots ,n\). We present scatterplots using three realizations of \({\tilde{L}}_i\), together with the scatterplot in Fig. 4a using the observed portion of the duration \(L^*_i\). The association between the final burnt area and the duration is markedly clearer with the estimated durations. This suggests that the duration between fire start and initial attack may be more predictive of the final burnt area, and that accounting for the reporting delay is worthwhile when using the duration as a predictor of the final burnt area.

Fig. 4 Scatterplots of final burnt area and estimated times to initial attack with three realizations of \({\tilde{L}}_i=L^*_i+{\tilde{S}}_{i}\), where \({\tilde{S}}_{i}\) is generated from the posterior distribution of reporting delay \([S|\delta , {\mathbf {O}}_i;\varvec{\theta }^{(m)}]\) at the last iteration of the MCEM procedure in Algorithm A, for \(i=1,\cdots ,n\). The solid line represents the fitted linear regression of the final burnt area on the time to initial attack; the dashed curve is the local regression curve. The shaded areas define corresponding approximate \(95\%\) confidence bands

We applied the proposed procedure to the fires from the north region and from the south region separately. Table 2 gives the model parameter estimates, and Fig. 5 shows the estimated duration distributions. The estimate of \(\sigma _r\) for the north region is large and significantly different from zero, indicating greater variation across fires in that region; the south region has a smaller estimate of \(\sigma _r\).

Table 2 Estimates of parameters in model (1) with data of Wildfires in North region and South region

Fig. 5 Estimated distributions of time to initial attack in north region and south region

4 Simulation studies

We conducted two simulation studies to examine the finite-sample performance of the proposed approach and to verify the findings from the data analysis. Specifically, in the first simulation study, we generated data based on Model (1) to verify consistency and efficiency, and in the second simulation study, we assessed robustness of the approach against model misspecification.

4.1 Simulation A: Consistency and efficiency

To mimic the fire data, we simulated studies of \(n=300\) independent fires, with the data for fire i, \(i=1,2,\cdots ,n\), generated as follows (a code sketch follows the list).

  (i) Generate the burnt area process \(A_i(t), t\in [0,30]\), based on Model (1) with the parameter values \(\nu =2.0\) and \(\sigma =0.5\), and \(\delta _i\sim N(0,\sigma ^2_r)\) with \(\sigma _r=0, 0.5\), or 0.8.

  (ii) Generate the burnt area at the report time \(B_i \sim \text{ logNormal }(2.0,0.1)\), and determine the reporting delay as \(S_{i}=\max \{t|t\in [0,30], A_{i}(t)\le B_{i}\}\).

  (iii) Generate \(L^*_i \sim \text{ Exp }(3.0 B_i^{-1})\), calculate the duration \(L_{i}=S_{i}+L^*_{i}\), and obtain the burnt area at the time of initial attack, \(D_i=A_i(L_i)\).
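The sketch below implements steps (i)-(iii) on a fine time grid. The grid resolution, the reading of \(\text{Exp}(3.0 B_i^{-1})\) as rate \(3/B_i\) (mean \(B_i/3\)), and the reading of 0.1 as the variance of the underlying normal in logNormal(2.0, 0.1) are our assumptions; we record the increment \(D_i=A_i(L_i)-A_i(S_i)\), consistent with the definition of D in Sect. 2.1 and with \(A_i(S_i)=B_i\).

```python
import numpy as np

def simulate_fires(n=300, nu=2.0, sigma=0.5, sigma_r=0.5, dt=0.001, seed=None):
    rng = np.random.default_rng(seed)
    grid = np.arange(dt, 30.0 + dt, dt)                 # fine grid on (0, 30]
    data = []
    for _ in range(n):
        drift = nu * np.exp(rng.normal(0.0, sigma_r))   # nu_i = nu * exp(delta_i)
        # (i) path A(t) = nu_i t + sigma W(t) via cumulated Gaussian increments
        A = drift * grid + sigma * np.cumsum(rng.normal(0.0, np.sqrt(dt), grid.size))
        # (ii) threshold B_i and reporting delay S_i = max{t : A(t) <= B_i}
        B = rng.lognormal(2.0, np.sqrt(0.1))
        S = grid[np.nonzero(A <= B)[0].max()]
        # (iii) observed duration L*_i; increment D_i = A(L_i) - A(S_i) = A(L_i) - B_i
        L_star = rng.exponential(B / 3.0)
        idx = min(np.searchsorted(grid, S + L_star), grid.size - 1)
        data.append((S, L_star, B, A[idx] - B))
    return np.array(data)                               # columns: S, L*, B, D
```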

Using the simulated data, we evaluated the estimator \(\tilde{\varvec{\theta }}_{n_A}\), the approximation to \(\hat{\varvec{\theta }}_{n_A}\), by the variant of Algorithm A in Sect. 2.3. We then obtained the corresponding duration distribution estimates.

Table 3 summarizes the parameter estimates based on 200 simulation repetitions. The sample means of the estimates obtained under the Wiener process model with a constant drift are close to the true parameter values for the scenario \(\sigma _r=0\); the bias is evident when the true value of \(\sigma _r\) increases to 0.5 and 0.8. When we use the model with random drift, i.e. \(\sigma _r\ge 0\), the sample means of the estimates are close to the true parameter values in all three scenarios of \(\sigma _r\). This provides an empirical verification of the consistency of the two estimators, and suggests that it may be advisable not to assume \(\sigma _r=0\) in practice. Further, we estimated \(\varvec{\theta }\) by maximizing the conditional likelihood given in (13), which uses only the observations on D and \(L^*\). The results are presented in Table S2 of Sect. S3.1 of the Supplementary Material. While the parameter estimates are similar to those obtained by Algorithm A, the sample means of the estimated standard errors are larger, so we conclude that maximizing (13) may yield less efficient estimators. The sample means of the estimated standard errors from the robust sandwich variance estimator are similar to the corresponding sample standard deviations of the estimates for both algorithms, which suggests that the proposed variance estimator performs well at the simulation settings.

Table 3 Numerical Properties of Estimators for Model Parameters in Simulation A with Algorithm A

For each generated data set, we estimated the duration distribution by \({\hat{F}}_n(t)\) using the \(\varvec{\theta }\) estimate from Algorithm A, and used \({\tilde{F}}_n(t;\nu ,\sigma ,\sigma _r)\) given in (5) with the true parameter values as a reference. The consistent variance estimator of (14) given in Appendix C was evaluated to construct confidence intervals (CIs). Assuming the drift of the Wiener process involves random effects, Fig. 6 shows the sample means of the 200 estimated distribution functions, together with the approximate conventional \(95\%\) CIs and their \(2.5\%\) and \(97.5\%\) sample quantiles. For comparison, each plot in Fig. 6 also includes the sample means of the 200 evaluations of the empirical distribution function \(F_n(\cdot )\) using the true durations, the empirical distribution function \(F^*_n(\cdot )\) using the observed durations (the naive approach), and the Turnbull estimator with \(R_{max}\) set to the third quartile and to the maximum of the reporting delays in each generated data set.

Fig. 6 Simulation A: sample means of estimated distribution functions with corresponding \(95\%\) CI, based on model with random drift

The estimates associated with the proposed approach are very close to those based on \({\tilde{F}}_n(t;\nu ,\sigma ,\sigma _r)\) using the true \(\varvec{\theta }\). At all simulation settings, both the approximate \(95\%\) CIs and the CIs using the \(2.5\%\) and \(97.5\%\) sample quantiles contain the empirical distribution function \(F_n(\cdot )\) obtained with the true durations. The naive estimates and the Turnbull estimates appear different from \(F_n(\cdot )\). The Turnbull estimator depends heavily on the assumed value of \(R_{max}\): when \(R_{max}\) is much larger than the \(L^*_i\), its performance deteriorates substantially. Histograms of the realizations of \(L^*_i\), presented in Fig. S1 of Sect. S3.1 of the Supplementary Material, support this finding. In the scenario where the true value of \(\sigma _r\) is 0, the two values of \(R_{max}\) are relatively small and the Turnbull estimates are close to the proposed estimates, as shown in Fig. 6. As the true value of \(\sigma _r\) increases, \(R_{max}\) chosen as the maximum of the reporting delays in the generated data set becomes much greater than the maximum of the \(L^*_i\), and the corresponding Turnbull estimates depart much further from the proposed estimates. This is consistent with the outcome seen in the data analysis. Moreover, we evaluated the distribution estimator using \(\hat{\varvec{\theta }}_{n_B}\) obtained from Algorithm B (see Fig. S2 of Sect. S3.1 in the Supplementary Material) and the kernel-smoothed version of the proposed estimator. The behavior of these two estimates relative to the naive and Turnbull estimates is similar to that observed in Fig. 6.

We computed the pointwise sample mean squared errors of the Turnbull estimates, the proposed estimates, and the reference estimates based on \({\tilde{F}}_n(t;\nu ,\sigma ,\sigma _r)\). For any \(t\ge 0\), the proposed estimator has the smallest sample mean squared error, which demonstrates the relative efficiency of the proposed estimator over the naive estimator and the Turnbull estimator at all simulation settings. Figure S3 in Sect. S3.1 of the Supplementary Material presents the sample standard deviations and the sample means of the estimated standard errors of the proposed distribution estimator with \(\tilde{\varvec{\theta }}_{n_A}\) from Algorithm A, together with those of the empirical distribution function and \({\tilde{F}}_n(\cdot ; \nu ,\sigma ,\sigma _r)\), both of which require more information than the data structure of interest provides. The plots in the figure show that the variation of the proposed estimator is comparable to that of \({\tilde{F}}_n(\cdot ; \nu ,\sigma ,\sigma _r)\) and is, in some settings, smaller than that of the empirical distribution function. This indicates that using the available information on fire growth can recover the efficiency loss due to the missing start times, and in some situations even outperform the empirical distribution function, a nonparametric estimator of the duration distribution.

4.2 Simulation B: Robustness

We generated burnt area sample paths for a collection of simulated independent fires following the model \(A_{i}(t)=\nu _i t +\sigma _i W^*_{i}(t)\), \(i=1,2,\cdots ,n=300\), where \(\nu _i=\nu e^{\delta _i}\) with \(\delta _i \sim N(0,\sigma ^2_r)\) and \(W^*_i(\cdot )\) is a process with correlated increments. Specifically, the increments \(W^*_i(t_{k})-W^*_i(t_{k-1})\) for a partition \(t_k, k=1,2,\cdots ,K\), of the time period [0, 30] were generated from \(MN({\mathbf {0}}, \varSigma )\), where the \((k,k')\) entry of \(\varSigma \) is \(\varDelta t\, \rho ^{|k-k'|}\) for \(k\ne k'\) and \(\varDelta t\) for \(k= k'\), with \(\rho =0.2\). The observations on the variables \(B, S, L^*\), and D were generated in the same way as in Simulation A.
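A short sketch of generating one such correlated-increment path under the stated covariance follows; the grid size K and step \(\varDelta t\) are user choices.

```python
import numpy as np

def correlated_path(K=300, dt=0.1, rho=0.2, seed=None):
    """One sample path W*(t_k), k = 1..K, with MN(0, Sigma) increments,
    Sigma[k, k'] = dt * rho**|k - k'| (equal to dt on the diagonal)."""
    rng = np.random.default_rng(seed)
    k = np.arange(K)
    Sigma = dt * rho ** np.abs(k[:, None] - k[None, :])
    increments = rng.multivariate_normal(np.zeros(K), Sigma)
    return np.cumsum(increments)
```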

We computed the \(\varvec{\theta }\) estimates and then the duration distribution estimates as if the data were generated from Model (1). Table S3 in Sect. S3.2 of the Supplementary Material summarizes the simulation outcomes for 200 replicates. The sample means of the \(\varvec{\theta }\) estimates under the assumed Wiener process model with random drift are close to the true parameter values. The sample means of the estimated standard errors from the robust estimator are fairly close to the sample standard deviations of the \(\varvec{\theta }\) estimates.

Figure S4 in Sect. S3.2 of the Supplementary Material presents the sample means of the estimated distribution functions from both Algorithms A and B, with the approximate \(95\%\) CIs and their \(2.5\%\) and \(97.5\%\) quantiles. In each plot, we also overlaid the sample means of the estimates from the empirical distribution function, \({\tilde{F}}_n(\cdot ; \nu ,\sigma ,\sigma _r)\), the naive estimator, and the Turnbull estimator. These plots indicate that the proposed estimator is close to the empirical distribution function in this situation, even though Model (1) does not hold.

We also explored scenarios where the burnt-area process is generated following other models, such as \(A_{i}(t)=\nu _i t^2 +\sigma _i W_{i}(t)\). The estimated duration distribution based on the proposed approach assuming Model (1) is also close to the empirical function using all the true duration times. This indicates that the proposed estimator \({\hat{F}}_n(\cdot )\) can be quite robust to model misspecification. Further investigation could lead to a way of systematically checking the validity of Model (1).

5 Final remarks

We propose in this paper procedures for estimating the distribution of an event duration from observations with missing time origins. By employing the distribution of the first-hitting-time of a Wiener process, we link the distribution of the event duration with associated longitudinal measures. Both the simulation studies and the real data analysis show that the proposed approach performs well in predicting the times to initial attack, and they demonstrate the importance of taking into account the delay between the unobserved start time and the later report time.

The proposed approach is applicable to many situations where event duration is of interest but the time origins of the duration observations are missing. Examples include predicting the length of the period from unknown HIV infection to its detection by making use of longitudinal viral load measures (Doksum and Normand 1995), predicting the lifetime of trees by using longitudinal measures of diameter at breast height (Thompson 2011), and, as suggested by a referee, estimating the onset time of a disease by utilizing longitudinal medical expenditure data such as the usage of prescription drugs and the cost of skilled nursing facilities. The idea underlying our approach could readily be applied under a different model for the longitudinal measures, e.g. Wang (2008) and Wang and Xu (2010). It would be worth exploring how to check the validity of the assumed stochastic process for the longitudinal measures.

Several other investigations would be worthwhile. The target population in the wildland fire application of this paper includes only the fires that were reported and dispatched with initial attack resources. When a study aims to explore the whole physical development process of wildland fire, the fires not reported should also be included in the population under consideration; the currently available wildland fire records are then length-biased. We suggest extending the idea of the proposed approach and adapting methods for estimating distributions with right-censored event times subject to length-biased sampling (e.g. Asgharian and Wolfson 2005; Huang and Qin 2011) to the situation where the origins of the duration times are missing.

Heterogeneity and correlation between fires should also be accounted for. Applying the proposed approach to the data stratified by fire region reveals that the event duration distributions differ between regions; see Table 2 and Fig. 5, for example. The duration is likely related to fuel type and moisture content, as well as wind activity and local topography. To deal with this, as discussed in Wang (2010), we could follow Lawless and Crowder (2004) and specify the drift parameter \(\nu _i\) of Model (1) as a function of covariates. In addition, due to potential correlation between wildland fires, it would be of interest to extend the approach to account for spatio-temporal correlation. A third possibility is to follow Heitjan and Rubin (1990) to accommodate semi-continuous data, such as the rounded burnt-area records shown in Table S1 of Sect. S2 of the Supplementary Material.

More investigation is required to systematically determine J, the number of Monte Carlo samples in Algorithm A. We could incorporate automated, data-driven strategies (e.g. Levine and Casella 2001; Caffo et al. 2005) into the current algorithm to choose an appropriate J at each iteration. This paper evaluates integrals by Monte Carlo integration; as suggested by a referee, it would be interesting to compare this approximation with other numerical integration approaches, such as Gaussian quadrature rules.