Appendix. Derivation of Estimating Equations
In this Appendix, we derive the equations required for Algorithm 1. We first present the incomplete and complete data log-likelihoods for the DR model, which leads to Equation (9). We then obtain Equation (10) using Bayes’ rule, and Equations (11) and (12) are obtained from the expected complete data log-likelihood.
The general form of the log-likelihood for a univariate point process with CIF λ(t) is (see Daley & Vere-Jones, 2003)
$$ l(\theta\mid X) = \sum_k \ln\bigl(\lambda(t_k)\bigr) - \int_0^T \lambda(s)\,ds, $$
(A1)
where \([0,T]\) is the observation period, \(X = (t_1, \dots, t_N)\) denotes the observed event times, and \(\theta\) contains the parameters of the model. The integral of \(\lambda(t)\) over the interval \([0,u]\) is sometimes referred to as the compensator and is denoted as \(\varLambda(u)\).
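To make (A1) concrete, the following minimal Python sketch (the function and variable names are ours, purely illustrative) evaluates the log-likelihood of a set of observed event times under a given intensity, taking the compensator value \(\varLambda(T)\) as an input. For a homogeneous Poisson process with rate \(\mu\), (A1) reduces to \(N\ln(\mu) - \mu T\).

```python
import math

def point_process_loglik(event_times, intensity, compensator_T):
    """Evaluate (A1): sum of log-intensities at the observed events
    minus the compensator Lambda(T) over the observation period."""
    return sum(math.log(intensity(t)) for t in event_times) - compensator_T

# Homogeneous Poisson process with rate mu on [0, T]: Lambda(T) = mu * T,
# so (A1) reduces to N * ln(mu) - mu * T.
mu, T = 2.0, 10.0
events = [0.5, 3.1, 7.4]
ll = point_process_loglik(events, lambda t: mu, mu * T)  # 3*ln(2) - 20
```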
For an orderly, multivariate point process the log-likelihood can be treated through the univariate margins. In particular, if the log-likelihood for each univariate margin is \(l_i\), the overall log-likelihood is given as \(l = \sum_i l_i\). We present results in terms of the univariate margins to simplify notation.
Substituting Equations (4) through (7) into the bivariate version of (A1) leads directly to an expression for the incomplete data log-likelihood of the DR model. As discussed above, this does not produce stable parameter estimates because it contains the logarithm of a weighted sum of densities. Therefore, we employ the branching structure representation of the Hawkes process to obtain the complete data log-likelihood:
$$ l_i(\theta_i \mid X_i, Z_i) = \sum_{z \in z_i} \biggl(\,\sum _{t_{ik} \in z} \ln\bigl(\lambda_{z}(t_{ik})\bigr) - \varLambda_{z}(T) \biggr) $$
(A2)
for \(i \in \{1,2\}\) and \(k \in \{1,\dots,N_i\}\). The collection of processes \(z_i\) and the CIFs \(\lambda_z(t)\) were defined in our discussion of the branching structure. The summation over \(z \in z_i\) is implied by the independence assumptions of the branching structure. The equivalence between Equation (A2) and the incomplete data representation was initially established by Hawkes and Oakes (1974). The overall result is to replace the logarithm of a sum of densities with the sum of their logarithms, which is also the role of the complete data log-likelihood in finite mixture modelling (McLachlan & Peel, 2000, pp. 47–49).
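When the branching structure is known, (A2) is straightforward to evaluate: each event contributes the log-intensity of its own process, and every process, including the empty ones, contributes its compensator. A minimal sketch with hard assignments (a toy two-process setup; the numbers and names are illustrative, not from the model):

```python
import math

def complete_data_loglik(assignments, intensities, compensators):
    """(A2) with a known branching structure: `assignments` pairs each
    event time with the index of the process that generated it."""
    ll = -sum(compensators)  # every process contributes -Lambda_z(T)
    for t, z in assignments:
        ll += math.log(intensities[z](t))  # log-intensity of its own process
    return ll

T = 5.0
lams = [lambda t: 2.0, lambda t: 4.0]     # constant toy intensities
Lams = [2.0 * T, 4.0 * T]                 # their compensators on [0, T]
events = [(1.0, 0), (3.0, 1)]             # event 1 from process 0, event 2 from process 1
ll = complete_data_loglik(events, lams, Lams)  # ln(2) + ln(4) - 30
```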
The unknown part of Equation (A2) is represented by the summation over \(t_{ik} \in z\); because we do not know the branching structure, we do not know which process each \(t_{ik}\) belongs to. It is important to recall here that the notation \(\lambda_z(t)\) is shorthand for \(\lambda_{ijk}(t)\) and does not indicate dependence of the rate functions on the missing variable \(Z_i\). For explicitness, we also note that the number of processes in \(z_i\) is \(N_1 + N_2 + 1\), which is necessarily larger than the number of events \(N_i\). This implies that some of the processes in \(z_i\) contain no events. For these cases, we evaluate the Poisson process with zero arrivals, so only the compensator contributes to the log-likelihood.
In order to obtain the expectation of Equation (A2) over the posterior distribution of Z, we write it in the equivalent form:
$$ l_i(\theta_i \mid X_i, Z_i) = \sum_{z \in z_i} \Biggl(\,\sum _{k=1}^{N_i} \ln\bigl(\lambda_{z}(t_{ik}) \bigr) \times\delta(Z_{ik} =z) - \varLambda_{z}(T) \Biggr), $$
(A3)
where
$$ \delta(Z_{ik} = z) = \begin{cases} 1 & \text{if } Z_{ik} = z, \\ 0 & \text{otherwise.} \end{cases} $$
(A4)
This is again the same approach employed in mixture modelling, and the expected value is equally straightforward to obtain:
$$ E\bigl[l_i(\theta_i \mid X_i, Z_i)\bigr] = \sum_{z \in z_i} \Biggl(\,\sum_{k=1}^{N_i} \ln\bigl(\lambda_{z}(t_{ik})\bigr) \operatorname{Prob} (Z_{ik} = z \mid X_i, \theta_i) - \varLambda_{z}(T) \Biggr). $$
(A5)
This leads directly to the expression for \(Q(\theta \mid \theta^{(n)})\) in Equation (9).
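The expected complete data log-likelihood can be evaluated numerically once the posterior probabilities are available. The sketch below (a toy setup with two candidate processes; all names and numbers are illustrative) weights each event's log-intensity by its posterior membership probability and subtracts the compensators, mirroring the structure of (A5).

```python
import math

def expected_complete_loglik(event_times, intensities, compensators, resp):
    """(A5): sum_z ( sum_k resp[k][z] * ln(lambda_z(t_k)) - Lambda_z(T) ),
    where resp[k][z] is the posterior probability that event k
    belongs to process z."""
    q = 0.0
    for z, (lam, Lam) in enumerate(zip(intensities, compensators)):
        q += sum(r[z] * math.log(lam(t)) for t, r in zip(event_times, resp))
        q -= Lam
    return q

# Toy example: two constant-intensity processes on [0, T].
T = 5.0
lams = [lambda t: 1.0, lambda t: 3.0]
Lams = [1.0 * T, 3.0 * T]
events = [1.0, 2.0]
resp = [[0.25, 0.75], [0.25, 0.75]]  # posteriors proportional to intensities
q = expected_complete_loglik(events, lams, Lams, resp)  # 1.5*ln(3) - 20
```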
We now obtain the posterior probabilities in Equation (10) by following the usual procedure of providing the likelihoods and the priors and then applying Bayes’ rule. For the likelihoods, we have
$$ f(t_{ik} \mid Z_{ik} = z, \theta_i) = f_z(t_{ik} \mid \theta_i) = \lambda_z(t_{ik}) / \varLambda_z(T), $$
(A6)
where the first equality follows from the independence assumptions of the branching structure and the second from the definition of the conditional density of the inhomogeneous Poisson process (Daley & Vere-Jones, 2003, p. 23).
Next, we require an expression for the prior probability that an event belongs to process z. This can be derived directly from the result, shown in Equation (A6), that the intensity function \(\lambda_z(t)\) is proportional to the density of process z. But it is perhaps more intuitive to begin by interpreting the prior probability as the proportion of the total number of events \(N_i\) that are due to process z, which is analogous to the treatment in finite mixture modelling. This leads to
$$ \operatorname{Prob} (Z_{ik} = z \mid \theta_i) = N_z / N_i. $$
(A7)
Within the maximum likelihood framework, the counts \(N_z\) can be obtained from the complete data log-likelihood given in Equation (A3). This is done by solving the following two maximum likelihood equations. Firstly, for the baseline process, the relevant terms of Equation (A3) are \(N_{i00}\ln(\mu_i) - \mu_i T\), and setting the derivative with respect to \(\mu_i\) to zero gives
$$ \frac{N_{i00}}{\mu_i} - T = 0, $$
(A8)
where \(N_{i00} = \sum_{k=1}^{N_i} \delta(Z_{ik} = z_{i00})\) is the number of events in the baseline process. This implies
$$ N_{i00} = \mu_i T = \varLambda_{i00}(T). $$
(A9)
Secondly, we consider the complete data log-likelihood for a single process \(z_{ijk} \neq z_{i00}\):
$$ l_{ijk}(\theta_i \mid X_i, Z_i) = N_{ijk}\ln(\alpha_{ij}) + \sum_{k} \ln\bigl(f_{ijk}(t_{ik})\bigr)\,\delta(Z_{ik} = z_{ijk}) - \alpha_{ij}F_{ijk}(T), $$
(A10)
where \(N_{ijk} = \sum_{k} \delta(Z_{ik} = z_{ijk})\) is the total number of events in process \(z_{ijk}\), \(f_{ijk}\) is the kernel density of process \(z_{ijk}\), and \(F_{ijk}\) is the cumulative distribution function. Maximizing with respect to \(\alpha_{ij}\) gives
$$ N_{ijk} = \alpha_{ij} F_{ijk}(T) = \varLambda_{ijk}(T). $$
(A11)
From Equations (A7), (A9), and (A11) it follows that
$$ \operatorname{Prob} (Z_{ik} = z \mid \theta_i) = \varLambda_z(T) / \sum _{m \in z_i} \varLambda_m(T). $$
(A12)
Then, using the law of total probability to obtain \(f(t_{ik} \mid \theta_i)\) from Equations (A6) and (A7), Equation (10) follows from application of Bayes’ rule. We note that Veen and Schoenberg (2008) did not explicitly relate the rates of the Poisson processes to the posterior probabilities of the branching structure. The foregoing argument fills this void.
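As a numerical check on this argument (a toy two-process example; all numbers are illustrative): combining the likelihood \(\lambda_z(t)/\varLambda_z(T)\) from (A6) with the prior \(\varLambda_z(T)/\sum_m \varLambda_m(T)\) from (A12), the \(\varLambda_z(T)\) terms cancel, and after normalization the posterior probability that an event at time t belongs to process z is \(\lambda_z(t)/\sum_m \lambda_m(t)\).

```python
def posterior(t, intensities, compensators):
    """Bayes' rule with likelihood lambda_z(t)/Lambda_z(T), as in (A6),
    and prior Lambda_z(T)/sum_m Lambda_m(T), as in (A12)."""
    total_comp = sum(compensators)
    unnorm = [(lam(t) / Lam) * (Lam / total_comp)   # Lambda_z(T) cancels
              for lam, Lam in zip(intensities, compensators)]
    s = sum(unnorm)
    return [u / s for u in unnorm]

# Two constant-intensity processes on [0, T]; the posterior is
# lambda_z(t) / (lambda_1(t) + lambda_2(t)) regardless of T.
T = 4.0
lams = [lambda t: 1.0, lambda t: 3.0]
Lams = [1.0 * T, 3.0 * T]
post = posterior(2.0, lams, Lams)  # -> [0.25, 0.75]
```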
The derivation of Equations (11) and (12) from Equation (A5) proceeds in a manner similar to that shown in Equations (A8) and (A10), respectively. It is simply required to replace \(N_z\) with \(\bar{N}_{z} = \sum_{k}\operatorname{Prob} (Z_{ik} = z \mid X_{i}, \theta_{i})\) and, in Equation (A10), to sum over all processes \(z \in \{z_{ij1}, \dots, z_{ijN_{j}}\}\).
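In code, the resulting M-step updates are one-liners. The sketch below (function names are ours, and the numbers are illustrative) implements \(\hat{\mu}_i = \bar{N}_{i00}/T\) from (A9) and, following the summation over the processes \(z_{ij1}, \dots, z_{ijN_j}\) described above, \(\hat{\alpha}_{ij} = \sum_k \bar{N}_{ijk} / \sum_k F_{ijk}(T)\) from (A11), with the counts replaced by their posterior expectations \(\bar{N}_z\).

```python
def m_step_mu(nbar_baseline, T):
    """mu_hat = expected baseline count / observation length (cf. A9)."""
    return nbar_baseline / T

def m_step_alpha(nbar_offspring, F_T):
    """alpha_hat from summing (A11) over processes z_ij1..z_ijNj:
    alpha_hat = sum_k Nbar_ijk / sum_k F_ijk(T)."""
    return sum(nbar_offspring) / sum(F_T)

mu_hat = m_step_mu(6.0, T=10.0)                    # -> 0.6
alpha_hat = m_step_alpha([0.4, 0.6], [0.9, 1.0])   # -> 1.0 / 1.9
```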