Abstract
The score test is a computationally efficient method for determining whether marks have a significant impact on the intensity of a Hawkes process. This paper provides theoretical justification for the use of this test. It is shown that the score statistic has an asymptotic chi-squared distribution under the null hypothesis that marks do not impact the intensity process, and that its asymptotic distribution against local alternatives is non-central chi-squared, which yields the local power of the test. A stationary marked Hawkes process is constructed by a thinning method when the marks are observations on a continuous-time stationary process, and the joint likelihood of event times and marks is developed for this case, substantially extending existing results, which cover only independent and identically distributed marks. The asymptotic chi-squared distributions required for the size and local power of the score test extend existing asymptotic results for likelihood estimates of the unmarked Hawkes process model under mild additional conditions on the moments and ergodicity of the marks process, together with an additional uniform boundedness assumption, which is shown to hold for the exponential decay Hawkes process.
References
Andersen P, Borgan O, Gill R, Keiding N (1996) Statistical Models Based on Counting Processes. Springer Series in Statistics, Springer, New York
Bacry E, Mastromatteo I, Muzy J-F (2015) Hawkes processes in finance. Market Microstruct Liq 1(01):1550005
Brémaud P, Massoulié L (1996) Stability of nonlinear Hawkes processes. Ann Probab 24(3):1563–1588
Breusch TS, Pagan A (1980) The Lagrange multiplier test and its applications to model specification in econometrics. Rev Econ Stud 47(1):239–253
Chen F, Hall P (2013) Inference for a nonstationary self-exciting point process with an application in ultra-high frequency financial data modeling. J Appl Probab 50(4):1006–1024
Clinet S (2020) Quasi-likelihood analysis for marked point processes and application to marked Hawkes processes. arXiv preprint arXiv:2001.11624
Clinet S, Potiron Y (2018) Statistical inference for the doubly stochastic self-exciting process. Bernoulli 24(4B):3469–3493
Clinet S, Yoshida N (2017) Statistical inference for ergodic point processes and application to limit order book. Stoch Process Appl 127(6):1800–1839
Crane R, Sornette D (2008) Robust dynamic classes revealed by measuring the response function of a social system. Proc Natl Acad Sci USA 105(41):15649–15653
Daley D, Vere-Jones D (2002) An Introduction to the Theory of Point Processes: Volume I: Elementary Theory and Methods. Probability and Its Applications. Springer
Duarte A, Löcherbach E, Ost G (2016) Stability, convergence to equilibrium and simulation of non-linear Hawkes processes with memory kernels given by the sum of Erlang kernels. arXiv preprint arXiv:1610.03300
Embrechts P, Liniger T, Lin L (2011) Multivariate Hawkes processes: an application to financial data. J Appl Probab 48A:367–378
Hawkes A (1971) Point spectra of some mutually exciting point processes. J Royal Statist Soc Series B 33(3):438–443
Hawkes AG (2018) Hawkes processes and their applications to finance: a review. Quant Finance 18(2):193–198
Hawkes AG, Oakes D (1974) A cluster process representation of a self-exciting process. J Appl Probab 11(3):493–503
Jacod J, Shiryaev A (2013) Limit theorems for stochastic processes, Volume 288. Springer Science & Business Media
Kallenberg O (2006) Foundations of Modern Probability. Probability and Its Applications, Springer, New York
Liniger TJ (2009) Multivariate Hawkes processes. PhD thesis, ETH Zurich
Ogata Y (1978) The asymptotic behaviour of maximum likelihood estimators for stationary point processes. Ann Inst Stat Math 30(1):243–261
Ogata Y (1988) Statistical models for earthquake occurrences and residual analysis for point processes. J Am Statist Assoc 83(401):9–27
Ozaki T (1979) Maximum likelihood estimation of Hawkes' self-exciting point processes. Ann Inst Stat Math 31(1):145–155
Rao C (2009) Linear Statistical Inference and its Applications. Wiley Series in Probability and Statistics. Wiley
Richards K-A, Dunsmuir W, Peters G (2019) Score test for marks in Hawkes processes. Available at SSRN. https://doi.org/10.2139/ssrn.3381976
Richards K-A (2019) Modelling the dynamics of the limit order book in financial markets. PhD thesis, University of New South Wales
Acknowledgements
K-A Richards gratefully acknowledges PhD scholarship support by Boronia Capital Pty. Ltd., Sydney, Australia. The research of S. Clinet is supported by a special grant from Keio University. W. T. M. Dunsmuir was supported by travel funds from the Faculty of Sciences, University of New South Wales. The authors thank the referees for comments and suggestions that have improved the clarity and scope of the paper.
Ethics declarations
Conflict of Interest Statement
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Appendices
A Proof of Lemma 1 and Proposition 1
Proof (Lemma 1)
We first show that, by construction, \(\overline{N}_g\), seen as a marked point process on \(\mathbb {R} \times (\mathbb {R} \times \mathbb {X})\), is compensated by \(ds \times (du \times F_s(d\mathbf {x}))\), where \(F_s(d\mathbf {x})\) is the conditional distribution of \(\mathbf {y}_s\) with respect to the filtration \(\mathcal {F}_{s-}^{\mathbf {y}}\).
We recall that by definition, all we have to show is that for any non-negative measurable predictable process W (see Jacod and Shiryaev 2013, Theorem II.1.8) we have
By a monotone class argument, it is sufficient to prove the martingale property for \(W(s,u,\mathbf {x})(\omega ) = \mathbf {1}_{\{ (t',t] \times F\}}(s,\omega )\mathbf {1}_{A \times B}(u,\mathbf {x}) \) where \(t' <t\), \(F \in \mathcal {F}_{t'}\), \(A \in \mathcal {B}(\mathbb {R})\) and \(B \in \mathcal {X}\). We have
where at the fourth line we have used the independence of \(\mathbf {y}\) and \(\overline{N}\), and at the seventh line we have used again the independence of \(\overline{N}\) from \(\mathbf {y}\) along with the fact that \(\overline{N}\) is a Poisson process with intensity measure equal to \(ds \times du\). \(\square \)
Proof (Proposition 1)
We construct by thinning and a fixed point argument the marked point process \(N_g^\infty \). Define \(\lambda _{g,0}^\infty (t) = \eta ^*\). By induction, we then define the sequences of processes \((N_{g,n}^\infty )_{n \in \mathbb {N}}\) and \((\lambda _{g,n}^\infty )_{n \in \mathbb {N}}\) as follows. For \(n \in \mathbb {N}\), we let \(N_{g,n}^\infty \) and \(\lambda _{g,n+1}^\infty \) be such that for any \(t',t \in \mathbb {R}\) with \(t' < t\) and \(A \in \mathcal {X}\)
Clearly (27a) uniquely defines a measure on \(\mathbb {R} \times \mathbb {X}\). By taking conditional expectations throughout (27a), it is immediate to see that \(t \rightarrow \lambda _{g,n}^\infty (t)\) is the stochastic intensity of the counting process \(t \rightarrow N_{g,n}^\infty ((t',t] \times \mathbb {X})\). Moreover, by positivity of \(w(t-s) g(\mathbf {x})\), we immediately deduce that for any \(t'<t\) and \(A \in \mathcal X\) the quantities \(N_{g,n}^\infty ((t', t] \times A)\) and \(\lambda _{g,n}^\infty (t)\) are increasing with n, so that we may define \(N_g^\infty \) and \(\lambda _{g}^\infty \) as the (possibly infinite) pointwise limits of \(N_{g,n}^\infty \) and \(\lambda _{g,n}^\infty \). Next, by the monotone convergence theorem, taking the limit \(n \rightarrow +\infty \) in the above equations yields the representation
In particular, the first equation shows that \(t \rightarrow \lambda _g^\infty (t)\) is the stochastic intensity of \(t \rightarrow N_g^\infty ((t', t] \times \mathbb {X})\) for \(t' < t\) and the second equation proves that \(\lambda _g^\infty \) has the desired shape. All we have to check to get the first claim of the proposition is the finiteness of the two limits in (28a)-(28b). Let \(\rho _n(t) = \mathbb {E}[\lambda _{g,n}^\infty (t) - \lambda _{g,n-1}^\infty (t)]\). We have
where we have used at the third step that \(\overline{N}_g\), seen as a marked point process on \(\mathbb {R} \times (\mathbb {R} \times \mathbb {X})\) admits \(dsduF_s(d\mathbf {x})\) as \(\mathcal {F}_s\)-compensator where \(F_s(d\mathbf {x})\) is the conditional distribution of \(\mathbf {y}_s\) with respect to \(\mathcal {F}_{s-}^{\mathbf {y}}\) by Lemma 1, and at the fifth step have used the \(\mathcal {F}_{s-}\)-measurability of the stochastic intensities and that \(\mathbb {E}[g(\mathbf {y}_{s}) | \mathcal {F}_{s-}] = \mathbb {E}[g(\mathbf {y}_{s}) | \mathcal {F}_{s-}^\mathbf {y}] \le C\) (by independence of \(\mathbf {y}\) and \(\overline{N}\)). From here, we deduce that \(\sup _{s \in (-\infty ,t)} \rho _n(s) \le C\vartheta ^* \sup _{s \in (-\infty ,t)} \rho _{n-1}(s)\) since \(\int _0^{+\infty } w(s )ds = 1 \). By a similar calculation, we also have \(\rho _1(t) \le C \vartheta ^* \eta ^*\), so that by an immediate induction \(\sup _{s \in (-\infty ,t)} \rho _n(s) \le (C\vartheta ^*)^n \eta ^*\). Therefore, \(\mathbb {E}\lambda _g^\infty (t) = \eta ^* + \sum _{k=1}^{+\infty } \rho _k(t) \le \eta ^*/(1-C\vartheta ^*) < +\infty \), which implies the almost sure finiteness of both \(N_g^\infty \) (on any set of the form \((t',t] \times \mathbb {X}\), \(- \infty< t'< t < + \infty \)) and \(\lambda _g^\infty (t)\). This proves the first claim. Now we prove the second point, and show first that for any \(n \in \mathbb {N}\), \(\pi _n^\infty (ds \times d\mathbf {x}) = \lambda _{g,n}^\infty (s)ds \times F_s(d\mathbf {x})\) is the \(\mathcal {F}_s\)-compensator of \(N_{g,n}^\infty \). Let W be a non-negative measurable predictable process on \(\mathbb {R} \times \mathbb {X}\). By (27a), we have
where we have used that \(\overline{N}_g\) admits \(ds \times (du \times F_s(d\mathbf {x}))\) as \(\mathcal {F}_s\)-compensator by Lemma 1. Since W is arbitrary, this proves that \(\pi _n^\infty \) is the compensator of \(N_{g,n}^\infty \). Moreover, taking the limit \(n\rightarrow +\infty \) in the above expectations and using again the monotone convergence theorem yields that \(\pi ^\infty (ds \times d\mathbf {x})\) is the compensator of \(N_g^\infty \). Finally, the third claim comes from the stationarity of \(\lambda _{g,n}^\infty \) and \(N_{g,n}^\infty \) which is in turn a consequence of the stationarity of \(\overline{N}_g\) and \(\mathbf {y}\). \(\square \)
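The iterative thinning construction above also doubles as a simulation recipe: points of the dominating Poisson stream are accepted with probability proportional to the current intensity. The following sketch is a minimal illustration for the exponential kernel \(w(t)=\alpha e^{-\alpha t}\); the specific boost function \(g\), the AR(1) surrogate for the stationary marks process \(\mathbf {y}\), and all parameter values are hypothetical choices, not prescribed by the proposition (stability requires \(\vartheta \, \mathbb {E}[g(\mathbf {y})] < 1\)).

```python
import numpy as np

def simulate_marked_hawkes(T, eta=0.5, vartheta=0.5, alpha=1.0,
                           g=lambda x: 1.0 + 0.2 * np.tanh(x), seed=0):
    """Ogata-style thinning for a marked Hawkes process with exponential
    kernel w(t) = alpha * exp(-alpha * t) and intensity
        lambda(t) = eta + vartheta * sum_{t_i < t} w(t - t_i) * g(y_{t_i}).
    Because the intensity decays between events, its value just after the
    last point dominates it until the next candidate point arrives."""
    rng = np.random.default_rng(seed)
    times, marks = [], []
    t, state = 0.0, 0.0       # state: AR(1) surrogate for the marks process y
    excitation = 0.0          # vartheta * sum_i w(t - t_i) * g(y_{t_i})
    while True:
        lam_bar = eta + excitation               # dominating intensity
        dt = rng.exponential(1.0 / lam_bar)      # candidate inter-arrival time
        excitation *= np.exp(-alpha * dt)        # decay to the candidate time
        state = 0.9 * state + rng.normal(scale=0.1)  # evolve the mark state
        t += dt
        if t >= T:
            break
        if rng.uniform() * lam_bar <= eta + excitation:  # accept w.p. lambda(t)/lam_bar
            times.append(t)
            marks.append(state)
            excitation += vartheta * alpha * g(state)    # self-excitation jump
    return np.array(times), np.array(marks)
```

The fixed-point argument of the proof guarantees that, when \(C\vartheta ^* < 1\), this accept/reject scheme produces an almost surely finite point configuration on any bounded interval.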
B Proof of Lemma 2
For any \(c \in \mathbb {R}^r\) we denote the linear combinations \(G_c(\mathbf {x})= c^{\mathsf {T}}G(\mathbf {x})\) and similarly for \(\hat{G}_c(\mathbf {x})\). Recall that under \(H_0\), the marked point process \(N_g\) has event intensity identical to that of N defined in (3). Marks are observed at the event times of this process but do not impact the intensity of it. For the exponential decay specification, since \(dim(\varTheta ) = 3\), we need to show Condition 3 for \(p = 4\), i.e that
for \(i = 0, 1, 2\). Notice that only the derivatives with respect to \(\vartheta \) and \(\alpha \) are required. These derivatives are linear combinations of terms of the form
for \(i = 0,1,2\) and \(k=0,1\), and with \(w(t-s;\alpha )= e^{-\alpha (t-s)}\). Since \(\varTheta \) is bounded, we consider the integrals which are finite combinations of terms of the form
for \(i=0,1,2\). Therefore, to conclude the proof we need to show that
where \(p=4\). Let \(f_{i,t}(s)=(t-s)^i \exp (-\underline{\alpha }(t-s))\). For any \(t \ge 0\), we have
for some finite constant C and where the compensator of \(N_g(ds \times d\mathbf {x})\) is \( \lambda (s;\theta ^*) F_s(d\mathbf {x})ds\) where \(F_s(d\mathbf {x})\) is the conditional distribution of \(\mathbf {y}_s\) with respect to \(\mathcal {F}_{s-}^\mathbf {y}\).
First define the probability measure \(\mu (ds) = (\int _0^t f_{i,t}(u)\, du)^{-1} f_{i,t}(s)\,ds\) on [0, t], and apply Jensen's inequality to the second term to get
where we have used the independence of \(\mathbf {y}\) and \(N_g\), the fact that \(\mathbb {E}[\mathbb {E}[|G_c(\mathbf {y}_s)| \mid \mathcal {F}_{s-}^\mathbf {y}]^4] \le \mathbb {E}[|G_c(\mathbf {y}_s)|^4] < K\) for some constant \(K >0\), and \(\sup _{t \in \mathbb {R}_{+}}\mathbb {E}[\int _{[0,t)}f_{i,t}(s)\lambda (s;\theta ^*)^4ds]<\infty \) by (Clinet and Yoshida 2017, Lemma A.5). Consider now the first expected value. Using the Burkholder-Davis-Gundy inequality and arguing similarly to (Clinet and Yoshida 2017, Lemma A.2), we have, for some constant \(C<\infty \) not necessarily the same as above,
Similarly to the previous argument the second term is uniformly bounded because
by (Clinet and Yoshida 2017, Lemma A.5) and \(\mathbb {E}[\mathbb {E}[G(\mathbf {y}_s)^2|\mathcal {F}_{s-}^\mathbf {y}]^2] \le \mathbb {E}|G_c(\mathbf {y}_s)|^4 < K\) for some constant \(K >0\). For the first term we have
where we have used the independence of \(\mathbf {y}\) and \(N_g\). Now, \(\mathbb {E}|G_c(\mathbf {y}_s)|^4 < \infty \) and, once more by (Clinet and Yoshida 2017, Lemma A.5) we have
which completes the proof. \(\square \)
C Proof of Theorem 1
Define, for any fixed \(c \in \mathbb {R}^r\),
This notation is used repeatedly in the proof of the theorem as well as the lemmas used. The proof follows somewhat closely that of Clinet and Yoshida (2017). We first consider the normalized process corresponding to (15) and for any non zero vector of constants \(c\in \mathbb {R}^r\) define the process in \(u \in [0,1]\)
Note that \(S_1^T=\frac{1}{\sqrt{T}}c^{\mathsf {T}}\partial _\psi l_g(\nu ^{*})\). Similarly to Clinet and Yoshida (2017), we establish a functional CLT when \(T \rightarrow \infty \).
The proof of this theorem proceeds via several lemmas. Convergence throughout is with \(T\rightarrow \infty \). The first lemma is concerned with the ergodic properties of \(U(t;\theta ,\phi )\) defined in (29) when \(\phi =\phi ^*\) is fixed at the true value, in which case we further abbreviate the notation to \(U(t;\theta )= U(t;\theta ,\phi ^*)\).
Lemma 4
Under \(H_0\), there exists a stationary marked Hawkes point process starting from \(-\infty \), \(N_g^{\infty }\), on the original probability space \((\varOmega , \mathcal {F},\mathbb {P})\), adapted to \(\mathcal {F}_t\), such that: (i) \(N^{\infty } = N_g^\infty (\cdot ,\mathbb {X})\) and \(\mathbf {y}\) are independent. (ii) the stochastic intensity of \(N^{\infty }\) admits the representation
Moreover, let us define
Then, the joint process \((\lambda ^\infty , U^{\infty }(.;\theta ^*))\) is stationary ergodic. Finally we have the convergence
Proof
The existence of \(N_g^\infty \) along with property (ii) are direct consequences of Proposition 1. The fact that \(N_g^\infty \) can be constructed on the same probability space as \(N_g\) is ensured by building \(N_g^\infty \) using the same canonical process \(\overline{N}_g\). The independence property (i) comes from the fact that under \(H_0\) we have \(\psi =0\) so that \(g(\cdot ,\cdot ,0) = 1\) and the marks process \(\mathbf {y}\) does not impact the marginal point process \(N^\infty \). Next, since \(\mathbf {y}\) is ergodic by assumption, and the unmarked Hawkes process of jumps \(N^\infty \) is stationary ergodic by assumption, and since both processes are independent from each other, the joint process \((N^\infty ,\mathbf {y})\) is stationary ergodic as well. Since for any \(t \in \mathbb {R}\), \((\lambda ^\infty (t), U^\infty (t,\theta ^*))\) admits a stationary representation and given the form of \((\lambda ^\infty (t), U^\infty (t,\theta ^*))_{t \in \mathbb {R}}\), we can deduce that they are also ergodic by Lemma 10.5 from Kallenberg (2006). Finally, we show (31). We first deal with the convergence of \(f(t):=\mathbb {E}| \lambda (t,\theta ^*) - \lambda ^{\infty }(t) |\) to 0. Defining \(r(t) = \mathbb {E}\int _{(-\infty ,0]} w(t-s;\alpha ^*)\lambda ^\infty (s) ds\), and following the same reasoning as for the proof of Proposition 4.4 (iii) in Clinet and Yoshida (2017), some algebraic manipulations easily lead to the inequality
where for two functions a and b, and \(t \in \mathbb {R}_+\), \(a *b (t) = \int _0^t a(t-s)b(s)ds\) whenever the integral is well-defined. Iterating the above equation, we get for any \(n \in \mathbb {N}\)
Using the fact that \(\int w(.;\alpha ^*) = 1\), \(\vartheta ^* < 1\) and using Young’s convolution inequality we easily deduce that the second term tends to 0 as n tends to infinity, so that f is dominated by \(R *r\) where \( R := \sum _{k=0}^{+\infty } \vartheta ^{*k} w(.;\alpha ^*)^{*k}\). Note that R is finite and integrable since \(\int _0^{+\infty } R(s)ds \le 1/(1-\vartheta )\). We first prove that \(r(t) \rightarrow 0\). To do so, note that \(r(t) = \mathbb {E}\left[ \lambda ^\infty (0)\right] \int _t^{+\infty } w(u;\alpha ^*)du \rightarrow 0\) since \(w(.;\alpha ^*)\) is integrable. Now, since \(R *r(t) = \int _0^t R(s)r(t-s)ds\), and \(R(s)r(t-s)\) is dominated by \(\text {sup}_{u \in \mathbb {R}_+} r(u) R(s)\) which is integrable, we conclude by the dominated convergence theorem that \(f(t) \le R *r(t) \rightarrow 0\). Finally, we prove that \(g(t) := \mathbb {E}| U(t;\theta ^*) - U^\infty (t;\theta ^*)| \rightarrow 0\). We have
Since \(\int _{(0,t)} w(t-s;\alpha ^*) f(s)ds = \int _{(0,t)} w(s;\alpha ^*) f(t-s)ds\) and \(f(t)\rightarrow 0\), we have, again, by application of the dominated convergence theorem that \(g(t) \rightarrow 0\). \(\square \)
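As a quick check of the domination argument (not part of the original proof), the series \(R\) can be computed in closed form for the exponential decay specification \(w(t;\alpha ^*) = \alpha ^* e^{-\alpha ^* t}\): the \(k\)-fold convolution \(w(\cdot ;\alpha ^*)^{*k}\) is a Gamma\((k,\alpha ^*)\) density, so

```latex
\sum_{k=1}^{+\infty} \vartheta^{*k}\, w(t;\alpha^*)^{*k}
  = \sum_{k=1}^{+\infty} \vartheta^{*k}\, \frac{\alpha^{*k}\, t^{k-1}}{(k-1)!}\, e^{-\alpha^* t}
  = \vartheta^* \alpha^*\, e^{-\alpha^*(1-\vartheta^*)\, t},
\qquad
\int_0^{+\infty} \vartheta^* \alpha^*\, e^{-\alpha^*(1-\vartheta^*)\, t}\, dt
  = \frac{\vartheta^*}{1-\vartheta^*}.
```

Adding the \(k=0\) Dirac term gives \(\int _0^{+\infty } R(s)ds = 1/(1-\vartheta ^*)\), consistent with the bound used above, and \(R\) decays exponentially, which makes the convergence \(R *r(t) \rightarrow 0\) transparent in this case.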
Lemma 5
Under \(H_0\), \(S_u^T\) defined in (30) satisfies
where \(W\) is a standard Brownian motion (and convergence is in the Skorokhod space \(\mathbf {D}([0,1])\)) and where \(\varOmega \), defined in (18), is a positive definite matrix.
Proof
Similarly to (Clinet and Yoshida 2017, proof of Lemma 3.13) we first show that
converges in probability to \(u c^{\mathsf {T}}\varOmega c\). Introducing \(\lambda ^\infty \), \(U^\infty \) as in Lemma 4, we need to show that
Using the boundedness of \(\lambda (t;\theta ^*)^{-1}\) and \(\lambda ^{\infty }(t)^{-1}\), we have the domination
for some constant \(K>0\). By Lemma 4, we thus have \(A_t \rightarrow ^P 0\). Moreover, since by Condition 3, \(U(t;\theta ^*, \phi ^*)\) and \(U^\infty (t;\theta ^*)\) are \(\mathbb {L}^{2+\epsilon }\) bounded for some \(\epsilon >0\), and \(\lambda (t;\theta ^*)\) and \(\lambda ^{\infty }(t)\) are \(\mathbb {L}^p\) bounded for any \(p>1\), we deduce that \(\mathbb {E}|A_t| \rightarrow 0\). This, in turn, easily implies that \(\mathbb {E}|T^{-1} \int _0^{uT} A_t dt| \rightarrow 0\), and thus we get (32). By the ergodicity property of Lemma 4, we also have
where \(\varOmega = \mathbb {E}[\lambda ^\infty (0)^{-1} \partial _\psi \lambda ^\infty (0)\partial _\psi \lambda ^\infty (0)^{\mathsf {T}}]\), which proves our claim. Note that the fact that \(\varOmega \) corresponds to the limit (18) is proved in Lemma 7 below.
To prove that \(\varOmega \) is positive definite note that \(\mathbb {E}\left[ \lambda ^\infty (0)^{-1}U^\infty (0;\theta ^*)^2 \right] \) can be computed by first calculating the conditional expectations of the \(G_c(\mathbf {y}_{t_i^{\infty }}) G_c(\mathbf {y}_{t_j^{\infty }})\) terms (appearing in \(U^\infty (0;\theta ^*)^2\)) given the event times to get for any non-zero \(c \in \mathbb {R}^r\)
where \(t_i^\infty \) are jump times of \(N^\infty \). Now, under Condition 4 the quadratic form is almost surely positive, as is \(\lambda ^\infty (0)^{-1}\), and hence the expectation is positive, proving the claim.
Next, for Lindeberg’s condition, for any \(a>0\), similarly to Clinet and Yoshida (2017)
where we have used Condition 3 along with the boundedness of \(\lambda (t;\theta ^{*})^{-1}\). As in Clinet and Yoshida (2017), application of (Jacod and Shiryaev 2013, Theorem 3.24, Chapter VIII) gives the required functional CLT. \(\square \)
Lemma 6
Under \(H_0\), we have
Proof
Rewrite
Consider the first term in (33). Recalling that \(\hat{G}_c(\mathbf {x})-G_c(\mathbf {x}) = \hat{\mu }_H-\mu _H\), we have
giving
Now by Condition 4, \(\hat{\mu }_H -\mu _H\rightarrow ^P 0\). Also, using the consistency of the quasi likelihood estimates for the unmarked process, \(\hat{\vartheta } \rightarrow ^P \vartheta ^*\). Finally
is precisely the derivative of the unboosted likelihood with respect to the branching ratio parameter \(\vartheta \), and it converges in distribution to a normal random variable directly by (Clinet and Yoshida 2017, Proof of Theorem 3.11). Hence the first term in (33) converges to zero in probability.
Consider the second term in (33) which is written as
using a first-order Taylor series expansion, where \(\bar{\theta }_T \in [\theta ^*, \hat{\theta }_T]\). By the central limit theorem in Clinet and Yoshida (2017), \(\sqrt{T}(\hat{\theta }_T - \theta ^*)\) is asymptotically normal. We show that the term multiplying it converges to zero in probability, using an argument similar to that in (Clinet and Yoshida 2017, Proof of Lemma 3.12). Now, at any \(\theta \) we have
These three terms are analogous to the three terms appearing in the expression for \(\partial _\theta ^2 l_T(\theta )\) in (Clinet and Yoshida 2017, middle p. 1809) and are listed in the same order.
The third term converges in probability to zero uniformly on a ball \(V_T\) centered at \(\theta ^*\) and shrinking to \(\{\theta ^*\}\), using arguments similar to those in (Clinet and Yoshida 2017, p. 1810) for their third term, together with Lemma 4.
The second term also converges to a limit uniformly on such a ball \(V_T\), using the ergodicity from Lemma 4 and arguments similar to those of Clinet and Yoshida (2017); note, however, that the limit is a matrix of zeros, because its expectation is zero, corresponding to the block-diagonal structure of the full information matrix.
Finally consider the first, martingale term,
which we will show converges to zero in probability uniformly in \(\theta \in \varTheta \) (uniformity allows us to deal with the evaluation at \(\bar{\theta }_T\)), using \(\mathbb {E}[\left| M_{a,T}(\bar{\theta }_T)\right| ^p] \le \mathbb {E}[\sup _{\theta \in \varTheta }\left| M_{a,T}(\theta )\right| ^p]\), where \(M_{a,T}\) is the \(a\)th component of \(M_T\). For \(p = \dim (\varTheta )+1\),
where \(K(\varTheta ,p) <\infty \), using Sobolev's inequality as in (Clinet and Yoshida 2017, Proof of Lemma 3.10). We next apply the Burkholder-Davis-Gundy inequality followed by Jensen's inequality to each of \(\mathbb {E}[|M_T(\theta )|^p]\) and \(\mathbb {E}[|\partial _\theta M_T(\theta )|^p]\).
First
Similarly
Now, as in the proof of Lemma 3.12 of Clinet and Yoshida (2017), the processes \( |\partial _{\theta }\{\lambda (t; \theta )^{-1} U(t;\theta )\}| ^p \lambda (t;\theta ^*)^\frac{p}{2} \) and \(|\partial _{\theta }^2\{\lambda (t; \theta )^{-1} U(t;\theta )\}| ^p \lambda (t;\theta ^*)^\frac{p}{2}\) are dominated by polynomials in \(\lambda (t;\theta )^{-1}\), \(\partial _\theta ^i \lambda (t;\theta )\) and \(\partial _\theta ^i U(t;\theta )\) for \(i\in \{0,1,2\}\). The first two terms are covered by Condition A2 of Clinet and Yoshida (2017). The terms \(\partial _\theta ^i U(t;\theta )\) are covered by Condition 3 (and are shown to hold for the exponential decay model in Lemma 2). \(\square \)
Lemma 7
Under \(H_0\), the estimated information matrix \(\hat{\mathcal {I}}_{\psi }\) defined in (17) satisfies
Proof
Recall from (17)
and let
Note that, by similar arguments to that of the proof of Lemma 3.12 in Clinet and Yoshida (2017), we have
where \(M_T\) is a martingale of order \(O_P(T^{-1/2})\). By ergodicity, we thus have that \(T^{-1}\mathcal {I}_{\psi }(\nu ^*)\rightarrow \varOmega \), where \(\varOmega \) is the same positive definite matrix as in Lemma 5. Hence, to prove Lemma 7 it is sufficient to show that \(\frac{1}{T}c^{\mathsf {T}}\{\hat{\mathcal {I}}_{\psi }-\mathcal {I}_{\psi }(\nu ^*)\}c\rightarrow 0\) for any \(c \in \mathbb {R}^r\). Let \(R(t;\theta ,\phi ) = \lambda (t;\theta )^{-1}U(t;\theta ,\phi )\). Then
Now, using a Taylor series expansion
Now \(c^{\mathsf {T}}(\hat{\mu }_H - \mu _H)\rightarrow ^P 0\) and, similarly to Clinet and Yoshida (2017), both integrals are uniformly bounded in probability for all T hence \(\frac{1}{T}c^{\mathsf {T}}\{\hat{\mathcal {I}}_{\psi }-\mathcal {I}_{\psi }(\nu ^*)\}c\) converges to zero in probability, completing the proof. \(\square \)
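To make the quantities in Lemmas 5-7 concrete, the following sketch computes a one-dimensional version of the score statistic for the exponential decay model. It assumes a linear boost \(g(x;\psi ) = 1 + \psi \,\hat{G}(x)\) with \(\hat{G}(x) = x - \hat{\mu }_H\), so that the score direction at \(\psi = 0\) is \(U(t) = \vartheta \sum _{t_i < t} \alpha e^{-\alpha (t - t_i)} \hat{G}(x_i)\); these formulas are illustrative reconstructions under that assumption, not the paper's exact expressions (17) and (30).

```python
import numpy as np

def score_statistic(times, marks, eta, vartheta, alpha):
    """One-dimensional score statistic for marks in an exponential-decay
    Hawkes model, evaluated at the H0 parameters (eta, vartheta, alpha).
    Assumes (hypothetically) a linear boost g(x; psi) = 1 + psi * Ghat(x)
    with Ghat(x) = x - mean(marks), so the score direction is
    U(t) = vartheta * sum_{t_i < t} alpha*exp(-alpha*(t - t_i)) * Ghat(x_i)."""
    G = marks - marks.mean()
    n = len(times)
    lam = np.empty(n)                 # lambda(t_j-) under H0 (marks ignored)
    U = np.empty(n)                   # score direction U(t_j-)
    exc, exc_G, t_prev = 0.0, 0.0, 0.0
    for j in range(n):
        d = np.exp(-alpha * (times[j] - t_prev))
        exc *= d                      # decay both excitation sums to t_j
        exc_G *= d
        lam[j] = eta + exc
        U[j] = exc_G
        exc += vartheta * alpha       # unmarked excitation jump at t_j
        exc_G += vartheta * alpha * G[j]  # mark-weighted excitation jump
        t_prev = times[j]
    T = times[-1]
    # score = sum_j U(t_j)/lambda(t_j) - int_0^T U(t) dt, with the integral
    # of each event's exponential kernel contribution taken in closed form
    int_U = vartheta * np.sum(G * (1.0 - np.exp(-alpha * (T - times))))
    score = np.sum(U / lam) - int_U
    info = np.sum((U / lam) ** 2)     # estimated information via the counting measure
    return score ** 2 / info          # approximately chi-squared(1) under H0
```

Under \(H_0\) the statistic is asymptotically \(\chi ^2_1\) in this one-dimensional setting, by Theorem 1; for \(r\)-dimensional marks the scalar ratio becomes the quadratic form \(S^{\mathsf {T}} \hat{\mathcal {I}}_{\psi }^{-1} S\).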
D Proof of Theorem 2
We have divided the proof of Theorem 2 into a series of lemmas. Before we derive the asymptotic distribution of the score statistic, we need some definitions. For the sake of simplicity, we will use the notation \(\lambda ^{T}(.;\theta ) := \lambda _g^{T}(.;\theta ,\phi ,0)\) (which is independent of \(\phi \in \varPhi \)). By Proposition 1, we may assume the existence of an unboosted marked Hawkes process \(N_g^{(0)}\) generated by the same measure \(\overline{N}_g\) on \(\mathbb {R}^2 \times \mathbb {X}\) as the sequence of processes \(N_g^T\). \(N_g^{(0)}\) is thus a marked Hawkes process with boost function \(g(., \phi ^*, 0) = 1\), and corresponds to the process \(N_g\) studied in the asymptotic theory under the null hypothesis; hereafter, we use the superscript \((0)\) to emphasize this fact. We call \(\lambda ^{(0)}\) its associated stochastic intensity, that is, for any \(\theta \in \varTheta \)
We also set \(N^{(0)} = N_g^{(0)}(\cdot \times \mathbb {X}) \), where \(\lambda ^{(0)}(t;\theta ^*)\) is the actual stochastic intensity of \(N^{(0)}\), that is, \(\int _0^t \lambda ^{(0)}(s;\theta ^*) ds\) is the predictable compensator of \(N_t^{(0)}\). Finally, we define for \(i \in \{0,1\}\), \(\theta \in \varTheta \), \(\phi \in \varPhi \),
We first show that in the sense of (34) and (35) below, \(N_g^T\) is asymptotically close to \(N_g^{(0)}\) when \(T \rightarrow +\infty \).
Lemma 8
Let f be a predictable process depending on \(\theta \in \varTheta \) such that
for some \(p \ge 2\). Then we have
and for any \(i \in \{0,1\}\)
Proof
We prove our claim in three steps.
Step 1. Letting \(\delta ^T(t) = \mathbb {E}|\lambda _g^{T}(t;\theta ^*,\phi ^*,\psi _T^*) - \lambda ^{(0)}(t;\theta ^*)|\), we prove \(\sup _{t \in [0,T]}\delta ^T(t) = O(T^{-1/2})\). We have
for some constant \(K >0\), where we have used that \(\mathbb {E}[g(\mathbf {y}_s;\phi ^*,\psi _T^*) | \mathcal {F}_{s-}^{\mathbf {y}}] \le C < 1/\vartheta ^*\), that \(\int _0^{+\infty }w(.;\alpha ^*) = 1\), and Condition 5. Moreover, for a vector x, we have used the notation \(|x| = \sum _i |x_i|\). Taking the supremum over [0, T] on the left hand side, we deduce \(\sup _{s \in [0,T]} \delta ^T(s) \le KT^{-1/2}/(1-C\vartheta ^*)\) and we are done.
Step 2. Letting \(\epsilon ^T(t) = \mathbb {E}|\lambda _g^{T}(t;\theta ^*,\phi ^*,\psi _T^*) - \lambda ^{(0)}(t;\theta ^*)|^2\), we prove \(\sup _{t \in [0,T]}\epsilon ^T(t) = O(T^{-1/2})\). We have for some \(c > 0\) arbitrary small,
where we have used the inequality \((x+y)^2 \le (1+c)x^2+(1+c^{-1})y^2\) for any \(c > 0\). First, we have
Now, applying Jensen’s inequality with respect to the probability measure \(w(s;\alpha ^*)ds/\int _0^t w(s;\alpha ^*)ds\), and then using \(\int _0^{+\infty }w(.;\alpha ^*) =1\), and \(\mathbb {E}[g(\mathbf {y}_{s};\phi ^*,\psi _T^*) | \mathcal {F}_{s-}^\mathbf {y}] \le C < 1/\vartheta ^*\) yields
Now, for \(I_A\), we have
where we have used Cauchy-Schwarz inequality along with (23) and (24). Moreover, following a similar path as for Step 1, we also have that \(II \le K T^{-1}\) by (23). Thus, overall, using that \(\sup _{s \in [0,T] }\delta ^T(s) \le KT^{-1/2}\) by Step 1, we obtain for some constant \(K>0\)
and taking the supremum over [0, T] on the left hand side, taking \(c>0\) close enough to 0 and T large enough so that \(((1+c)^2 \vartheta ^{*2} C^2 + KT^{-1}) < A\) for some constant \(A <1\), we get
for some \(\tilde{K} >0\).
Step 3. We prove (34) and (35). For (34), this is a direct consequence of the fact that the compensator of \(|N_g^T - N_g^{(0)}|\) is \(\int _0^T | \lambda _g^{T}(t;\theta ^*,\phi ^*,\psi _T^*) - \lambda ^{(0)}(t;\theta ^*) |dt\), Cauchy-Schwarz inequality and the uniform condition on f. For (35), let \(i \in \{0,1\}\). We have
From here, using the Burkholder-Davis-Gundy inequality and Step 2 of this proof, along with conditions (23) and (24), we deduce that the above term is dominated by \(K T^{-1/2}\) for some \(K >0\), uniformly in \(t \in \mathbb {R}_+\). \(\square \)
Lemma 9
(Consistency of \(\hat{\nu }_T\) under the local alternatives) Under \(H_1^T\), we have
Proof
The convergence of the third component is obvious, and the convergence of the second one is assumed. All we have to show is that \(\hat{\theta }_T \rightarrow ^P \theta ^*\). Let \(l_T^{(0)}(\theta ) = \int _{[0,T) \times \mathbb {X}} \log \lambda ^{(0)}(t;\theta ) N_g^{(0)}(dt \times d\mathbf {x}) - \int _0^T \lambda ^{(0)}(t;\theta ) dt\), where we recall that \(\lambda ^{(0)}\) is the stochastic intensity of \(N_g^{(0)}\). It suffices to show that, uniformly in \(\theta \in \varTheta \), we have the convergence \(T^{-1} (l_T(\theta ) - l_T^{(0)}(\theta )) \rightarrow ^P 0\). But note that \(T^{-1} (l_T(\theta ) - l_T^{(0)}(\theta )) = I + II\) with
and
By (35), we immediately have that \(\mathbb {E}\sup _{\theta \in \varTheta } |\lambda ^{(0)}(t;\theta ) - \lambda ^{T}(t;\theta )| = O(T^{-1/2})\) uniformly in \(t \in [0,T]\), so that \(II \rightarrow ^P 0 \) uniformly in \(\theta \in \varTheta \). Writing I as the sum
we need to show that both terms tend to 0. Since \(|\log \lambda ^{(0)}(t;\theta ) - \log \lambda ^{T}(t;\theta )| \le \underline{\eta }^{-1} | \lambda ^{(0)}(t;\theta ) - \lambda ^{T}(t;\theta )|\), we easily get by Cauchy-Schwarz inequality and (35) that \(\mathbb {E}\sup _{\theta \in \varTheta } |A| \rightarrow 0\). Moreover, using \(\log \lambda ^{T}(t;\theta ) \le \lambda ^{T}(t;\theta ) - 1\), by (34) and Condition 5 we have that \(\mathbb {E}\sup _{\theta \in \varTheta } |B| \rightarrow 0\) and we are done. \(\square \)
Lemma 10
Under \(H_1^T\), we have
Proof
First, note that by application of Lemma 8, Lemma 9, and following the same path as for the proof of Lemma 6, we deduce
Next, we have
where we have used the notation \(\tilde{N}^{T}(dt) = N_g^T(dt, \mathbb {X}) - \lambda _g^{T}(t; \theta ^*,\phi ^*, \psi _T^*)dt\). We derive the limit of the first term following the same path as for the proof of Lemma 5. Letting \(S_u^T = T^{-1/2}\int _{(0,uT)} \partial _\psi \log \lambda _g^{T}(t;\theta ^*,\phi ^*,0) \tilde{N}^{T}(dt)\), we directly have that
By (35), the boundedness of moments of \(\lambda _g^{T}\) and its derivatives and Hölder’s inequality we easily deduce that
which converges in probability to \(u\varOmega \) by Lemma 5. Similarly, Lindeberg’s condition
for any \(a >0\) is satisfied, so that by 3.24, Chapter VIII in Jacod and Shiryaev (2013), we get that \(I = S_1^T \rightarrow ^d \mathcal {N}(0,\varOmega )\). Now we derive the limit for II. We have for some \(\tilde{\gamma }_T \in [0, \hat{\gamma }_T]\)
Now, using Hölder’s inequality, the uniform boundedness of moments of \(\lambda _g^{T}\) in \(\nu \), and (35), we deduce as previously that
which, by the proof of Lemma 7, tends in probability to the limit \(\varOmega \gamma ^*\). By Slutsky’s Lemma, we get the desired convergence in distribution for \(T^{-1/2}\partial _\psi l_g(\hat{\nu }_T)\). \(\square \)
Lemma 11
Under \(H_1^T\), we have
Proof
First, as for Lemma 10, note that by application of Lemma 8, Lemma 9, and following the same path as for the proof of Lemma 7, we have
Now recall that
By (34), (35), the boundedness of moments of \(\lambda _g^{T}\) and its derivatives and Hölder’s inequality we get
and by Lemma 7, the right-hand side converges in probability to \(\varOmega \). \(\square \)
Clinet, S., Dunsmuir, W.T.M., Peters, G.W. et al. Asymptotic distribution of the score test for detecting marks in hawkes processes. Stat Inference Stoch Process 24, 635–668 (2021). https://doi.org/10.1007/s11203-021-09245-5