1 Introduction

In economic and financial time series, we sometimes observe sudden and large price jumps. Although such jumps are relatively rare, when they occur they often have a significant impact not only on a single financial market but also on several different markets and on wider macro-economies. Notable recent events with large jumps in European and Asian countries include the global crisis of 2008.

The standard econometric method for investigating economic and financial time series has been the statistical analysis of discrete time series. In statistical time series analysis, we often assume that the observed time series data are equally spaced realizations of stochastic processes and that the state space is \({\mathbf{R}}^d\;(d\ge 1)\) in multivariate cases. Many statistical procedures in discrete time series analysis have been developed and applied to economic and financial time series in recent decades. When we do not observe events frequently, however, traditional discrete time series modeling with continuous state space may present important limitations. For instance, it may be difficult to distinguish major contagious events from small contagious events among different financial markets across international borders.

In this paper, we propose an alternative way of investigating economic and financial events with time series data in macro-economies, i.e., the statistical analysis of marked point processes to identify and explore multivariate time series events. Although this is not a standard approach in time series econometrics, there have been applications of this methodology in statistical seismology [see Ogata (1978, 2015) and the related literature, for instance]. We will show that this approach is a useful alternative way of investigating multivariate economic and financial markets and that it sheds new light on issues that have hitherto sometimes been neglected. In particular, we propose using simultaneous Hawkes-type multivariate point process models and their applications in this study. We argue that with the simultaneous multivariate Hawkes-type point process (SHPP) models, which are a new class of multivariate point processes, it is possible to investigate the causal effects of sudden and large jumps together with their magnitudes. We develop a new way to measure Granger non-causality (GNC) and instantaneous Granger non-causality (IGNC) through stochastic intensity modeling of point processes.

In econometric time series analysis, the concept of Granger causality (after Granger 1969) has become an important, well-established tool for investigating relationships among multivariate time series variables (see Hosoya et al. 2017 for recent developments). In the econometric literature, Florens and Fougere (1996) investigated several Granger causality concepts in the framework of continuous time stochastic processes, but their formulation of the problem was incomplete because they excluded the possibility of co-jumps; that is, the simultaneous jumps that can be observed in multivariate time series data are excluded from the outset. The problem of co-jumps is important because we often analyze economic time series data in discrete time (monthly, weekly, daily, hourly, and/or by the minute), while continuous stochastic processes have also become prominent in financial econometrics. We need to coherently unify discrete time series analyses and continuous stochastic processes. In this paper, we investigate the possible use of co-jumps in a systematic way and develop new GNC and IGNC tests, which may provide important insight for advancing the development and application of econometric time series modeling.

Previously, Kunitomo et al. (2017) used the traditional multivariate Hawkes-type point process (THPP) models without co-jumps; the SHPP models extend their models (that is, the THPP models are special cases of the SHPP models). There are important cases in which we need the SHPP models, as we will illustrate in Sect. 6. In statistical seismology, researchers often use earthquake data of large magnitudes (greater than 3, say), and the time scale of measurement is short due to physical laws. In financial markets, however, the impacts of shocks occur in actual trades of financial commodities, and the digestion of bad or good news among market participants often takes time (one or two hours, a day, or several days). Therefore, it may not be fruitful to apply the methods developed for seismology mechanically to financial problems; we need to consider the issues of time scales, the discretization of statistical models, and their measurement carefully.

Several recent studies in financial econometrics have utilized point processes and conditional intensity modeling (Ait-Sahalia and Jacod 2014; Ait-Sahalia et al. 2015; Embrechts et al. 2011; Grothe et al. 2014; and others). In a survey of these and other works, Bacry et al. (2015) noted that the focus therein is mostly on studies of financial micro-market structures. The approach developed in this paper is related to these works, but the main purpose is quite different because we develop a new point process approach to assess the relationships among different (international financial) markets. In this respect, there have also been studies on international linkage in financial markets (e.g., Hamao et al. 1990), but our proposed methodology is notably different because those studies utilized standard discrete time series modeling. In terms of empirical examples, we will investigate the interactions among Tokyo–New York, Tokyo–London, and Tokyo–Hong Kong financial markets and apply the GNC tests developed herein to those contexts. This yields several important findings among major financial markets.

The remainder of the paper is organized as follows. In Sect. 2 we present a general formulation of simultaneous multivariate Hawkes-type point process (SHPP) models. In Sect. 3, we discuss stationarity and the Bartlett spectrum decomposition. In Sect. 4, we describe the estimation method and develop non-causality tests in the sense of Granger (1969). In Sect. 5, we explore simulation results, and the empirical applications are offered in Sect. 6. Concluding remarks are presented in Sect. 7 and the mathematical details are provided in the Appendix.

2 Simultaneous Hawkes-type point processes

We divide the observation period [0, T] into discrete periods \(I_i^n=(t_{i-1}^n,t_i^n]\;(i=1,\dots ,n)\) and set the initial time as \(t_0^n=0\). We may interpret \(I_i^n\) as the i-th day, but it is possible to use higher resolution of observation periods (e.g. hourly or per minute) and we allow the irregularly-spaced time series modeling in principle.

Let the observable price processes of Itô-semimartingale with the state space of \({\mathbf{R}}^d\) (see Ikeda and Watanabe 1989) be \(P_j(t)\;(j=1,\dots ,d;\; t_{i-1}^n<t\le t_i^n,i=1,\dots ,n)\), and in \(s\in I_i^n\) we denote the (negative) log-return of prices \( X^n_j(s) \;(s\in I_i^n )\) as

$$\begin{aligned} X_j^n(s) =-\log \left[ P_j(s)/P_j(t_{i-1}^n)\right] \quad (j=1,\dots ,d;\; i=1,\dots , n). \end{aligned}$$
(1)

Let the first stopping time when \(X_j^n(s)\) exceeds the threshold \(u_j\;(>0)\) in \(s\in I_i^n\) be \(\tau ^n(i,j,1)\). When \(\tau ^n(i,j,1)<t_i^n \), we re-define the return process \(X_j^n(s)=-\log [P_j(s)/P_j(\tau ^n(i,j,1))]\;(s\in I_i^n, s\ge \tau ^n(i,j,1))\) and let the first stopping time when \(X_j^n(s)\) exceeds the threshold \(u_j\) crossing from below in \(s\in (\tau ^n(i,j,1),t_i^n]\) be \(\tau ^n(i,j,2)\). In this way, we define the sequence of \(\tau ^n(i,j,k)\;(k\ge 1)\) successively.

Let also a sequence of numbers of jumps in an interval be \(J_j(i)=\# \{k : \tau ^n (i,j,k)\;\in I_i^n, k\ge 1\}\) and then define

$$\begin{aligned} N_j^{n*}(t, u_j)= \sum _{1\le l\le i, J_j(l)>0}\frac{1}{J_j (l)} N_j\left( I_l^n ,t, u_j \right) \quad ( t\in I_i^n ), \end{aligned}$$
(2)

where \( N_j(I_l^n ,t, u_j)\) is the number of times that the resulting return process \(X^n_j(s)\;(s\le t)\) exceeds the threshold \(u_j\), crossing from below, in \(s\in I_l^n \) with \(s\le t\in I_i^n\). The stochastic process \(N_j^{n*}\) increases by at most one in every discrete observation period.
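The construction in (1) and (2) can be illustrated for a single asset and a single interval. The following is a minimal sketch, assuming that prices are observed on a finite grid inside the interval; the function names and the grid sampling are illustrative assumptions and are not part of the formal definition.

```python
import math

def exceedance_times(prices, u):
    """Grid indices at which the negative log-return, measured from the
    last reset point, exceeds the threshold u within one interval.

    prices[0] plays the role of P_j(t_{i-1}); after each exceedance the
    reference price is reset, mirroring the re-definition of X_j^n(s).
    """
    if u <= 0:
        raise ValueError("threshold u must be positive")
    ref = prices[0]
    hits = []
    for k in range(1, len(prices)):
        x = -math.log(prices[k] / ref)  # negative log-return since last reset
        if x > u:
            hits.append(k)              # next stopping time tau(i, j, .)
            ref = prices[k]             # restart the return process here
    return hits

def normalized_increment(prices, u):
    """Increment of N_j^{n*} over the whole interval: J exceedances, each
    weighted by 1/J, so the increment is 1 if any occurred and 0 otherwise."""
    J = len(exceedance_times(prices, u))
    return 1.0 if J > 0 else 0.0
```

For example, with the hypothetical grid `[100, 98, 96.5, 90, 89.9]` and `u = 0.03`, two exceedances are detected, but the normalized process still increases by exactly one over the interval.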

This formulation of normalized counting in each interval \( I_i^n \;(i=1,\dots ,n) \) allows us to measure the market impacts of financial price jumps in discrete intervals while still using standard statistical intensity modeling. We note that \(J_j(i)\) are countable in [0, T] (T is finite, n is sufficiently large and \(u_j >0\) are fixed constants) and the number of instantaneous jumps (i.e. \(\vert P_j(s)- P_j(s-)\vert>u_j>0\)) is finite for the price processes of Itô-semimartingales. In financial risk management and regulation, sudden and/or large downward movements of financial commodity prices in short time intervals are the most important subject because of their negative consequences for financial markets and economies (Footnote 1).

In the following analysis, we consider the situation that there is a common threshold value u for \(u_j\;(j=1,\dots ,d)\) and the observations of the counting processes are available at day-start, day-minimum and day-end in each interval. In principle, however, we can allow the irregularly-spaced time series modeling, but we need an explicit discretization of observations. For these intervals of observation, we consider the point processes, \(N_j^{n *}(t ,u)\;(j=1,\dots ,d),\) which are simple. They satisfy the standard condition for point processes that as \(\Delta t\rightarrow 0\), we have the conditions

$$\begin{aligned} P(N_{j}^{n*}(t+\Delta t,u)-N_{j}^{n*}(t,u)= & {} 1 | {\mathcal {F}}^n_{t-} ) =\lambda _{j}^{n*}(t,u) \Delta t + o_p(\Delta t),\\ P(N_{i}^{n*}(t+\Delta t,u)-N_{i}^{n*}(t,u)> & {} 1\;\mathrm{for\;any}\; i | {\mathcal {F}}^n_{t-} ) = o_p(\Delta t) ,\\ P(N_{i}^{n*}(t+\Delta t,u)-N_{i}^{n*}(t,u)= & {} 1\;\mathrm{for}\;i=j,k; j\ne k | {\mathcal {F}}^n_{t-} ) = o_p(\Delta t), \end{aligned}$$

where \({\mathcal {F}}^n_{t-}\) is the \(\sigma -\)field generated by the latest information before t. The (conditional) intensity functions are given by

$$\begin{aligned} \lambda _j^{n*} (t,u ) = \lim _{\Delta t\rightarrow 0} {\mathbf{E}}\left[ \frac{ N_j^{n*}(t+\Delta t,u)-N_j^{n*}(t,u)}{ \Delta t} \vert {\mathcal {F}}^n_{t-}\right] , \end{aligned}$$
(3)

where we use the notation \({\mathcal {F}}^n_{t-}\) as the latest information before t because we discretize the counting process and we have discrete observations. For convenience, we denote \({\mathcal {F}}_{t-}^n\) as \({\mathcal {F}}_{t}\) in the following analysis whenever such notation allows.

For expository purposes, in the following analysis, we interpret the increments of \(N^{n*}_j(s,u_k)\) as if jumps of the counting process occur at \(t_i^n ,\) the end of each interval \(I_i^n\) and we have set the threshold \(u_j=u\;(j=1,\dots ,d)\). When we consider the situation when the interval length goes to zero, i.e., \( \Delta _n t= \max _{i=1,\dots ,n}\vert t_{i}^n-t_{i-1}^n\vert \longrightarrow 0\) as \(n\longrightarrow \infty \) for a fixed T, the counting process, which is a simple point process, \(N^{n*}_j(s,u)\) weakly converges to \(N^{*}_j(s,u)\). The resulting counting process can be interpreted as a limiting continuous-time stochastic process in high-frequency asymptotics, which is not a diffusion type but a pure jump process (see Ikeda and Watanabe 1989; Ait-Sahalia and Jacod 2014; Kurisu 2018, for example).

Next, for \(u_j=u\;(j=1,\dots ,d)\) we define the point processes \(N^{n*}_{jk}(s,u)\) by the number of stopping times at which \(X^n_j(s)\) exceeds \(u\) for a particular \(j\;(j=1,\dots ,d)\), \(X^n_k(s)\) exceeds \(u\) for another \(k\;(k=1,\dots ,d; k\ne j)\), and the other \(X^n_l(s)\;(l\ne j,k)\) do not exceed u, crossing from below, by time s in the interval \(I_i^n\). By this construction, we can introduce the point processes \(N_{jk}^{n*}(t,u)\) with co-jumps of \(N_j\) and \(N_k\) by

$$\begin{aligned} P(N_{j}^{n*}(t+\Delta t,u)-N_{j}^{n*}(t,u)= & {} N_{k}^{n*}(t+\Delta t,u)-N_{k}^{n*}(t,u)=1 | {\mathcal {F}}_{t} )\\= & {} \lambda _{jk}^{n*}(t,u) \Delta t + o_p(\Delta t),\\ P(N_{i}^{n*}(t+\Delta t,u)-N_{i}^{n*}(t,u)> & {} 1\;\mathrm{for\;any}\;i | {\mathcal {F}}_{t} ) = o_p(\Delta t),\\ P(N_{i}^{n*}(t+\Delta t,u)-N_{i}^{n*}(t,u)= & {} 1\;\mathrm{for}\quad i=j,k,l; j\ne k \ne l | {\mathcal {F}}_{t} ) = o_p(\Delta t), \end{aligned}$$

where \( \lambda _{jk}^{n*} (t,u )\) are the conditional intensity functions of co-jumps.

Then, when we have co-jumps of two point processes, we can define the point processes

$$\begin{aligned} N^{n}_{j}(s,u)=N^{n*}_{j}(s,u)+ \sum _{k\ne j}N^{n*}_{j k}(s,u) \;(j,k=1,\dots ,d) \end{aligned}$$
(4)

and the corresponding conditional intensity functions are given by

$$\begin{aligned} \lambda _j^{n} (t,u) =\lambda _j^{n*} (t,u)+ \sum _{k\ne j }\lambda _{j k}^{n*} (t,u). \end{aligned}$$
(5)

The resulting point processes can be interpreted as the marginal point process for the j-th component of the vector point process \({\mathbf{N}}^{n}(s,u)\) with d dimension.
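For \(d=2\), the decomposition in (4) can be checked directly on discretized data. The sketch below assumes that each market is summarized by a 0/1 jump indicator per interval; the function name and the input encoding are illustrative assumptions.

```python
def split_jump_patterns(jump1, jump2):
    """Split two 0/1 jump-indicator sequences (one entry per interval)
    into mutually exclusive patterns: jump only in market 1, jump only
    in market 2, and a co-jump in both markets."""
    only1 = [a * (1 - b) for a, b in zip(jump1, jump2)]
    only2 = [(1 - a) * b for a, b in zip(jump1, jump2)]
    both = [a * b for a, b in zip(jump1, jump2)]
    return only1, only2, both

def marginal_counts(jump1, jump2):
    """Marginal counts N_1 and N_2 recovered as in (4):
    exclusive jumps plus co-jumps."""
    only1, only2, both = split_jump_patterns(jump1, jump2)
    return sum(only1) + sum(both), sum(only2) + sum(both)
```

The exclusive patterns correspond to \(N_1^{n*}\), \(N_2^{n*}\) and \(N_{12}^{n*}\), and adding the co-jump count to each exclusive count recovers the marginal counts.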

By extending this formulation to more complex co-jumps, in general we define

$$\begin{aligned} N^{n}_{j}(s,u)= \sum _{J_j\in (1,\dots ,d)} N^{n*}_{J_j}(s,u) \;\;\;(j=1,\dots ,p), \end{aligned}$$
(6)

where the index set \(J_j=\{ j_1,\dots ,j_l\}\subset \{1,\dots ,d\}\) is a non-empty subset of \(\{1,\dots , d\}\). The index sets are defined as \(J_i=\{ i\}\) for \((i=1,\dots ,d),\) \(J_i=\{ 1,1+(i-d)\}\) for \((i=d+1,\dots ,2d-1),\dots ,\) and \(J_p=\{ 1,\dots ,d\}\).

Then we sequentially define \(N_{i}^{n}(s,u)=N_{i}^{n*}(s,u)\;(i=1,\dots ,d);\) \(N_{d+1}^{n}(s,u)=N_{1,2}^{n*}(s,u),\dots ,\) and \(N_{p}^{n}(s,u)=N_{1,\dots , d}^{n*}(s,u)\).

We use the self-exciting form of conditional intensity functions \(\lambda _{J_j}^{n*}( \cdot )\) for co-jumps, such as \(\lambda _{j k}^{n*}(t, x|\mathcal {F}_{t-}^{n})\), in the same way, and we write the marginal conditional intensity function for the \(j\)-th component as

$$\begin{aligned} \lambda _j^{n} (t,u) =\sum _{J_j\in (1,\dots ,d)} \lambda _{J_j}^{n*} (t,u). \end{aligned}$$
(7)

There is a one-to-one transformation between \(N^{n}_{j}(s,u)\) for \(j=1,\dots ,p\) and \(N^{n*}_{j_1,\dots , j_k}(s,u)\) for \(1\le k\le d\) and between \(\lambda _j^{n} (t,u )\) and \(\lambda _{j_1,\dots ,j_k}^{n*} (t,u ) \) for \(p=2^d-1\).

The self-exciting Hawkes-type conditional intensity functions for the marked point processes are given by

$$\begin{aligned} \lambda _{j}^{n}\left( t, x | \mathcal {F}_{t-}^{n}\right) = \lambda _{j,0} +\sum _{i=1}^{p}\int _{-\infty }^{t}c_{ji}(x)g_{i}(t-s) N^{*n}_{J_i}({\mathrm{d}}s \times {\mathrm{d}}x) \end{aligned}$$
(8)

for \(j=1,\dots , p\), where \( N^{*n}_{J_i} ({\mathrm{d}}s \times {\mathrm{d}}x)\) are the marked point processes, \(\lambda _{j,0}\) are the initial intensities, \(g_{i}(t-s)=e^{-\gamma _{i}(t-s)} \) are the damping functions, and \(C(X)=(c_{ji}(x))\) are the impact functions.
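To make (8) concrete, the following minimal sketch evaluates the intensity at a given time from a finite list of past marked events. It assumes the power-type impact \(c_{ji}(x)=a_{ji}x^{c}\) and truncates the history to the events supplied; the function name and the tuple layout of `events` are illustrative assumptions.

```python
import math

def hawkes_intensity(t, j, events, lam0, a, gamma, c=1.0):
    """Evaluate lambda_j(t) = lam0[j] + sum over past events (s, i, x)
    with s < t of a[j][i] * x**c * exp(-gamma[i] * (t - s)).

    events: list of (time s, pattern index i, mark x) tuples.
    """
    total = lam0[j]
    for s, i, x in events:
        if s < t:  # only the strict past contributes to the intensity
            total += a[j][i] * (x ** c) * math.exp(-gamma[i] * (t - s))
    return total
```

With a single past event of mark 2 at time 0, baseline 0.1, \(a_{11}=0.5\) and \(\gamma_1=1\), the intensity one time unit later is \(0.1+e^{-1}\).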

Since we are interested in sudden and large jumps of the underlying price processes (in the sense of negative returns), it is important to use the probability functions of the return process in the tail areas. In this respect, many empirical studies on the stock markets found that stock returns exhibit the non-Gaussianity and thick tails. Hence it may be appropriate to use the generalized Pareto distributions (GPD) as tail probability functions for \(x>u >0\;(j=1,\dots ,d)\) as

$$\begin{aligned} P(X_j^n(s)> x\vert X_j^n(s) >u, \mathcal{F}_s)= & {} \frac{ \left[ 1+\frac{\xi _j}{ \sigma _j } x\right] ^{-1/\xi _j} }{ \left[ 1+\frac{\xi _j}{\sigma _j } u\right] ^{-1/\xi _j} }\nonumber \\= & {} \left[ 1+\frac{\xi _j}{\sigma _j^{*} }(x-u) \right] ^{-1/\xi _j}, \end{aligned}$$
(9)

and we set \(\sigma _j^{*} = \xi _j u_j + \sigma _j\) (\(\sigma _j >0\))

(See Resnick 2007 for the details of GPD in statistical extreme value theory (SEVT)).

Herein, we assume that given the return at s \(X_j^n(s)\;(j=1,\dots ,d) \), the conditional density functions are given by

$$\begin{aligned} f_j (x, s) = \frac{1}{\sigma _j^{*}} \left[ 1+\frac{\xi _j}{\sigma _j^{*}}(x-u) \right] ^{-1/\xi _j-1}\;(x>u, \xi _j>0). \end{aligned}$$
(10)
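The equivalence of the two forms in (9) and the density in (10) can be verified numerically. This is a minimal sketch with illustrative function names and parameter values of our own choosing; it assumes \(\xi_j>0\) and \(\sigma_j>0\).

```python
def gpd_tail(x, u, xi, sigma):
    """Conditional exceedance probability P(X > x | X > u) in the
    reparametrized form of (9), with sigma_star = sigma + xi * u."""
    sigma_star = sigma + xi * u
    return (1.0 + xi * (x - u) / sigma_star) ** (-1.0 / xi)

def gpd_tail_ratio(x, u, xi, sigma):
    """The same probability written as a ratio of unconditional tails."""
    num = (1.0 + xi * x / sigma) ** (-1.0 / xi)
    den = (1.0 + xi * u / sigma) ** (-1.0 / xi)
    return num / den

def gpd_density(x, u, xi, sigma):
    """Conditional density (10): minus the derivative of the tail in x."""
    sigma_star = sigma + xi * u
    return (1.0 + xi * (x - u) / sigma_star) ** (-1.0 / xi - 1.0) / sigma_star
```

Both tail expressions agree for any \(x>u\), and a finite-difference derivative of the tail reproduces the density.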

In terms of impact functions and the intensity function of co-jumps, there can be many possible specifications. In our empirical study, we mainly investigate the form

$$\begin{aligned} c_{ij}(X)=\left( a_{ij}x_j^c\right) \;(0\le c\le 1;\; i=1,\dots ,p;\; j=1,\dots ,d) \end{aligned}$$

and

$$\begin{aligned} c_{ij}(X)=\left( a_{ij} \max _{k\in J_i}x_k^c\right) \;(0\le c\le 1;\; i=1,\dots ,p;\; j=d+1,\dots ,p), \end{aligned}$$

where \(a_{ij}\) and c are some constants.

In particular when \(p=d\) and \(c_{ij}=\delta (i,j)\) (indicator functions), they correspond to the traditional multivariate Hawkes-type (THPP) processes, which are simple point processes without co-jumps.

Let \(p\times 1\) vector point process \({\mathbf{N}}^n(t,u)\) be partitioned as \((d+(p-d))\times 1\) processes as

$$\begin{aligned} {\mathbf{N}}^n(t ,{\mathbf{u}})= \left[ \begin{array}{c} {\mathbf{N}}^n_1(t,u)\\ {\mathbf{N}}^n_2(t,u) \end{array} \right] \; = \left[ \begin{array}{c} N_1^n(t,u)\\ \vdots \\ N_d^n(t,u)\\ N^n_{1,2}(t,u)\\ \vdots \\ N^n_{1,2,\dots ,d}(t,u) \end{array}\right] , \end{aligned}$$
(11)

where \({\mathbf{N}}^n_1(t,u)\) is the \(d\times 1\) vector of marginal point processes with \(p=2^d-1\) and \({\mathbf{N}}^n_2(t,u)\) is the \((p-d)\times 1\) vector of co-jump point processes. The corresponding conditional intensity functions are

$$\begin{aligned} {\lambda }^n(t,{\mathbf{u}})= \left[ \begin{array}{c} {\lambda }_1^n(t,u)\\ {\lambda }_{2}^n(t,u) \end{array}\right] =\left[ \begin{array}{c} \lambda _1^n(t,u)\\ \vdots \\ \lambda _d^n(t,u)\\ \lambda _{1,2}^n(t,u)\\ \vdots \\ \lambda _{1,2,\dots ,d}^n(t,u) \end{array}\right] , \end{aligned}$$
(12)

and \(p\times p\) matrices

$$\begin{aligned} {\mathbf{C}}(X(s-))=\left[ c_{ij}(X_{s-}) \right] ,\; {\mathbf{G}}(t-s)=\left[ \mathrm{diag} ( g_{j}(t-s)) \right] . \end{aligned}$$

We use notation such that \({\lambda }_1^n(t,u) \) is the vector process of conditional intensities of marginal jumps, \(\mathrm{diag}(\cdot )\) for diagonal matrices and we often omit n for \(\lambda _{J_i}^n (s) \;(i=1,\dots ,p)\) and \(N_{J_i}^n\) whenever their meanings are clear in the following analysis.

Next, we rewrite (6) and (7) as

$$\begin{aligned} {\mathbf{N}}_1^n(t,u) = {\mathbf{D}}_1 {\mathbf{N}}^n(t,u), \end{aligned}$$
(13)

and

$$\begin{aligned} {\mathbf{N}}_2^n(t,u) = {\mathbf{D}}_2 {\mathbf{N}}^n(t,u), \end{aligned}$$
(14)

where \({\mathbf{D}}_1\) is a \(d\times p\) matrix as

$$\begin{aligned} {\mathbf{D}}_1 = \left[ \begin{array}{cccccccccc} 1&{}0&{}\cdots &{}0&{}1&{}1&{}\cdots &{}0&{}\cdots &{}1\\ 0&{}1&{}\cdots &{}0&{}1&{}0&{}\cdots &{}0&{}\cdots &{}1\\ \vdots &{} &{} 1&{}0 &{}0 &{} &{} &{}\vdots &{}\cdots &{}1\\ 0&{}\cdots &{}\cdots &{}1&{}0&{}\cdots &{}\cdots &{}\cdots &{}1&{}1 \end{array}\right] \; \end{aligned}$$

and \({\mathbf{D}}_2\) is a \((p-d)\times p\) matrix as \( {\mathbf{D}}_2= \left[ {\mathbf{O}}, {\mathbf{I}}_{p-d} \right] \;(p\ge d). \)
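The index sets and the matrices \({\mathbf{D}}_1\) and \({\mathbf{D}}_2\) can be generated mechanically for any d. The sketch below assumes that the subsets are ordered by size and then lexicographically (singletons, pairs, and so on up to the full set), which agrees with the ordering shown in (11); the function names are illustrative.

```python
from itertools import combinations

def index_sets(d):
    """Non-empty subsets of {1,...,d}, ordered by size and then
    lexicographically: singletons, pairs, ..., the full set."""
    return [set(c) for r in range(1, d + 1)
            for c in combinations(range(1, d + 1), r)]

def selection_matrices(d):
    """Return (D1, D2) as nested lists, with p = 2^d - 1 columns."""
    sets = index_sets(d)
    p = len(sets)
    # D1[j][i] = 1 when component j+1 takes part in jump pattern J_i
    D1 = [[1 if (j + 1) in J else 0 for J in sets] for j in range(d)]
    # D2 = [O, I_{p-d}] picks out the co-jump components
    D2 = [[1 if col == d + row else 0 for col in range(p)]
          for row in range(p - d)]
    return D1, D2
```

For \(d=2\) this yields \(p=3\), a first block equal to the identity followed by a column of ones in \({\mathbf{D}}_1\), and a single row \((0,0,1)\) in \({\mathbf{D}}_2\).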

We call the above Hawkes-type conditional intensity models the simultaneous multivariate Hawkes-type point process (SHPP) models because the resulting marked point processes are not necessarily simple (Footnote 2). The classical Hawkes-type point processes have been useful in applications because they are simple point processes. However, they exclude the possibility of simultaneous jumps or co-jumps, and thus they are not fit for our purposes here. The foregoing constructions of marked point processes can be regarded as an extension of Solo (2007).

3 Stationarity and Bartlett spectrum decomposition

3.1 Stationarity of Hawkes-type processes

In our applications, we use stationary self-exciting Hawkes-type (marked) point processes. We take the expectation of the intensity function of (11) and (12) in \((-\infty ,t]\) as

$$\begin{aligned} {\mathbf{E}}[{\lambda }^n(t,{\mathbf{u}})] ={\lambda }_0 +{\mathbf{E}}\left[ \int _{-\infty }^{t} {C}(\mathbf{X}(s-)){\mathbf{G}}(t-s) {\mathrm{d}}{\mathbf{N}}^n(s,{\mathbf{u}})\right] , \end{aligned}$$
(15)

where \({\lambda }_0=(\lambda _{j,0})\).

Let a \(p\times 1\) vector of functions \({\mathbf{v}}(t)=(v_j(t))\) be \({\mathbf{E}}[{\lambda }^n(t,{\mathbf{u}})]\). For simplicity, we take \({\mathbf{G}}(t-s)=[\mathrm{diag} (g_j(t-s))]\) with \(g_j(t-s)=e^{-\gamma _j (t-s)}\) and \({\mathbf{G}}(0)={\mathbf{I}}_p,\; {\Gamma }=[\mathrm{diag} (\gamma _j)]\;(\gamma _j>0, j=1,\dots ,p)\).

The stationarity implies \({\mathbf{C}}={\mathbf{E}}[{\mathbf{C}}(\mathbf{X}(s-))]\) and we can use the identity relation \({\mathbf{v}}(t)-{\lambda }_0 =\int _{-\infty }^t {\mathbf{C}}{\mathbf{G}}(t-s){\mathbf{v}}(s){\mathrm{d}}s\). Then by a direct calculation, we have a set of differential equations

$$\begin{aligned} \frac{ {\mathrm{d}}{} {\mathbf{v}}(t)}{{\mathrm{d}}t} =\left[ {\mathbf{C}} -{\mathbf{C}}{\Gamma }{\mathbf{C}}^{-1}\right] {\mathbf{v}}(t) +{\mathbf{C}}{\Gamma }{} {\mathbf{C}}^{-1}{\lambda }_0, \end{aligned}$$
(16)

provided the initial condition \({\mathbf{v}}(0)\) and the non-degeneracy condition \(\vert {\mathbf{C}}\vert \ne 0\).
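The direct calculation can be spelled out as follows. It is a sketch that assumes differentiation under the integral sign is permitted, and it uses \(\partial {\mathbf{G}}(t-s)/\partial t=-{\Gamma }{\mathbf{G}}(t-s)\), \({\mathbf{G}}(0)={\mathbf{I}}_p\), and the identity \(\int _{-\infty }^t {\mathbf{G}}(t-s){\mathbf{v}}(s){\mathrm{d}}s={\mathbf{C}}^{-1}[{\mathbf{v}}(t)-{\lambda }_0]\):

$$\begin{aligned} \frac{{\mathrm{d}}{\mathbf{v}}(t)}{{\mathrm{d}}t} &= {\mathbf{C}}{\mathbf{G}}(0){\mathbf{v}}(t) +\int _{-\infty }^{t}{\mathbf{C}}\,\frac{\partial {\mathbf{G}}(t-s)}{\partial t}\,{\mathbf{v}}(s){\mathrm{d}}s \\ &= {\mathbf{C}}{\mathbf{v}}(t)-{\mathbf{C}}{\Gamma }\int _{-\infty }^{t}{\mathbf{G}}(t-s){\mathbf{v}}(s){\mathrm{d}}s \\ &= {\mathbf{C}}{\mathbf{v}}(t)-{\mathbf{C}}{\Gamma }{\mathbf{C}}^{-1}\left[ {\mathbf{v}}(t)-{\lambda }_0\right] . \end{aligned}$$

Collecting the terms in \({\mathbf{v}}(t)\) gives (16). Setting the derivative to zero and multiplying by \({\mathbf{C}}{\Gamma }^{-1}{\mathbf{C}}^{-1}\) gives the stationary mean intensity \({\mathbf{v}}=({\mathbf{I}}_p-{\mathbf{C}}{\Gamma }^{-1})^{-1}{\lambda }_0\), provided the inverse exists.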

We need a condition for the convergence of \({\mathbf{v}}(t)\) as \(t\rightarrow \infty \). Therefore, the condition for the existence of stationary point processes is that the spectral radius

$$\begin{aligned} \max _{1\le i\le p} \vert \mu _i ({\mathbf{F}}) \vert <1, \end{aligned}$$
(17)

where \( \mu _i ({\mathbf{F}}) \) are the characteristic roots of \({\mathbf{F}}={\mathbf{C}}{\Gamma }^{-1}\). (See Theorem 2 of Kunitomo et al. 2017 as a special case.) When \(d=p=1 \) (one-dimensional Hawkes process), \({\mathbf{C}}=\alpha \) and \({\Gamma }=\gamma \;(>0),\) then \({\mathbf{F}}=\alpha / \gamma \).
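Condition (17) is straightforward to check numerically for given parameter values. The sketch below assumes NumPy is available; the function name and the example matrices are illustrative.

```python
import numpy as np

def spectral_radius_check(C, gamma):
    """Return (is_stationary, rho), where rho is the spectral radius of
    F = C * Gamma^{-1} and Gamma = diag(gamma) with gamma_j > 0."""
    C = np.asarray(C, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    F = C @ np.diag(1.0 / gamma)
    rho = float(np.max(np.abs(np.linalg.eigvals(F))))
    return rho < 1.0, rho
```

In the one-dimensional case with \(\alpha =0.5\) and \(\gamma =1\) the radius is 0.5, so the condition holds; with \(\alpha =2\) and \(\gamma =1\) it fails.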

3.2 Applying the Bartlett spectrum

Hawkes (1971) introduced the spectral density for the stationary vector point process \({\mathbf{N}}(t)=(N_i(t))\), which was originally developed by Bartlett (1963) without co-jumps (\(p=d\)); it is defined for the conditional intensity vector in the form of

$$\begin{aligned} {\lambda }(t)={\lambda }_0 +\int _{-\infty }^t {\gamma }(t-u) {\mathrm{d}}{} {\mathbf{N}}(u), \end{aligned}$$
(18)

where \({\gamma }(u)=(\gamma _{ij}(u))\) is a \(d\times d\) matrix and \({\gamma }(u)=(0)\) (zero-matrix) for \(u<0\). Let the Fourier transform of \({\gamma }(\tau )\) be

$$\begin{aligned} {\Gamma }^{*}(\omega )=\int _{-\infty }^{\infty } e^{-i\omega \tau } {\gamma }(\tau ){\mathrm{d}}\tau , \end{aligned}$$
(19)

where \(i^2=-1\).

Then, when \(p=d\) (there are no co-jumps), the Bartlett spectral matrix for frequency \(\omega \;(\in {\mathbf{R}})\) is given by

$$\begin{aligned} {\mathbf{g}}(\omega ) =\frac{1}{2\pi }[ {\mathbf{I}}_d-{\Gamma }^{*}(\omega )]^{-1} {\Sigma } [{\mathbf{I}}_d-{\Gamma }^{*'}(-\omega )]^{-1}, \end{aligned}$$
(20)

where \({\Gamma }^{*}\) in (20) is a \(d\times d\) matrix for the d-dimensional vector point process and \({\Gamma }^{*'}\) is the transposed matrix of \({\Gamma }^{*}\).

To permit co-jumps, Bartlett spectral matrix for the d-dimensional marginal point process vector can be defined by

$$\begin{aligned} {\mathbf{g}}(\omega ) =\frac{1}{2\pi } [{\mathbf{I}}_d,{\mathbf{O}}] [{\mathbf{D}}-{\mathbf{D}}{\Gamma }^{*}(\omega )]^{-1}{\Sigma } [{\mathbf{D}}^{'}-{\mathbf{D}}^{'}{\Gamma }^{*'}(-\omega )]^{-1} \left[\begin{array}{c} {\mathbf{I}}_d\\ {\mathbf{O}} \end{array}\right], \end{aligned}$$
(21)

where \({\mathbf{g}}(\omega )=(g_{ij}(\omega ))\) is the \(d\times d\) spectral density matrix, \( {\Gamma }^{*} (\omega ) \) is a \(p\times p\) Fourier transform as (19), \({\mathbf{D}}=({\mathbf{D}}_1^{'},{\mathbf{D}}_2^{'})^{'}\) is a \(p\times p\) choice matrix, and \({\Sigma }=(\sigma _{ii}) \) is the diagonal matrix with diagonal elements of the variances \(\sigma _{ii}\;(i=1,\dots ,p)\).

Then we define the relative power contribution (RPC) to the marginal spectral density function \(g_{ii}(\omega )\;(i=1,\dots ,d)\) at frequency \(\omega \) using the joint spectral density matrix \({\mathbf{g}}(\omega )\). The (i,i)-component of \({\mathbf{g}}(\omega )\) can be represented as

$$\begin{aligned} g_{ii}(\omega )= \sum _{k=1}^p \vert a_{ik}(\omega )\vert ^2\sigma _{kk}\; \end{aligned}$$
(22)

and

$$\begin{aligned} {\mathbf{RPC}}_{k\rightarrow i}(\omega ) =\frac{\vert a_{ik}(\omega )\vert ^2\sigma _{kk}}{g_{ii}(\omega )}\;\; (i=1,\dots ,p;\;k=1,\dots ,d), \end{aligned}$$
(23)

where \(a_{ij}(\omega )\;(i=1,\dots ,d;\; j=1,\dots ,p)\) are the functions of complex variables. In addition, the instantaneous RPC (\({\mathrm{IRPC}}_{j\rightarrow i}\)) can be defined by

$$\begin{aligned} \mathbf{IRPC}_{j\rightarrow i}(\omega ) =\frac{\vert a_{ij}(\omega )\vert ^2\sigma _{jj}}{g_{ii}(\omega )} \;(j=d+1,\dots ,p). \end{aligned}$$
(24)

In this way, we can measure the RPCs for any frequency \(\omega \), which correspond to Granger-causality measures in the frequency domain. One important aspect of the above formulation is that we have a natural definition of instantaneous Granger causality in the frequency domain, which differs from that in discrete time series modeling.
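As an illustration of the decomposition in (22), the following sketch computes the diagonal of the Bartlett spectrum (20) and the power shares for the case without co-jumps (\(p=d\)). It assumes NumPy, exponential kernels \(\gamma _{ij}(\tau )=c_{ij}e^{-\gamma _j\tau }\) whose Fourier transform is \(c_{ij}/(\gamma _j+i\omega )\), and shares normalized by \(g_{ii}(\omega )\) as in (24); the function name and inputs are illustrative.

```python
import numpy as np

def bartlett_rpc(omega, C, gamma, sigma):
    """Diagonal of the Bartlett spectrum and relative power shares.

    C: d x d impact matrix, gamma: decay rates (length d),
    sigma: diagonal elements sigma_kk of Sigma (length d).
    Returns (g_diag, rpc) where rpc[i, k] is the share of g_ii
    attributed to component k.
    """
    C = np.asarray(C, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    d = C.shape[0]
    Gamma_star = C / (gamma + 1j * omega)      # column j scaled by 1/(gamma_j + i*omega)
    A = np.linalg.inv(np.eye(d) - Gamma_star)  # transfer matrix (a_ik(omega))
    power = np.abs(A) ** 2 * sigma             # |a_ik|^2 * sigma_kk
    total = power.sum(axis=1)                  # g_ii(omega) up to the factor 1/(2*pi)
    rpc = power / total[:, None]               # the factor 1/(2*pi) cancels in the ratio
    return total / (2.0 * np.pi), rpc
```

By construction the shares in each row sum to one, and when the impact matrix is diagonal (no cross-excitation) each component accounts for all of its own power.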

3.3 Conditional probability prediction

An important application of conditional intensity modeling involves assessing the conditional probability of rare events in the future from past observations. Let \(\tau (j)\;(j=1,\dots ,d)\) be the first arrival time of an event in the \(j-\)th variable. Then we can write the probability of the random variable \(\tau (j)\) as

$$\begin{aligned} Pr(\tau (j) \ge T^{'} \vert {\mathcal {F}}_{T}^N) = \exp \left( -\int _{T}^{T^{'}} {\lambda }_{j}^n (t,u \vert {\mathcal {F}}_{T}^N ) {\mathrm{d}}t\right) , \end{aligned}$$
(25)

where \({\mathcal {F}}_{T}^N \) is the \(\sigma -\)field of information available at time \(T<T^{'}\) and \( {\lambda }_{j}^n (t,u \vert {\mathcal {F}}_{T}^N )\) is the conditional intensity of the j-th variable.
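When the damping function is exponential, the integral in (25) has a closed form. The sketch below assumes that, given the information at time T, the intensity decays as \(\lambda _{j,0}+S e^{-\gamma (t-T)}\), where S denotes the accumulated excitation at T, and that no further events feed into the intensity over the forecast horizon; the function name and arguments are illustrative.

```python
import math

def no_event_probability(lam0, excitation, gamma, horizon):
    """Probability of no event within `horizon` after time T, i.e.
    exp(-integral over [0, horizon] of lam0 + excitation * exp(-gamma * s))."""
    integral = (lam0 * horizon
                + excitation * (1.0 - math.exp(-gamma * horizon)) / gamma)
    return math.exp(-integral)
```

With no accumulated excitation the expression reduces to the Poisson survival probability \(e^{-\lambda _{j,0}(T'-T)}\), and a larger excitation at time T lowers the probability that no event occurs before \(T'\).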

Kunitomo et al. (2017) conducted some experiments suggesting that useful information on the conditional probability of future events can be extracted from past observations. For instance, they provided an important example vis-à-vis the conditional probability prediction of the Lehman Shock, which occurred in 2008 as a global crisis, given past information available before that event. This illustrates the potential value of our approach.

4 Estimation and non-causality tests

4.1 Likelihood function

When the point process is simple and there is no co-jump, the log-likelihood function of the (d-dimensional) multivariate point process is known (see Daley 2003; Kunitomo et al. 2017) and it is given by

$$\begin{aligned} \sum _{i=1}^{d}\left\{ -\int _{0}^{T} \lambda _{i}^n(s){\mathrm{d}}s + \int _{0}^{T}\log (\lambda _{i}^n(s)){\mathrm{d}}N_{i}^n(s)\right\} . \end{aligned}$$
(26)
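For \(d=1\) and an exponential damping function, the log-likelihood (26) can be evaluated in linear time with a standard recursion. The sketch below assumes the intensity \(\lambda (t)=\lambda _0+\alpha \sum _{t_k<t}e^{-\gamma (t-t_k)}\), sorted event times in [0, T], and no events before time 0; the function name is illustrative.

```python
import math

def hawkes_loglik(times, T, lam0, alpha, gamma):
    """Log-likelihood of a univariate exponential Hawkes process on [0, T]."""
    loglik = 0.0
    decay_sum = 0.0  # sum over earlier events of exp(-gamma * (t - t_k))
    prev = None
    for t in times:
        if prev is not None:
            # recursion: carry the previous sum forward and add the previous event
            decay_sum = math.exp(-gamma * (t - prev)) * (decay_sum + 1.0)
        loglik += math.log(lam0 + alpha * decay_sum)
        prev = t
    # compensator: integral of the intensity over [0, T]
    compensator = lam0 * T + (alpha / gamma) * sum(
        1.0 - math.exp(-gamma * (T - t)) for t in times)
    return loglik - compensator
```

Maximizing this function over \((\lambda _0,\alpha ,\gamma )\) with a numerical optimizer gives the ML estimates; without any events, the value reduces to \(-\lambda _0T\).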

The log-likelihood function of the marked multivariate point process with the density function \(f_i(x)\) is given by

$$\begin{aligned} L_T = L_{1T}+L_{2T}, \end{aligned}$$
(27)

where

$$\begin{aligned} L_{1T}= & {} \sum _{i=1}^{d}\left\{ -\int _{0}^{T} \lambda _{i}^n(s){\mathrm{d}}s + \int _{0}^{T}\log (\lambda _{i}^n(s)){\mathrm{d}}N_{i}^n(s) \right\} ,\\ L_{2T}= & {} \sum _{i=1}^{d}\left\{ \int _{0}^{T} \log f_{i}(x_{i}^n(s-)){\mathrm{d}}N_{i}^n (s)\right\} \end{aligned}$$

and the density function for the tail probability is given by

$$\begin{aligned} f_{i}(x) = \frac{1}{\sigma _{i}^{*}} \left( 1+\xi _{i} \frac{x_{i}-u_{i}}{\sigma _{i}^{*}}\right) ^{-\frac{1}{\xi _{i}}-1}\; (i=1,\dots ,d). \end{aligned}$$
(28)

Then we can apply the maximum likelihood (ML) method to \(L_{1T}\) and \(L_{2T}\) separately. In this formulation we use the GPD for the marginal distribution of return process.

When co-jumps are permitted, the log-likelihood function of the (d-dimensional) marginal point process is not as per the above form; instead, it should be given by

$$\begin{aligned} L_T^{*}=L_{1T}^{*}+L_{2T}^{*}, \end{aligned}$$
(29)

where

$$\begin{aligned} L_{1T}^{*}= & {} \sum _{i=1}^{d}\left\{ -\int _{0}^{T} \lambda _{i}^n(s){\mathrm{d}}s + \int _{0}^{T}\log (\lambda _{i}^n(s)){\mathrm{d}}N_{i}^n(s)\right\} \\&+\sum _{i\ne j=1}^{d}\left\{ -\int _{0}^{T} \lambda _{ij}^n(s){\mathrm{d}}s + \int _{0}^{T}\log (\lambda _{ij}^n(s)){\mathrm{d}}N_{ij}^n(s)\right\} \\&+\cdots +\left\{ -\int _{0}^{T} \lambda _{1\dots d}^n(s){\mathrm{d}}s + \int _{0}^{T}\log (\lambda _{1\dots d}^n(s)){\mathrm{d}}N_{1\dots d}^n(s)\right\} . \end{aligned}$$

and \(L_{2T}^{*}=L_{2T}\).

In our applications, we are principally concerned with the case in which \(d=2\); thus, there is only one extra term in the likelihood function because \(p=2^d-1\).

We assume the stationarity condition (17) and the existence of second order moments of \({\mathbf{C}}(\mathbf{X})=c_{ij}(\mathbf{X}(s))\) in the statistical inference of Hawkes-type point processes without and with co-jumps. Further, we take \({\lambda }({\mathbf{u}})\) as the stationary conditional intensity and some \(q\times p\) predictable processes \({\xi }(t)\) having second order moments. (Here \(q\ge 1\) and we utilize the notation \({\mathbf{g}}_T(t)\) in Appendix, for instance.)

Then, because of the resulting martingale property given the information available at each time, it is straightforward to confirm the asymptotic properties as we have

$$\begin{aligned} \frac{1}{T}\int _0^T{\xi }(t)[{\mathbf{N}}(t,u)-{\lambda }(t, {\mathbf{u}})]{\mathrm{d}}t \longrightarrow 0\;\;(a.s.) \end{aligned}$$
(30)

and

$$\begin{aligned} \frac{1}{T}\int _0^T{\xi }(t)[{\lambda }(t,u)-{\lambda }({\mathbf{u}})]{\mathrm{d}}t\,\, {\mathop {\longrightarrow }\limits ^{p}}\,\, 0 \end{aligned}$$
(31)

as \(T\rightarrow \infty \).

For the one-dimensional point processes with the stationary intensity function with \(p=q=1\), Ogata (1978) gave a set of sufficient conditions for the consistency and asymptotic normality of the ML estimation. His derivations are based on a martingale central limit theorem (MCLT), and it is straightforward to extend his arguments to the multi-dimensional case. For the sake of completeness, we provide details of our approach based on a new MCLT in the Appendix, which may be more general than the standard literature. In the next subsection, we develop new non-causality tests in the sense of Granger, which are explored in the context of our empirical applications.

4.2 Non-causality tests

We develop and use novel GNC tests based on the likelihood ratio principle for the Hawkes-type point processes. In particular, our results in this subsection, whose derivations are given in the Appendix, include not only the multivariate extension of existing results, but also cases in which the resultant limiting Fisher information matrix can be random variables. We first state our results for the case of no co-jumps under a set of regularity conditions, which will be extended to the more general case. The proof is lengthy, but often along the standard line of asymptotic arguments and we only give its outline in Appendix.

Theorem 1

Let the log-likelihood function of the Hawkes-type point processes with true parameters be \( L_T({\theta }_0)\) in (26) and (27), the log-likelihood function with the ML estimator \({\hat{\theta }}_{ML}\) be \( L_T({\hat{\theta }}_{ML})\) under \({\theta }\in {\Theta }\) and the log-likelihood function with the restricted maximum likelihood estimator \({\hat{\theta }}_{RML}\) be \( L_T({\hat{\theta }}_{RML})\) under \({\theta }\in {\Theta }_1\) (\({\Theta }_1\subset {\Theta }\)). We assume the sufficient condition for stationarity, the existence of the second-order moment condition of \({\mathbf{C}}(\mathbf{X})\), and we assume that the parameter spaces \({\theta }\in {\Theta }\) in \({\mathbf{R}}^r\) and \({\theta }\in {\Theta }_1\) in \( {\mathbf{R}}^{r_1}\;(0\le r_1<r)\) are compact sets. Under a set of regularity conditions (see Theorem A-3 in the Appendix), as \(T\rightarrow \infty \),

$$\begin{aligned} 2\left\{ L_{T}({\hat{\theta }}_{ML})- L_{T}({\hat{\theta }}_{RML})\right\} {\mathop {\rightarrow }\limits ^{d}} \chi ^2(r-r_1), \end{aligned}$$
(32)

where \(r-r_1\) is the number of restrictions of \(\theta =(\theta _k)\) and \( \chi ^2 (r-r_1)\) is the \(\chi ^2-\)random variable with \(r-r_1\) degrees of freedom.

The details of the regularity conditions are discussed in the Appendix. When co-jumps are permitted in the Hawkes-type processes, we cannot apply Theorem 1, but it is important to obtain the corresponding results in such cases for econometric applications. When we use discrete versions of point processes, which is often the case in econometric applications, we need to consider the existence of co-jumps. We therefore develop non-causality tests based on the likelihood ratio principle for this case as well. In this respect, note that in our setting discussed in Sect. 2, although we permit co-jumps, it is still possible to apply the martingale central limit theorem (MCLT) for point processes. Our results, in consideration of co-jumps, are an extension of Theorem 1. The proof is lengthy but largely follows standard asymptotic arguments, so we give only its outline in the Appendix.

Theorem 2

Let the log-likelihood function of the Hawkes-type point processes with true parameters be \( L_T^{*}({\theta }_0)\) in (29), the log-likelihood function with the ML estimator \({\hat{\theta }}_{ML}\) be \(L_T^{*}({\hat{\theta }}_{ML})\) under \({\theta }\in {\Theta }\) and the log-likelihood function with the restricted ML estimator \({\hat{\theta }}_{RML}\) be \(L_T^{*}({\hat{\theta }}_{RML})\) under \({\theta }\in {\Theta }_1\) (\({\Theta }_1\subset {\Theta }\)). We assume sufficient conditions for stationarity and the existence of the second-order moment condition of \({\mathbf{C}}(\mathbf{X}),\) and we assume that the parameter spaces \({\theta }\in {\Theta }\) in \({\mathbf{R}}^r\) and \({\theta }\in {\Theta }_1\) in \( {\mathbf{R}}^{r_1}\;(0\le r_1<r)\) are compact sets. Under a set of regularity conditions (see Theorem A-3 in the Appendix), as \(T\rightarrow \infty \),

$$\begin{aligned} 2\left\{ L_{T}^{*}({\hat{\theta }}_{ML}) -L_{T}^{*}({\hat{\theta }}_{RML})\right\} {\mathop {\rightarrow }\limits ^{d}} \chi ^2 (r-r_1), \end{aligned}$$
(33)

where \(r-r_1\) is the number of restrictions of \(\theta =(\theta _k)\) and \( \chi ^2 (r-r_1)\) is the \(\chi ^2-\)random variable with \(r-r_1\) degrees of freedom.
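As a rough illustration of how the limiting results in Theorems 1 and 2 are used in practice, the following Python sketch computes a likelihood ratio statistic from two maximized log-likelihood values and its asymptotic p-value under \(\chi ^2(r-r_1)\). The function names `chi2_sf` and `lr_test` are ours and are not part of the original R programs; the tail probability uses the standard recurrence for the regularized upper incomplete gamma function, which is sufficient for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability P(chi2(df) > x) for integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    # Start from Q(1/2, y) = erfc(sqrt(y)) for odd df, Q(1, y) = exp(-y) for even df.
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    # Recurrence Q(a + 1, y) = Q(a, y) + y^a e^{-y} / Gamma(a + 1).
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def lr_test(loglik_full, loglik_restricted, n_restrictions):
    """Return (LR statistic, asymptotic p-value) for nested models."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    stat = max(stat, 0.0)  # guard against tiny negative values from optimizer noise
    return stat, chi2_sf(stat, n_restrictions)
```

With purely hypothetical inputs, `lr_test(-100.0, -103.0, 1)` gives a statistic of 6.0 and a p-value of roughly 0.014, so a single restriction would be rejected at the 5% level.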

5 Simulations

To examine the relevance of the estimation and testing procedures proposed in this paper, we carried out a set of simulations. The model used in these simulations is a simultaneous Hawkes-type model with two dimensions, and the intensity functions are given by

$$\begin{aligned} \lambda _{1}^n(t)= & {} \lambda _{10}^n + \int _0^t \alpha _{11} e^{-\gamma (t -s)} X_{1} {\mathrm{d}}N_{1}^n(s) + \int _0^t \alpha _{12} e^{-\gamma (t -s)} X_{2}{\mathrm{d}}N_{2}^n(s)\\&+\int _0^t \alpha _{13} e^{-\gamma (t -s)} \left[ \max _i X_{i} \right] {\mathrm{d}}N_{12}^n(s),\\ \lambda _{2}^n(t)= & {} \lambda _{20}^n + \int _0^t \alpha _{21} e^{-\gamma (t -s)} X_{1}{\mathrm{d}}N_{1}^n(s) + \int _0^t \alpha _{22} e^{-\gamma (t -s)} X_{2}{\mathrm{d}}N_{2}^n(s)\\&+\int _0^t \alpha _{23} e^{-\gamma (t -s)} \left[ \max _i X_{i} \right] {\mathrm{d}}N_{12}^n(s),\\ \lambda _{12}^n(t)= & {} \lambda _{12,0}^n + \int _0^t \alpha _{31} e^{-\gamma (t -s)} X_{1}{\mathrm{d}}N_{1}^n(s) + \int _0^t \alpha _{32} e^{-\gamma (t -s)} X_{2}{\mathrm{d}}N_{2}^n(s)\\&+\int _0^t \alpha _{33} e^{-\gamma (t -s)} \left[ \max _i X_{i} \right] {\mathrm{d}}N_{12}^n(s), \end{aligned}$$

where \(\lambda _{10}^n>0, \lambda _{20}^n>0, \lambda _{12,0}^n>0\) and \(\gamma >0\).
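To make the structure of these intensity functions concrete, the following Python sketch evaluates the three intensities at a given time from a list of past marked events. It is only an illustration under stated assumptions: the function name `intensities`, the event encoding, and the convention that only strictly earlier events contribute are ours, and for co-jump events the mark passed in is assumed to be \(\max _i X_i\) already.

```python
import math


def intensities(t, baseline, alpha, gamma, events):
    """Evaluate (lambda_1, lambda_2, lambda_12) at time t.

    baseline: three positive constants (lambda_10, lambda_20, lambda_12,0).
    alpha:    3x3 nested list; alpha[i][j] is the effect of event type j on type i.
    events:   list of (s, j, x) with event time s, event type j in {0, 1, 2}
              (0: market 1 only, 1: market 2 only, 2: co-jump) and mark x.
    """
    lam = list(baseline)
    for s, j, x in events:
        if s < t:  # assumption: only strictly past events contribute
            decay = math.exp(-gamma * (t - s))  # common exponential kernel
            for i in range(3):
                lam[i] += alpha[i][j] * decay * x
    return lam
```

With an empty event list the function returns the baseline intensities, and each past event adds a term \(\alpha _{ij} e^{-\gamma (t-s)} x\) to every component, mirroring the integrals above.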

We first generate stock price returns using the GPD as the marginal distribution and a two-dimensional Gaussian copula. Then we employ the ML method to obtain estimates of the underlying parameters. We provide a set of visualizations (Figs. 1, 2, 3 and 4) to illustrate the key results on the finite sample distributions of the ML estimator. All histograms are standardized for comparison with the standard normal distribution as

$$\begin{aligned} {\mathbf{I}}_n^{1/2}( {\hat{\theta }}-{\theta }), \end{aligned}$$
(34)

where \({\theta }=(\theta _i)\) is a vector of parameters and \({\hat{\theta }}\) is the ML estimator.

In our numerical evaluations, the estimates sometimes hit the boundaries implied by the non-negativity of the intensity functions in finite samples, resulting in instabilities. To mitigate this, we imposed non-negativity restrictions on the parameters in our simulations. The ensuing results are reasonable, but we sometimes observe that the ML estimators of the coefficients exhibit biases, although they are not very large (Fig. 2 is a typical example of this). The sample size in our experiments was around 1000 because this is similar to the data size in the empirical examples, and it seems that a larger number of observations is needed to reduce these biases. The configuration of our numerical experiments is as follows: the number of replications is 100, and for GPD(\(\sigma _{j}\), \(\xi _{j}\)) we set \((\sigma _{1}, \xi _{1}) = (0.007, 0.22)\) and \((\sigma _{2}, \xi _{2}) = (0.008, 0.15)\). These numerical values give reasonable results, and they are based on the preliminary estimates from our empirical studies.
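The data-generating step for the marks can be sketched as follows. This is a minimal Python illustration, not the original R program: it draws pairs of GPD-distributed values whose dependence is induced by a Gaussian copula, using the \((\sigma _j, \xi _j)\) values quoted above. The copula correlation `rho = 0.5`, the seed, and the function names are assumptions made only for this example.

```python
import math
import random


def gpd_quantile(u, sigma, xi):
    """Inverse distribution function of GPD(sigma, xi) for 0 <= u < 1, xi != 0."""
    return sigma / xi * ((1.0 - u) ** (-xi) - 1.0)


def std_normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sample_marks(n, rho, params, seed=0):
    """Draw n pairs of GPD marks linked by a Gaussian copula with correlation rho."""
    rng = random.Random(seed)
    (s1, x1), (s2, x2) = params
    out = []
    for _ in range(n):
        # Correlated standard normals (Cholesky factor of a 2x2 correlation matrix).
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Map to uniforms; clamp away from 1 to keep the quantile finite.
        u1 = min(std_normal_cdf(z1), 1.0 - 1e-12)
        u2 = min(std_normal_cdf(z2), 1.0 - 1e-12)
        out.append((gpd_quantile(u1, s1, x1), gpd_quantile(u2, s2, x2)))
    return out


# rho = 0.5 is a hypothetical value; the GPD parameters are those quoted in the text.
marks = sample_marks(1000, rho=0.5, params=[(0.007, 0.22), (0.008, 0.15)])
```

The inverse-transform step keeps the two margins exactly GPD while the dependence comes only from the correlated normals, which is the defining property of a Gaussian copula construction.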

Table 1 Simulation results
Fig. 1 \(\alpha _{12}^{*}\)

Fig. 2 \(\alpha _{21}^{*}\)

Fig. 3 \(\alpha _{23}^{*}\)

Fig. 4 \(\gamma ^{*}\)

Among many simulations, we illustrate key results in Table 1 and Figs. 1, 2, 3 and 4. Note that because we have taken \(\alpha _{12}^{*}=0\), we have a sampling distribution around zero, and the resulting estimate is not significant, as illustrated in Fig. 1. The other estimates of \(\alpha _{ij}\) take reasonable values on average in the sense that they are not significantly different from the true values; their sampling distributions are illustrated in Figs. 2, 3 and 4. We observe some positive biases in the estimates of \(\alpha _{ij}\) and negative biases in the estimates of the initial intensities, which may be due to the non-negativity constraints on the parameters and the sample size we employed. We have imposed the non-negativity of the intensities directly in the ML computation.

ML estimation can also be affected by initial conditions. We investigated this problem for the SHPP models and found that such sensitivity is apparent but minor across our simulations.

We also use the \(\chi ^2\)-distributions as the limiting distributions of the likelihood ratio statistics for hypothesis testing in our empirical study. We confirm that the \(\chi ^2\)-approximations with finite samples are basically appropriate.

6 Empirical applications

In this section, we report results for two empirical examples using the SHPP and THPP models. The first concerns the three major stock markets, namely, Tokyo, New York, and London. Since there are time differences between the opening and closing hours of these markets, it is reasonable to assume that there are no co-jumps. In the second example, we focus on the simultaneous interaction between the Tokyo and Hong Kong financial markets. In this latter case, since the time zone difference is small (just one hour) compared to the first empirical example, it may be natural to use SHPP, which is the extended Hawkes-type point process model with co-jumps. Because of the limitations of the data available to us, we have ignored the possibility of crossing the threshold from below except for the first crossing in a day, i.e., between the day-start and the day-minimum.

In the first example, we use daily day-start to day-minimum data for the Nikkei225, S&P500 and FTSE100 from January 2, 1990 to August 26, 2015. We choose \(u=2\%\) based on the earlier study of Kunitomo et al. (2017), which used a discrete-process formulation of returns and analyzed daily day-start to day-end data for this case. Their empirical results were quite similar to those in the following analysis, although the numerical values differ. All computations were carried out with original programs written in R. Example 2, which concerns the Tokyo and Hong Kong markets, is entirely new and is the principal driver of the SHPP models developed in this study; we report the results for Example 2 using the same type of data. We have also checked the robustness of our results on the estimation of conditional intensity models and the non-causality tests. We omit the details of the results based on the day-start to day-end data because they are quite similar.
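The construction of events from daily data can be sketched as follows. This Python fragment is only a hedged illustration: we assume that the day-start to day-minimum return is measured as the log-ratio of the opening price to the daily low, that an event occurs when this loss exceeds the threshold \(u\), and that the mark is the excess over \(u\). The function name `extract_events` and the input layout are hypothetical and do not reproduce the original R programs.

```python
import math


def extract_events(opens, lows, u=0.02):
    """Return (day index, excess over u) for days whose intraday loss exceeds u.

    Assumption: the intraday loss is measured as log(open / low).
    """
    events = []
    for day, (o, l) in enumerate(zip(opens, lows)):
        loss = math.log(o / l)  # non-negative because low <= open
        if loss > u:
            events.append((day, loss - u))
    return events
```

Because only one loss per day is computed, at most one event per day and market can arise, which is in line with ignoring later threshold crossings within the same day.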

6.1 Example 1 (Tokyo–NY–London)

We first maximize the likelihood \(L_{2T}\) to estimate the marginal distributions of financial market returns. As shown in Table 2, the marginal distributions of market returns (i.e., log-returns) have thicker tails than the normal distribution: the estimates of \({\xi }_i \;(i=1,2,3)\) are positive, so it is appropriate to use the GPD in our estimation procedure. Positive estimates of \({\xi }_i\) indicate a Fréchet-type tail, whereas the Gaussian distribution belongs to the Gumbel domain of attraction (see Embrechts et al. 1998, Chapter 3, for instance). The standard deviations (SD) (or the standard errors of the estimates) in the tables are estimated by numerical evaluation of the Fisher information matrix.
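For reference, the role of the sign of \(\xi \) can be read off from the survival function of the GPD for excesses \(x\ge 0\) over the threshold. We state it here in a common textbook parameterization, which we assume matches the one underlying \(L_{2T}\):

$$\begin{aligned} {\bar{G}}_{\xi ,\sigma }(x) = \left( 1+\frac{\xi x}{\sigma }\right) ^{-1/\xi }\;\;(\xi \ne 0), \qquad {\bar{G}}_{0,\sigma }(x) = e^{-x/\sigma }\;\;(\xi =0). \end{aligned}$$

When \(\xi >0\) the tail decays polynomially with tail index \(1/\xi \), which is the Fréchet case, while \(\xi =0\) gives the exponential tail associated with the Gumbel domain.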

Table 2 Tail distributions

For the estimated models with two dimensions (\(d=p=2\)), we take the impact functions c(x) as Model 1\(\; c(x)=1\), Model 2\(\; c(x)=x\), and Model 3\(\; c(x)=x^{c}\;(0<c<1)\). The estimated values of the log-likelihood and Akaike information criterion (AIC) are those with the marginal distribution \(L_{1T}\). The full likelihood can be calculated using \(L_{1T}\) and \(L_{2T}\). The standard deviations (SD) or standard errors of the estimated coefficients are also evaluated numerically using the inverse of the estimated Fisher information matrix.


Model 1


We estimated the intensity function as

$$\begin{aligned} \lambda _{1}^n(t) = \lambda _{10}^n + \int _0^t \alpha _{11} e^{-\gamma _{11} (t -s)} {\mathrm{d}}N_{1}^n(s) + \int _0^t \alpha _{12} e^{-\gamma _{12} (t -s)} {\mathrm{d}}N_{2}^n(s),\\ \lambda _{2}^n(t) = \lambda _{20}^n + \int _0^t \alpha _{21} e^{-\gamma _{21} (t -s)} {\mathrm{d}}N_{1}^n(s) + \int _0^t \alpha _{22} e^{-\gamma _{22} (t -s)} {\mathrm{d}}N_{2}^n(s). \end{aligned}$$

Since the ML estimates can sometimes be numerically unstable without any restrictions on the parameter space, we have imposed the restriction that the discounted parameters \(\gamma _{ij}\;(i,j=1,2)\) share the same value \(\gamma \) in the following estimation. The estimation results for Model 1 are presented in Tables 3, 4.

Table 3 Tokyo–New York
Table 4 Tokyo–London

In Table 3, \(N_{1}\) of Model 1 corresponds to Tokyo and \(N_{2}\) corresponds to New York in Tokyo–New York markets. In terms of Tokyo–London, in Table 4 \(N_{1}\) of Model 1 corresponds to Tokyo while \(N_{2}\) corresponds to London.

The most important finding here (and in Tables 5, 6 below) is that the coefficient \(\alpha _{12}\) is statistically significant while the coefficient \(\alpha _{21}\) is not. This amounts to an informal non-causality test, which we discuss more formally below. The other parameters take reasonable values in magnitude, and they are significant for both the Tokyo–New York and Tokyo–London markets.


Model 2


We estimated the intensity function as

$$\begin{aligned} \lambda _{1}^n(t) = \lambda _{10}^n + \int _0^t \alpha _{11} e^{-\gamma (t -s)} X_{1} {\mathrm{d}}N_{1}^n(s) + \int _0^t \alpha _{12} e^{-\gamma (t -s)} X_{2}{\mathrm{d}}N_{2}^n(s),\\ \lambda _{2}^n(t) = \lambda _{20}^n + \int _0^t \alpha _{21} e^{-\gamma (t -s)} X_{1}{\mathrm{d}}N_{1}^n(s) + \int _0^t \alpha _{22} e^{-\gamma (t -s)} X_{2}{\mathrm{d}}N_{2}^n(s), \end{aligned}$$

and the estimation results are presented in Tables 5, 6.

In the present case, the estimated coefficients are similar to those of Model 1 except for \(\alpha _{21}\). The significance of the coefficients is more pronounced here than in Model 1, which is consistent with the likelihood values and their AIC.

Table 5 Tokyo–New York
Table 6 Tokyo–London

Model 3


We estimated the intensity function as

$$\begin{aligned} \lambda _{1}^n(t) = \lambda _{10}^n + \int _0^t \alpha _{11} e^{-\gamma (t -s)} {X_{1}}^{c_{11}} {\mathrm{d}}N_{1}^n(s) + \int _0^t \alpha _{12} e^{-\gamma (t -s)} {X_{2}}^{c_{12}} {\mathrm{d}}N_{2}^n(s),\\ \lambda _{2}^n(t) = \lambda _{20}^n + \int _0^t \alpha _{21} e^{-\gamma (t -s)} {X_{1}}^{c_{21}}{\mathrm{d}}N_{1}^n(s) + \int _0^t \alpha _{22} e^{-\gamma (t -s)} {X_{2}}^{c_{22}}{\mathrm{d}}N_{2}^n(s). \end{aligned}$$

Although we used ML estimation in this case, the ML estimates are often numerically unstable. In particular, we often had numerical difficulty in calculating the standard errors of the estimates (Tables 7, 8). This was probably because, without any restrictions on the parameter space, the optimization in R was often unstable owing to the near-singularity of the estimated Fisher information. We therefore imposed restrictions such that the discounted parameters \(\gamma _{j}\;(j=1,2)\) have the same value \(\gamma \) and, for instance, \(c_{11}=c_{12}, c_{21}=c_{22}\). Estimation results under this restriction are given in Kunitomo et al. (2017) for the day-start to day-end datasets. Here we report the estimation results of Model 3 under the further restriction \(c=c_{11}=c_{12}=c_{21}=c_{22}\).

Table 7 Tokyo–New York
Table 8 Tokyo–London

Overall, the results suggest that Models 2 and 3 are better than Model 1. In addition, according to AIC, Model 2 is better than Model 3 mainly because the latter is over-parametrized for Tokyo–New York markets. Hence, we adopted Model 2 in the following non-causality tests.

6.2 Non-causality tests

In applying the GNC test procedure, we set the impact function as \(c(x)=x \). We report our empirical results for the hypothesis \(H_0: \alpha _{ij}=0\) using the likelihood ratio test (LRT) statistics based on the Tokyo–New York data. For the null-hypothesis \(H_{0}:\alpha _{21}=0 ,\) the LRT statistic is \(2 \times (-3562.015 +3562.017) \sim 0\), and we could not reject the null-hypothesis (the upper 95\(\%\) critical point of \(\chi ^2(1)\) is 3.841; see Table 5). This means that changes in the Japanese financial market have little impact on the U.S. financial market.

For testing the null-hypothesis \(H_{0}:\alpha _{12}=0 ,\) LRT statistic based on the Tokyo–New York data was \(2 \times (-3562.017 +3572.843) = 21.652 ,\) and the null-hypothesis was rejected. Thus, there is a significant effect from U.S. financial markets to Tokyo financial market (see Table 5).

Similarly, in the Tokyo–London markets, for the null-hypothesis \(H_{0}:\alpha _{21}=0 ,\) the LRT statistic was \(2 \times (-3660.215 +3660.215) \sim 0.0\), so the null-hypothesis was not rejected. Thus, knock-on effects from the Tokyo to the London financial market are rather limited.

For the null-hypothesis \(H_{0}:\alpha _{12}=0 ,\) LRT statistic based on the Tokyo–London data was \(2 \times ( -3660.215+3665.593) \sim 10.756\), and the null-hypothesis was rejected. This means that the London market affects Tokyo market (see Tables 4 and 6).
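The arithmetic behind these four tests can be reproduced with a few lines of Python. The sketch below takes the log-likelihood pairs exactly as quoted in the text and adds the asymptotic p-value for one restriction, using the identity \(P(\chi ^2(1)>x)=\mathrm{erfc}(\sqrt{x/2})\); the variable and function names are ours.

```python
import math

# Log-likelihood pairs exactly as quoted in the text for Model 2:
# (market pair, null hypothesis, larger log-likelihood, smaller log-likelihood).
quoted = [
    ("Tokyo-New York", "alpha_21 = 0", -3562.015, -3562.017),
    ("Tokyo-New York", "alpha_12 = 0", -3562.017, -3572.843),
    ("Tokyo-London", "alpha_21 = 0", -3660.215, -3660.215),
    ("Tokyo-London", "alpha_12 = 0", -3660.215, -3665.593),
]


def lr_stat_and_pvalue(loglik_larger, loglik_smaller):
    """LR statistic and its chi-square(1) upper tail probability."""
    stat = 2.0 * (loglik_larger - loglik_smaller)
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


results = {(m, h): lr_stat_and_pvalue(a, b) for m, h, a, b in quoted}
for (market, null), (stat, p) in results.items():
    print(f"{market:15s} H0: {null:13s} LR = {stat:7.3f}  p = {p:.2g}")
```

The two statistics for \(\alpha _{12}=0\) have p-values far below 0.05, while those for \(\alpha _{21}=0\) have p-values close to one, in line with the conclusions stated above.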

To summarize our findings among three major financial markets, the effects of the Japanese market on the U.S. and London are rather limited, while we found significant effects of both of these markets on the Tokyo market. This finding agrees with several empirical findings obtained using different statistical methods as explained by Kunitomo et al. (2017).

6.3 Example 2: Tokyo–Hong Kong markets

For the second example, we have used daily day-start to day-minimum data from the Nikkei-225 and the Hang Seng Index of Hong Kong from January 2, 1990 to August 26, 2015, which is the same sample period as in Example 1. Since the trading periods in these two financial markets are quite similar, we expected simultaneous movements in the two markets. For the estimated models with two dimensions (\(d=2, p=3\)), we take the impact functions c(x) as Model 1\(\; c(x)=1\) and Model 2\(\; c(x)=x\). Because Model 3, which has the general form of impact functions, involves many additional parameters, the estimated results are often not statistically significant and we omit them.

We first maximize the likelihood \(L_{2T}^{*}\) to estimate the marginal distributions of financial market returns. As in Example 1, Table 9 confirms that the marginal distributions of market returns have thicker tails than the normal distribution. The estimates of \({\xi }_i \;(i=1,2)\) are positive, which means that a Fréchet-type tail distribution is appropriate. Hence, it may be appropriate to use the GPD in our estimation.

Table 9 Tail distributions

The estimated model consists of two dimensions (\(d=2\) and \(p=3\)), and we take the impact functions c(x) as Model 1\(\; c(x)=1\) and Model 2\(\; c(x)=x\). The estimated values of the log-likelihood and AIC are those with the marginal distributions \(L_{1T}^{*}\). The full likelihood can be calculated using \(L_{1T}^{*}\) and \(L_{2T}^{*}\). The SD of the estimated coefficients are evaluated numerically using the inverse of the estimated Fisher information matrix.
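With \(d=2\) markets and \(p=3\) event types, the observed exceedances have to be split into Tokyo-only events, Hong Kong-only events, and co-jumps. A minimal Python sketch of this bookkeeping is given below. It assumes that a co-jump is defined as both markets exceeding the threshold on the same trading day, and the function name `split_events` and the dictionary layout are hypothetical.

```python
def split_events(events_1, events_2):
    """Split per-market events into market-1-only, market-2-only and co-jumps.

    events_1, events_2: dicts mapping day index -> mark X_i for each market.
    Returns three dicts; a co-jump carries max(X_1, X_2) as its mark, matching
    the max_i X_i term used for co-jumps in the intensity functions.
    """
    both = events_1.keys() & events_2.keys()
    n1 = {d: x for d, x in events_1.items() if d not in both}
    n2 = {d: x for d, x in events_2.items() if d not in both}
    n12 = {d: max(events_1[d], events_2[d]) for d in both}
    return n1, n2, n12
```

By construction the three resulting event sets are disjoint in time, so no two of the three component processes jump on the same day.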


Model 1


We estimated the intensity function as

$$\begin{aligned} \lambda _{1}^n(t)= & {} \lambda _{10}^n + \int _0^t \alpha _{11} e^{-\gamma (t -s)} {\mathrm{d}}N_{1}^n(s) +\int _0^t \alpha _{12} e^{-\gamma (t -s)} {\mathrm{d}}N_{2}^n(s)\\&+\int _0^t \alpha _{13} e^{-\gamma (t -s)} {\mathrm{d}}N_{12}^n(s),\\ \lambda _{2}^n(t)= & {} \lambda _{20}^n + \int _0^t \alpha _{21} e^{-\gamma (t -s)} {\mathrm{d}}N_{1}^n(s) + \int _0^t \alpha _{22} e^{-\gamma (t -s)} {\mathrm{d}}N_{2}^n(s)\\&+\int _0^t \alpha _{23} e^{-\gamma (t -s)} {\mathrm{d}}N_{12}^n(s),\\ \lambda _{12}^n(t)= & {} \lambda _{12,0}^n + \int _0^t \alpha _{31} e^{-\gamma (t -s)} {\mathrm{d}}N_{1}^n(s) + \int _0^t \alpha _{32} e^{-\gamma (t -s)} {\mathrm{d}}N_{2}^n(s)\\&+\int _0^t \alpha _{33} e^{-\gamma (t -s)} {\mathrm{d}}N_{12}^n(s). \end{aligned}$$

Since the ML estimates can again be numerically unstable, we set restrictions so that the discounted parameters \(\gamma _{ij}\;(i,j=1,2,3)\) have the same value \(\gamma \). We show the estimation results in Table 10.

Table 10 Tokyo–Hong Kong

Note that in the above table \(N_{1}\) of Model 1 corresponds to Tokyo and \(N_{2}\) corresponds to Hong Kong in Tokyo–Hong Kong markets.


Model 2


We estimated the intensity function as

$$\begin{aligned} \lambda _{1}^n(t)= & {} \lambda _{10}^n + \int _0^t \alpha _{11} e^{-\gamma (t -s)} X_{1} {\mathrm{d}}N_{1}^n(s) + \int _0^t \alpha _{12} e^{-\gamma (t -s)} X_{2}{\mathrm{d}}N_{2}^n(s)\\&+\int _0^t \alpha _{13} e^{-\gamma (t -s)} \left[ \max _i X_{i} \right] {\mathrm{d}}N_{12}^n(s),\\ \lambda _{2}^n(t)= & {} \lambda _{20}^n + \int _0^t \alpha _{21} e^{-\gamma (t -s)} X_{1}{\mathrm{d}}N_{1}^n(s) + \int _0^t \alpha _{22} e^{-\gamma (t -s)} X_{2}{\mathrm{d}}N_{2}^n(s)\\&+\int _0^t \alpha _{23} e^{-\gamma (t -s)} \left[ \max _i X_{i} \right] {\mathrm{d}}N_{12}^n(s),\\ \lambda _{12}^n(t)= & {} \lambda _{12,0}^n + \int _0^t \alpha _{31} e^{-\gamma (t -s)} X_{1}{\mathrm{d}}N_{1}^n(s) + \int _0^t \alpha _{32} e^{-\gamma (t -s)} X_{2}{\mathrm{d}}N_{2}^n(s)\\&+\int _0^t \alpha _{33} e^{-\gamma (t -s)} \left[ \max _i X_{i} \right] {\mathrm{d}}N_{12}^n(s). \end{aligned}$$

We present our estimation results in Table 11. Comparing Tables 10 and 11 yields several interesting findings. The AIC value of Model 2 is better than that of Model 1, as we observed in the Tokyo–New York and Tokyo–London datasets. The estimated coefficients of past effects (\(\alpha _{12}\) and \(\alpha _{21}\)) are often statistically insignificant in the estimated intensity functions, while the contemporaneous effects of the co-jump terms (\(\alpha _{13}\) and \(\alpha _{23}\)) are statistically significant. This aspect basically agrees with our motivation for developing the SHPP models.

Table 11 Tokyo–Hong Kong

6.4 Non-causality tests

In applying the Granger non-causality test procedure, we set the impact function as \(c(x)=x \). We report our empirical results for the hypothesis \(H_0 : \alpha _{ij}=0\) using LRT statistics.

For the null-hypothesis \(H_{0}:\alpha _{13}=0 ,\) the LRT statistic based on the Tokyo–Hong Kong data was 11.14 and we reject the null-hypothesis. (The upper 95% critical value of \(\chi ^2(1)\) is 3.841.) Thus, we found a significant instantaneous causal relationship between the Japanese and Hong Kong financial markets.

For testing the null-hypothesis \(H_{0}:\alpha _{12}=0 ,\) the LRT statistic was 0.0, and the null-hypothesis was not rejected. In addition, for testing the joint null-hypothesis \(H_{0}:\alpha _{12}=0 ,\alpha _{13}=0\), the LRT statistic was 11.14, which exceeds the upper 95\(\%\) critical value 5.991 of \(\chi ^2(2)\); thus the null-hypothesis was rejected.

For the null-hypothesis \(H_{0}:\alpha _{21}=0 ,\) the LRT statistic was 0.006 and we cannot reject the null-hypothesis. (The upper 95\(\%\) critical value of \(\chi ^2(1)\) is 3.841.) For testing the null-hypothesis \(H_{0}:\alpha _{23}=0 ,\) the LRT statistic was 2.42, and the null-hypothesis was not rejected. Similarly, for the joint null-hypothesis \(H_{0}:\alpha _{21}=0 ,\alpha _{23}=0\), the LRT statistic was 2.66, and the null-hypothesis could not be rejected.

To summarize our findings in this subsection for the Tokyo and Hong Kong financial markets, we found that the simultaneous effects between the two markets are significant, while the effects of past events are rather small.

6.5 Further empirical analysis

Here we employ the spectral decomposition and RPC as explained in Sect. 3; see Figs. 5, 6 and 7 for the United States, United Kingdom, and Hong Kong, respectively. In the first two decompositions, we assume there are no co-jumps, while in the last one co-jumps are permitted. We adopted the models with \(c_{ij}(x)=x\) because they minimize AIC. The graphs are depicted at each frequency but are truncated along the x-axis because our point process data do not seem to contain much information at high frequencies.

In particular, Fig. 5 gives the spectral decomposition based on the estimated intensity model (Model 2) from Japan (Nikkei-225) data, which gives the relative contributions from Japan itself and from the US (S&P 500). Figure 6 gives the spectral decomposition based on the estimated intensity model (Model 2) from Japan (Nikkei-225) data, which gives the relative contributions from Japan itself and from the UK (FTSE). Then Fig. 7 gives the spectral decomposition based on the estimated intensity model with co-jumps (Model 2) from Japan (Nikkei-225) data, which gives the relative contributions from Japan itself, Hong Kong (Hang Seng) and the instantaneous relation.

From these figures, we found that for the relationship between the Tokyo and New York financial markets, the self-contribution of past events plays a major role, while there is some contribution from New York to Tokyo at low frequencies, which corresponds to the long-run relationship. On the other hand, for the relationship between the Tokyo and Hong Kong financial markets, the instantaneous contribution and the self-contribution play major roles at all frequencies. This aspect reflects the fact that we used SHPP models.

Fig. 5 Relative power contributions of Nikkei-225 (Tokyo–NY model)

Fig. 6 Relative power contributions of Nikkei-225 (Tokyo–London model)

Fig. 7 Relative power contributions of Nikkei-225 (Tokyo–HK model)

7 Conclusions

In this paper, we developed a new method of econometric analysis of multivariate time series of events and proposed the simultaneous multivariate Hawkes-type point process (SHPP) modeling. Unlike some existing studies, we developed and used new statistical models that explicitly allow for both simultaneous sudden, large events and delayed events. Using the SHPP models, we investigated Granger causality and instantaneous Granger causality in several financial markets and economies, and developed bespoke non-causality tests.

By applying GNC and IGNC tests, we revealed the important relationships among major financial markets and several empirical findings. In the Tokyo–New York financial markets, there is a strong unidirectional causation, while in the Tokyo–Hong Kong financial markets the simultaneous effects are dominant.

Several questions remain to be answered. First, although we used Hawkes-type marked point processes, there can be many possible non-linear point processes, and Kurisu (2018) discussed one way to justify the use of SHPP models. In economic and financial econometrics, it is standard to handle discrete time series observations in yearly, monthly, weekly, daily, hourly, and per minute terms. Thus, we need a coherent way of investigating abrupt or sudden events, and we propose one way to deal with discrete time series events in this paper. In this respect, it would be interesting to investigate the robustness of our empirical results further. Second, the choice of threshold parameter is an important issue that is related to the relevance of the GPD in SEVT. Since we used a simple threshold parameter, we need a convincing justification for the choice of threshold. Finally, when \(d >2\) there can be many parameters to be estimated, and the estimated parameters would often be statistically insignificant. This aspect is important when we have the possibility of co-jumps and use the likelihood and its AIC as a criterion. We need to investigate the problem of choosing statistical point process models further.

These issues are currently under investigation, and we shall report our progress in these respects on another occasion.