1 Introduction

The literature on structural vector autoregressions (Svar) is vast. Popular identification schemes include short- and long-run homogeneous restrictions [see, e.g. Sims (1980), Blanchard and Quah (1989)], sign restrictions [see, e.g. Faust (1998), Uhlig (2005)], time-varying heteroskedasticity (Sentana and Fiorentini 2001) or external instruments [see, e.g. Mertens and Ravn (2012), Stock and Watson (2018) or Dolado et al. (2020)]. Recently, identification through independent non-Gaussian shocks has become increasingly popular after Lanne et al. (2017) and Gouriéroux et al. (2017). The signal processing literature on Independent Component Analysis (Ica) popularised by Comon (1994) shares the same identification scheme. Specifically, if in a static model the \(N\times 1\) observed random vector \( \varvec{y}\) (the so-called signals or sensors) is the result of an affine combination of N unobserved shocks \(\varvec{\varepsilon }^{*}\) (the so-called components or sources), whose mean and variance we can set to \( \varvec{0}\) and \(\varvec{I}_{N}\) without loss of generality, namely

$$\begin{aligned} \varvec{y}=\varvec{\mu +C\varepsilon }^{\varvec{*}}, \end{aligned}$$

then the matrix \(\varvec{C}\) of loadings of the observed variables on the latent ones can be identified (up to column permutations and sign changes) from an i.i.d. sample of observations on \(\varvec{y}\) provided the following assumption holds (Footnote 1):

Assumption 1 (Identification)

  (1) the N shocks in (1) are cross-sectionally independent,

  (2) at least \(N-1\) of them follow a non-Gaussian distribution, and

  (3) \(\varvec{C}\) is invertible.

Failure of any of the three conditions in Assumption 1 results in an underidentified model. The best known counterexample is a multivariate Gaussian model for \(\varvec{\varepsilon } ^{*}\), in which we can identify \(V(\varvec{y)=CC}^{\prime }\) but not \( \varvec{C}\) without additional structural restrictions despite the fact that the elements of \(\varvec{\varepsilon }^{*}\) are cross-sectionally independent. Intuitively, the problem is that any rotation of the structural shocks \(\varvec{\varepsilon }^{**}=\varvec{Q\varepsilon }^{*}\), where \(\varvec{Q}\) is an orthogonal matrix, generates another set of N observationally equivalent, cross-sectionally independent shocks with standard normal marginal distributions. A less well-known counterexample would be a non-Gaussian spherical distribution for \(\varvec{\varepsilon } ^{*}\), such as the standardised multivariate Student t. In this case, the lack of identifiability of \(\varvec{C}\) is due to the fact that \(\varvec{ \varepsilon }^{*}\) and \(\varvec{\varepsilon }^{**}\) share not only their mean vector (\(\varvec{0}\)) and covariance matrix (\(\varvec{I}\)), but also the same nonlinear dependence structure.

The purpose of our paper is to propose simple to implement and interpret specification tests that check the normality of a single element of \(\varvec{ \varepsilon }^{*}\) and the potential cross-sectional dependence among several of them. In very simple terms, our tests compare the integer (product) moments of the shocks in the sample with their population counterparts. Specifically, in the Gaussian tests we compare the marginal third and fourth moments of a single shock to 0 and 3, respectively. In turn, in the case of two or more shocks, we assess the statistical significance of their second, third and fourth cross-moments, which should be equal to the product of the corresponding marginal moments under independence. Many of these moments tests can be formally justified as Lagrange multiplier tests against specific parametric alternatives [see, e.g. Mencía and Sentana (2012)], but in this paper we do not pursue this interpretation. Like Almuzara et al. (2019), though, we focus on the latent shocks rather than the observed variables in view of the fact that identifying Assumption 1 is written in terms of \(\varvec{\varepsilon }^{*}\) rather than \(\mathbf{y}\).

If we knew the true values of \(\varvec{\mu }\) and \(\varvec{C}\), \(\varvec{\mu }_{0}\) and \(\varvec{C}_{0}\) say, with \(rank(\varvec{C}_{0})=N\), our tests would be straightforward, as we could trivially recover the latent shocks from the observed signals without error. In practice, though, both \(\varvec{ \mu }\) and \(\varvec{C}\) are unknown, so we need to estimate them before computing our tests.

Although many estimation procedures for those parameters have been proposed in the literature [see, e.g. Moneta and Pallante (2020) and the references therein], in this paper we consider the discrete mixtures of normals-based pseudo-maximum likelihood estimators (PMLEs) in Fiorentini and Sentana (2020) for three main reasons. First, they are consistent for the model parameters under standard regularity conditions provided that Assumption 1 holds regardless of the true marginal distributions of the shocks. Second, they seem to be rather efficient, the rationale being that finite normal mixtures can provide good approximations to many univariate distributions. And third, the influence functions on which they are based are the scores of the pseudo-log-likelihood, which we can easily compute in closed form. As we shall see, these influence functions play a very important role in adjusting the asymptotic variances of the different tests we propose so that they reflect the sampling variability resulting from computing the shocks with consistent but noisy parameter estimators.

In this respect, we derive computationally simple closed-form expressions for the asymptotic covariance matrices of the sample moments underlying our tests under the relevant null adjusted for parameter uncertainty. Importantly, we do so not only for the static Ica model (1) but also for a Svar, which is far more relevant in economics.

In many empirical finance applications of Svars, the number of observations is sufficiently large for asymptotic approximations to be reliable. In contrast, the limiting distributions of our tests may be a poor guide for the smaller samples typically used in macroeconomic applications. For that reason, we thoroughly study the finite sample size of our tests in several Monte Carlo exercises. We also discuss some bootstrap procedures that seem to improve their reliability. Finally, we show that our tests have non-negligible power against a variety of empirically plausible alternatives in which the cross-sectional independence of the shocks no longer holds.

The rest of the paper is organised as follows. Section 2 discusses the model and the estimation procedure. Then, we present our general moment tests in Sect. 3 and particularise them to assess normality and independence in Sect. 4. Next, Sect. 5 contains the results of our Monte Carlo experiments. We present our conclusions and suggestions for further research in Sect. 6 and relegate some technical material and additional simulations to several appendices.

2 Structural vector autoregressions

2.1 Model specification

Consider the following N-variate Svar process of order p:

$$\begin{aligned} \mathbf{y}_{t}=\varvec{\tau }+\mathop {\textstyle \sum }\nolimits _{j=1}^{p}\varvec{A}_{j}\varvec{ y}_{t-j}+\varvec{C}\varvec{\varepsilon }_{t}^{*},\text {\quad }\varvec{ \varepsilon }_{t}^{*}|I_{t-1}\sim i.i.d.\text { }(\varvec{0},\varvec{I}_{N}), \end{aligned}$$

where \(I_{t-1}\) is the information set, \(\varvec{C}\) the matrix of impact multipliers and \(\varvec{\varepsilon }_{t}^{*}\) the “structural” shocks, which are normalised to have zero means, unit variances and zero covariances.
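To make the mapping between observations and structural shocks concrete, the following is a minimal NumPy sketch, not the authors' code. It simulates a bivariate Svar of order 1 and then recovers the shocks as \(\varvec{C}^{-1}\) times the reduced-form residuals. The function names `simulate_svar` and `structural_shocks`, the parameter values and the choice of standardised Student t shocks are all illustrative assumptions.

```python
import numpy as np

def simulate_svar(tau, A_list, C, shocks, y_init):
    """Simulate y_t = tau + sum_j A_j y_{t-j} + C eps*_t from a (T, N) array of shocks."""
    p = len(A_list)
    T, N = shocks.shape
    y = np.vstack([y_init, np.zeros((T, N))])  # first p rows hold presample values
    for t in range(T):
        lags = sum(A_list[j] @ y[p + t - 1 - j] for j in range(p))
        y[p + t] = tau + lags + C @ shocks[t]
    return y

def structural_shocks(y, tau, A_list, C):
    """Recover eps*_t(theta) = C^{-1} (y_t - tau - sum_j A_j y_{t-j})."""
    p = len(A_list)
    resid = y[p:] - tau
    for j, A in enumerate(A_list, start=1):
        resid = resid - y[p - j:len(y) - j] @ A.T
    return np.linalg.solve(C, resid.T).T

# Hypothetical parameter values, chosen only for illustration.
rng = np.random.default_rng(0)
N, T, nu = 2, 200, 5.0
# Student t draws rescaled to unit variance, independent across i and t.
eps_star = rng.standard_t(nu, size=(T, N)) * np.sqrt((nu - 2.0) / nu)
tau = np.array([0.1, -0.2])
A1 = np.array([[0.5, 0.1], [0.0, 0.3]])
C = np.array([[1.0, 0.5], [-0.4, 2.0]])

y = simulate_svar(tau, [A1], C, eps_star, y_init=np.zeros((1, N)))
recovered = structural_shocks(y, tau, [A1], C)
```

At the true parameter values the recovered shocks coincide with the simulated ones, which is the sense in which the shocks would be known without error if \(\varvec{\theta }_{0}\) were known.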

Let \(\varvec{\varepsilon }_{t}=\varvec{C}\varvec{\varepsilon }_{t}^{*}\) denote the reduced form innovations, so that \(\varvec{\varepsilon } _{t}|I_{t-1}\sim i.i.d.\) \((\varvec{0},\varvec{\Sigma })\) with \(\varvec{ \Sigma }=\varvec{CC}^{\prime }\). As we mentioned in the introduction, a Gaussian (pseudo) log-likelihood is only able to identify \(\varvec{\Sigma }\), which means the structural shocks \(\varvec{\varepsilon }_{t}^{*}\) and their loadings in \(\varvec{C}\) are only identified up to an orthogonal transformation. Specifically, we can use the so-called LQ matrix decomposition (Footnote 2) to relate the matrix \(\varvec{C}\) to the Cholesky decomposition of \(\varvec{\Sigma }=\varvec{\Sigma }_{L}\varvec{\Sigma }_{L}^{\prime }\) as

$$\begin{aligned} \varvec{C}=\varvec{\Sigma }_{L}{\varvec{Q}}\mathrm{,} \end{aligned}$$

where \(\varvec{Q}\) is an \(N\times N\) orthogonal matrix, which we can model as a function of \(N(N-1)/2\) parameters \(\varvec{\omega }\) by assuming that \(| \varvec{Q}|=1\) (Footnote 3). Notice that if \(|\varvec{Q} |\!=\!-1\) instead, we can change the sign of the \(i^{th}\) structural shock and its impact multipliers in the \(i^{th}\) column of the matrix \(\varvec{C}\) without loss of generality as long as we also modify the shape parameters of the distribution of \(\varepsilon _{it}^{*}\) to alter the sign of all its nonzero odd moments.
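The decomposition \(\varvec{C}=\varvec{\Sigma }_{L}\varvec{Q}\) can be checked numerically. The sketch below, with a hypothetical \(2\times 2\) matrix \(\varvec{C}\) and a hypothetical helper `rotation`, recovers \(\varvec{Q}\) from the Cholesky factor and then builds a second loading matrix that implies exactly the same \(\varvec{\Sigma }\), which is the source of the identification problem under a Gaussian criterion.

```python
import numpy as np

# Hypothetical impact matrix, used only for illustration.
C = np.array([[1.0, 0.5], [-0.4, 2.0]])
Sigma = C @ C.T
Sigma_L = np.linalg.cholesky(Sigma)      # lower triangular, Sigma = Sigma_L Sigma_L'
Q = np.linalg.solve(Sigma_L, C)          # Q = Sigma_L^{-1} C

def rotation(omega):
    """2x2 orthogonal matrix with determinant 1, indexed by one angle omega."""
    return np.array([[np.cos(omega), -np.sin(omega)],
                     [np.sin(omega),  np.cos(omega)]])

# A different loading matrix that reproduces the same covariance matrix.
C_alt = Sigma_L @ rotation(0.7)
```

For \(N=2\) a single angle suffices, in line with the \(N(N-1)/2\) count in the text; higher dimensions would need a product of such planar rotations or another parametrisation.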

In this context, Lanne et al. (2017) show that statistical identification of both the structural shocks and \(\varvec{C}\) (up to column permutations and sign changes) is possible under Ica identification Assumption 1, which we maintain in what follows. Popular examples of univariate non-normal distributions are the Student t and the generalised error (or Gaussian) distribution, which includes normal, Laplace and uniform as special cases, as well as symmetric and asymmetric finite normal mixtures.

2.2 Pseudo-maximum likelihood estimators

2.2.1 The criterion function

Let \(\varvec{\theta \!}=\varvec{\!}[\varvec{\tau }^{\prime },vec^{\prime }(\varvec{A}_{1}),\ldots ,vec^{\prime }(\varvec{A} _{p}),vec^{\prime }(\varvec{C})\varvec{]}^{\prime }\varvec{\!}=\varvec{\!}( \varvec{\tau }^{\prime },\varvec{a}_{1}^{\prime },\ldots ,\varvec{a} _{p}^{\prime },\varvec{c}^{\prime })\varvec{\!}=\varvec{\!}(\varvec{\tau } ^{\prime },\varvec{a}^{\prime },\varvec{c}^{\prime })\), denote the structural parameters characterising the first two conditional moments of \(\varvec{y} _{t}\). In addition, we assume \(\varepsilon _{it}^{*}|I_{t-1}\sim i.i.d.\) \(D(0,1,\varvec{\varrho }_{i})\), where \(\varvec{\varrho }_{i}\) is a \( q_{i}\times 1\) vector of variation-free shape parameters, so that in principle different shocks could follow different distributions. For simplicity of notation, though, we maintain that the univariate distributions of the shocks belong to the same family. We can then collect all the shape parameters in the \(q\times 1\) vector \(\varvec{\varrho }=( \varvec{\varrho }_{1}^{\prime },\ldots ,\varvec{\varrho }_{N}^{\prime })^{\prime }\), with \(q=\mathop {\textstyle \sum }\nolimits _{i=1}^{N}q_{i}\), so that \(\varvec{\phi }=(\varvec{\theta }^{\prime },\varvec{\varrho }^{\prime })^{\prime }\) is the \([N+(p+1)N^{2}+q]\times 1\) vector containing all the model parameters.

Given the linear mapping between structural shocks and reduced form innovations, the contribution to the conditional log-likelihood function from observation \(\varvec{y}_{t}\) \((t=1,\ldots ,T)\) for those parameter configurations for which \(\varvec{C}\) has full rank will be given by

$$\begin{aligned} l(\varvec{y}_{t};\varvec{\phi })= & {} -\ln |\varvec{C}|+\ln f[\varvec{\varepsilon }_{t}^{*}(\varvec{\theta });\varvec{\varrho }]=-\ln |\varvec{C}|+\ln f[\varepsilon _{1t}^{*}(\varvec{\theta });\varvec{\varrho }_{1}]+\cdots \nonumber \\&+\ln f[\varepsilon _{Nt}^{*}(\varvec{\theta });\varvec{\varrho } _{N}]=l_{t}(\varvec{\phi }), \end{aligned}$$

where \(f[\varepsilon _{it}^{*}(\varvec{\theta });\varvec{\varrho }_{i}]\) is the univariate density function of the \(i^{th}\) structural shock, \(\varvec{\varepsilon }_{t}^{*}(\varvec{\theta })=\varvec{C}^{-1}\varvec{ \varepsilon }_{t}(\varvec{\theta })\), and \(\varvec{\varepsilon }_{t}(\varvec{ \theta })=\varvec{y}_{t}-\varvec{\tau -A}_{1}\varvec{y}_{t-1}-\cdots - \varvec{A}_{p}\varvec{y}_{t-p}\) are the reduced-form innovations.
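The structure of the log-likelihood contribution, a Jacobian term plus a sum of univariate log-densities, can be sketched as follows. This is an illustration under stated assumptions rather than the estimator used in the paper: the function names are hypothetical, the determinant is taken in absolute value, and standard normal and unit-variance Laplace densities stand in for the finite normal mixtures.

```python
import numpy as np

def loglik_contribution(eps, C, log_densities):
    """l_t = -ln|det C| + sum_i ln f_i(eps*_it), with eps*_t = C^{-1} eps_t."""
    eps_star = np.linalg.solve(C, eps)
    _, logabsdet = np.linalg.slogdet(C)
    return -logabsdet + sum(ln_f(e) for ln_f, e in zip(log_densities, eps_star))

def log_std_normal(e):
    """Log-density of a N(0, 1) shock."""
    return -0.5 * np.log(2.0 * np.pi) - 0.5 * e**2

def log_std_laplace(e):
    """Log-density of a Laplace shock scaled to unit variance (scale b = 1/sqrt(2))."""
    b = 1.0 / np.sqrt(2.0)
    return -np.log(2.0 * b) - np.abs(e) / b

# Reduced-form innovation for one observation (hypothetical numbers).
eps = np.array([0.3, -1.2])
l_gauss = loglik_contribution(eps, np.eye(2), [log_std_normal, log_std_normal])
l_mixed = loglik_contribution(eps, 2.0 * np.eye(2), [log_std_normal, log_std_laplace])
```

Passing one log-density per shock mirrors the fact that different shocks may follow different distributions with variation-free shape parameters.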

2.2.2 The score vector

Let \(\varvec{s}_{t}(\varvec{\phi })\) denote the score function \( \partial l_{t}(\varvec{\phi })\varvec{/\partial \phi }\) and partition it into two blocks, \(\varvec{s}_{\varvec{\theta }t}(\varvec{\phi })\) and \( \varvec{s}_{\varvec{\varrho }t}(\varvec{\phi })\), whose dimensions conform to those of \(\varvec{\theta }\) and \(\varvec{\varrho }\), respectively. Fiorentini and Sentana (2021) show that the scores can be written as

$$\begin{aligned} \varvec{s}_{\varvec{\theta }t}(\varvec{\phi })= & {} [\varvec{Z}_{lt}(\varvec{ \theta }),\varvec{Z}_{st}(\varvec{\theta })]\left[ \begin{array}{c} \varvec{e}_{lt}(\varvec{\phi }) \\ \varvec{e}_{st}(\varvec{\phi }) \end{array} \right] =\varvec{Z}_{dt}(\varvec{\theta })\varvec{e}_{dt}(\varvec{\phi }), \end{aligned}$$
$$\begin{aligned} \varvec{s}_{\varvec{\varrho }t}(\varvec{\phi })= & {} \varvec{e}_{rt}(\varvec{ \phi }), \end{aligned}$$

where


$$\begin{aligned} \varvec{Z}_{lt}(\varvec{\theta })= & {} \left( \begin{array}{c} \varvec{I}_{N} \\ \varvec{y}_{t-1}\otimes \varvec{I}_{N} \\ \vdots \\ \varvec{y}_{t-p}\otimes \varvec{I}_{N} \\ \varvec{0}_{N^{2}\times N} \end{array} \right) \varvec{C}^{-1\prime }, \end{aligned}$$
$$\begin{aligned} \varvec{Z}_{st}(\varvec{\theta })= & {} \left( \begin{array}{c} \varvec{0}_{N\times N^{2}} \\ \varvec{0}_{N^{2}\times N^{2}} \\ \vdots \\ \varvec{0}_{N^{2}\times N^{2}} \\ \varvec{I}_{N^{2}} \end{array} \right) (\varvec{I}_{N}\otimes \varvec{C}^{-1\prime }), \end{aligned}$$
$$\begin{aligned} \varvec{e}_{lt}(\varvec{\phi })= & {} -\frac{\partial \ln f[\varvec{\varepsilon } _{t}^{*}(\varvec{\theta });\varvec{\varrho }]}{\partial \varvec{ \varepsilon }^{*}}=-\left\{ \begin{array}{c} \frac{\partial \ln f[\varepsilon _{1t}^{*}(\varvec{\theta });\varvec{\varrho }_{1}\varvec{]}}{\partial \varepsilon _{1}^{*}} \\ \vdots \\ \frac{\partial \ln f[\varepsilon _{Nt}^{*}(\varvec{\theta });\varvec{\varrho }_{N}\varvec{]}}{\partial \varepsilon _{N}^{*}} \end{array} \right\} , \end{aligned}$$
$$\begin{aligned} \varvec{e}_{st}(\varvec{\phi })= & {} -vec\left\{ \varvec{I}_{N}+\frac{\partial \ln f[\varvec{\varepsilon }_{t}^{*}(\varvec{\theta });\varvec{\varrho }] }{\partial \varvec{\varepsilon }^{*}}\cdot \varvec{\varepsilon } _{t}^{*\prime }(\varvec{\theta })\right\} \nonumber \\= & {} -vec\left\{ \begin{array}{ccc} 1+\frac{\partial \ln f[\varepsilon _{1t}^{*}(\varvec{\theta });\varvec{ \varrho }_{1}\varvec{]}}{\partial \varepsilon _{1}^{*}}\cdot \varepsilon _{1t}^{*}(\varvec{\theta }) &{} \ldots &{} \frac{\partial \ln f[\varepsilon _{1t}^{*}(\varvec{\theta });\varvec{\varrho }_{1}\varvec{]}}{\partial \varepsilon _{1}^{*}}\cdot \varepsilon _{Nt}^{*}(\varvec{\theta }) \\ \vdots &{} \ddots &{} \vdots \\ \frac{\partial \ln f[\varepsilon _{Nt}^{*}(\varvec{\theta });\varvec{ \varrho }_{N}\varvec{]}}{\partial \varepsilon _{N}^{*}}\cdot \varepsilon _{1t}^{*}(\varvec{\theta }) &{} \ldots &{} 1+\frac{\partial \ln f[\varepsilon _{Nt}^{*}(\varvec{\theta });\varvec{\varrho }_{N}\varvec{]} }{\partial \varepsilon _{N}^{*}}\cdot \varepsilon _{Nt}^{*}(\varvec{\theta }) \end{array} \right\} \nonumber \\ \end{aligned}$$

and


$$\begin{aligned} \varvec{e}_{rt}(\varvec{\phi })=\frac{\partial \ln f[\varvec{\varepsilon } _{t}^{*}(\varvec{\theta });\varvec{\varrho }]}{\partial \varvec{\varrho } }=\left\{ \begin{array}{c} \frac{\partial \ln f[\varepsilon _{1t}^{*}(\varvec{\theta });\varvec{ \varrho }_{1}\varvec{]}}{\partial \varvec{\varrho }_{1}} \\ \vdots \\ \frac{\partial \ln f[\varepsilon _{Nt}^{*}(\varvec{\theta });\varvec{ \varrho }_{N}\varvec{]}}{\partial \varvec{\varrho }_{N}} \end{array} \right\} =\left\{ \begin{array}{c} \varvec{e}_{r_{1}t}(\varvec{\phi }) \\ \varvec{e}_{r_{2}t}(\varvec{\phi }) \\ \vdots \\ \varvec{e}_{r_{N}t}(\varvec{\phi }) \end{array} \right\} \end{aligned}$$

by virtue of the cross-sectional independence of the shocks, so that the derivatives involved correspond to the underlying univariate densities.

2.2.3 The asymptotic distribution

For simplicity, we assume henceforth that Svar model (2) generates a covariance stationary process (Footnote 4). Consider the reparametrisation \(\varvec{C}=\varvec{J\Psi }\), where \(\varvec{\Psi }\) is a diagonal matrix whose elements contain the scale of the structural shocks, while the columns of \(\varvec{J}\), whose diagonal elements are normalised to 1, measure the relative impact of each of the structural shocks on all the remaining variables. Proposition 3 in Fiorentini and Sentana (2020) shows that the parameters \(\varvec{a}_{i}=vec(\varvec{A}_{i}{)}\) and \( \varvec{j}=veco(\varvec{J})\) are consistently estimated regardless of the true distribution (Footnote 5). As a result, the pseudo-true values of those parameters will coincide with the true ones, i.e. \(\varvec{a}_{i\infty }= \varvec{a}_{i0}\) and \(\varvec{j}_{\infty }=\varvec{j}_{0}\). In contrast, \( \varvec{\tau }\) and \(\varvec{\psi }=vecd(\varvec{\Psi })\) will generally be inconsistently estimated, so \(\varvec{\tau }_{\infty }\ne \varvec{\tau } _{0} \) and \(\varvec{\psi }_{\infty }\ne \varvec{\psi }_{0}\).

Nevertheless, Fiorentini and Sentana (2020) prove that the unrestricted PMLEs of \(\varvec{\tau }\) and \(\varvec{\psi }\) which simultaneously estimate \(\varvec{\varrho }\) will be consistent too when the univariate distributions used for estimation purposes are discrete mixtures of normals, in which case \(\varvec{\theta }_{\infty }=\varvec{\theta }_{0}\) and \(\varvec{\varepsilon } _{t}^{*}(\varvec{\theta }_{0})=\varvec{\varepsilon }_{t}^{*}\). For that reason, in what follows we focus on the finite normal mixtures-based PMLEs of the original parameters \(\varvec{\theta }=(\varvec{\tau }^{\prime }, \varvec{a}^{\prime },\varvec{c}^{\prime })^{\prime }\).

Still, the potential misspecification of this distributional assumption implies that the asymptotic covariance matrix of the corresponding PMLEs must be based on the usual sandwich formula. Let

$$\begin{aligned} \mathcal {A}(\varvec{\phi }_{\infty };\varvec{\varphi }_{0})=-E[\partial \varvec{s}_{\varvec{\phi }t}(\varvec{\phi }_{\infty })/\partial \varvec{\phi }^{\prime }|\varvec{\varphi }_{0}] \end{aligned}$$

and

$$\begin{aligned} \mathcal {B}(\varvec{\phi }_{\infty };\varvec{\varphi }_{0})=V[\varvec{s}_{ \varvec{\phi }t}(\varvec{\phi }_{\infty })|\varvec{\varphi }_{0}] \end{aligned}$$

denote minus the expected value of the log-likelihood Hessian and the variance of the score, respectively, where \(\varvec{\varrho }_{\infty }\) are the pseudo-true values of the shape parameters of the distributions of the shocks assumed for estimation purposes, \(\varvec{\upsilon }\) contains the potentially infinite-dimensional shape parameters of the true distributions of the shocks, and \(\varvec{\varphi }=(\varvec{\theta }^{\prime },\varvec{\upsilon }^{\prime })^{\prime }\). The asymptotic distribution of the pseudo-ML estimators of \(\varvec{\phi }\), \(\varvec{{\hat{\phi }}}_{T}\), under standard regularity conditions will be given by

$$\begin{aligned} \sqrt{T}(\varvec{{\hat{\phi }}}_{T}-\varvec{\phi }_{\infty })\rightarrow N[ \varvec{0},\mathcal {A}^{-1}(\varvec{\phi }_{\infty };\varvec{\varphi }_{0}) \mathcal {B}(\varvec{\phi }_{\infty };\varvec{\varphi }_{0})\mathcal {A}^{-1}( \varvec{\phi }_{\infty };\varvec{\varphi }_{0})]. \end{aligned}$$

In what follows, we shall make extensive use of the detailed expressions for the conditional expected value of the Hessian and covariance matrix of the score for finite normal mixtures-based PMLEs in Amengual et al. (2021b).

3 Specification tests based on integer product moments

3.1 The influence functions

As we have stressed earlier, the parametric identification of the structural shocks \(\varvec{\varepsilon }_{t}^{*}(\varvec{\theta })\) and their impact coefficients \(\varvec{C}\) that appear in the Svar model (2) critically hinges on the validity of identifying Assumption 1. As a consequence, it would be desirable that empirical researchers estimating those models reported specification tests that would check those assumptions. Given that rank failures in \(\varvec{C}\) will be inextricably linked to singular dynamic systems (Footnote 6), we focus on testing that at most one of the structural shocks is Gaussian and that all the structural shocks are indeed independent of each other.

As is well known, stochastic independence between the elements of a random vector is equivalent to the joint distribution being the product of the marginal ones. In turn, this factorisation implies lack of correlation between not only the levels but also any set of single-variable measurable transformations of those elements. Thus, a rather intuitive way of testing for independence without considering any specific parametric alternative can be based on individual moment conditions of the form

$$\begin{aligned} m_{\varvec{h}}[\varvec{\varepsilon }_{t}^{*}(\varvec{\theta })] =\prod _{i=1}^{N}\varepsilon _{it}^{*h_{i}}(\varvec{\theta }) -\prod _{i=1}^{N}E[\varepsilon _{it}^{*h_{i}}(\varvec{\theta }_{0})], \end{aligned}$$

where \(\varvec{h}=\{h_{1},...,h_{N}\}\), with \(h_{i}\in \mathbb {Z}_{0+}\), denotes the index vector characterising a specific product moment. While the influence function in (14) will generally require the estimation of \(E[\varepsilon _{it}^{*h_{i}}(\varvec{\theta }_{0}{) }]\) for some of the shocks, the constant term \(\mathop {\textstyle \prod }_{i=1}^{N}E[\varepsilon _{it}^{*h_{i}}(\varvec{\theta }_{0}{)}]\) is either 0 or 1 for the second, third and fourth cross-moments we study in this paper in view of the standardised nature of the shocks, so we do not need to worry about it. Amengual et al. (2021b) discuss in detail how to deal with the estimation of the required \(E[\varepsilon _{it}^{*h_{i}}(\varvec{ \theta }_{0}{)}]\) in the general case.

Although we have motivated (14) as the basis for our tests of independence, by setting all the elements of \(\varvec{h}\) but one to 0, we can also use this expression to look at the marginal moments of a single shock. In this paper, we focus on \(h_{i}=3\) and 4 because most common departures from normality of the shocks will be reflected in coefficients of skewness or kurtosis different from 0 and 3, respectively.
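As a rough illustration of how an influence function of form (14) is averaged over a sample, the sketch below computes the sample counterpart for an arbitrary index vector. The function name `product_moment` and its `expected` argument are hypothetical; when `expected` is omitted, the marginal expectations are replaced by sample means, which is only one simple way of handling them and is not necessarily the approach of Amengual et al. (2021b).

```python
import numpy as np

def product_moment(shocks, h, expected=None):
    """Sample mean of prod_i eps_it^{h_i} minus the product of marginal expectations."""
    shocks = np.asarray(shocks, dtype=float)   # shape (T, N)
    h = np.asarray(h)                          # non-negative integer powers, shape (N,)
    if expected is None:
        # Assumption: estimate each E[eps_i^{h_i}] by the corresponding sample mean.
        expected = np.mean(shocks ** h, axis=0)
    return np.mean(np.prod(shocks ** h, axis=1)) - np.prod(expected)

# Tiny hypothetical sample with T = 3 observations on N = 2 shocks.
sample = np.array([[1.0, 2.0], [-1.0, 0.5], [0.0, -1.0]])
co_second = product_moment(sample, (1, 1), expected=(0.0, 0.0))   # covariance-type moment
own_third = product_moment(sample, (3, 0), expected=(0.0, 1.0))   # marginal skewness moment
co_fourth = product_moment(sample, (2, 2), expected=(1.0, 1.0))   # co-kurtosis-type moment
```

Setting every entry of the index vector but one to zero reduces the product to a marginal moment of a single shock, as in the `own_third` line.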

3.2 The moment tests

Let \(\varvec{m}[\varvec{\varepsilon }_{t}^{*}(\varvec{\theta })]\) denote a \(K\times 1\) vector containing a collection of influence functions \( m_{\varvec{h}^{k}}[\varvec{\varepsilon }_{t}^{*}(\varvec{\theta })]\) of form (14) for different index vectors \(\varvec{h} ^{1},\ldots ,\varvec{h}^{k},\ldots ,\varvec{h}^{K}\). The following result, which specialises the general expressions in Newey (1985) and Tauchen (1985) to our context, derives the asymptotic distribution of the scaled sample average of \(\varvec{m}[\varvec{\varepsilon }_{t}^{*}(\varvec{\theta })]\) when we evaluate the structural shocks at the PMLEs \(\varvec{{\hat{\theta }}} _{T}\) rather than at \(\varvec{\theta }_{0}\):

Proposition 1

Under Assumption 1 and standard regularity conditions

$$\begin{aligned} \frac{\sqrt{T}}{T}\sum \nolimits _{t=1}^{T}\varvec{m}[\varvec{\varepsilon } _{t}^{*}(\varvec{{\hat{\theta }}}_{T})]\rightarrow N[0,\mathcal {W}(\varvec{ \phi }_{\infty };\varvec{\varphi }_{0})], \end{aligned}$$

where


$$\begin{aligned} \mathcal {W}(\varvec{\phi }_{\infty };\varvec{\varphi }_{0})= & {} \mathcal {V}( \varvec{\phi }_{\infty };\varvec{\varphi }_{0})\\&+\mathcal {J}(\varvec{\phi } _{\infty };\varvec{\varphi }_{0})\mathcal {A}^{-1}(\varvec{\phi }_{\infty }; \varvec{\varphi }_{0})\mathcal {B}(\varvec{\phi }_{\infty };\varvec{\varphi } _{0})\mathcal {A}^{-1}(\varvec{\phi }_{\infty };\varvec{\varphi }_{0})\mathcal {J}^{\prime }(\varvec{\phi }_{\infty };\varvec{\varphi }_{0}) \\&+\mathcal {F}(\varvec{\phi }_{\infty };\varvec{\varphi }_{0})\mathcal {A} ^{-1}(\varvec{\phi }_{\infty };\varvec{\varphi }_{0})\mathcal {J}^{\prime }( \varvec{\phi }_{\infty };\varvec{\varphi }_{0})\\&+\mathcal {J}(\varvec{\phi } _{\infty };\varvec{\varphi }_{0})\mathcal {A}^{-1}(\varvec{\phi }_{\infty }; \varvec{\varphi }_{0})\mathcal {F}^{\prime }(\varvec{\phi }_{\infty };\varvec{\varphi }_{0}), \\ \mathcal {V}(\varvec{\phi };\varvec{\varphi })= & {} V\left\{ \left. \varvec{m}[ \varvec{\varepsilon }_{t}^{*}(\varvec{\theta })]\right| \varvec{ \varphi }\right\} , \\ \mathcal {J}(\varvec{\phi };\varvec{\varphi })= & {} E\left\{ \left. \frac{\partial \varvec{m}[\varvec{\varepsilon }_{t}^{*}(\varvec{\theta })]}{\partial \varvec{\phi }^{\prime }}\right| \varvec{\varphi }\right\} , \\ \mathcal {F}(\varvec{\phi };\varvec{\varphi })= & {} cov\left\{ \left. \varvec{m}[\varvec{\varepsilon }_{t}^{*}(\varvec{\theta })],\varvec{s}_{\varvec{\phi }t}(\varvec{\phi })\right| \varvec{\varphi }\right\} \end{aligned}$$

and \(\mathcal {A}(\varvec{\phi }_{\infty };\varvec{\varphi }_{0})\) and \( \mathcal {B}(\varvec{\phi }_{\infty };\varvec{\varphi }_{0})\) are defined in (12) and (13), respectively.
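Once the five ingredients in Proposition 1 are available, assembling the adjusted covariance matrix is mechanical. The sketch below is a hedged illustration with hypothetical function names: it assumes that \(\mathcal {A}\) is symmetric, and it forms the usual quadratic form in the average influence functions, which under the null and with a nonsingular \(\mathcal {W}\) would be asymptotically chi-square with K degrees of freedom.

```python
import numpy as np

def adjusted_covariance(V, J, F, A, B):
    """W = V + J A^{-1} B A^{-1} J' + F A^{-1} J' + J A^{-1} F' (A assumed symmetric)."""
    A_inv = np.linalg.inv(A)
    JA = J @ A_inv
    cross = F @ A_inv @ J.T
    return V + JA @ B @ JA.T + cross + cross.T

def moment_test(m_bar, W, T):
    """Quadratic form T * m_bar' W^{-1} m_bar in the K average influence functions."""
    m_bar = np.atleast_1d(np.asarray(m_bar, dtype=float))
    return float(T * m_bar @ np.linalg.solve(W, m_bar))

# Hypothetical inputs: K = 1 influence function, two estimated parameters.
V = np.array([[2.0]])
J = np.array([[1.0, 0.0]])
F = np.array([[0.5, 0.0]])
A = np.eye(2)
B = np.eye(2)
W = adjusted_covariance(V, J, F, A, B)
stat = moment_test(0.1, W, T=100)
```

When the expected Jacobian is zero, every correction term vanishes and the function returns \(\mathcal {V}\) unchanged, so no adjustment for parameter uncertainty is needed in that case.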

In the next subsections, we provide detailed expressions for \(\mathcal {V}( \varvec{\phi };\varvec{\varphi })\), \(\mathcal {J}(\varvec{\phi };\varvec{ \varphi })\) and \(\mathcal {F}(\varvec{\phi };\varvec{\varphi })\) which exploit that the true shocks are cross-sectionally and serially independent under the null hypothesis of correct specification of the static Ica model (1) or the dynamic Svar model (2).

3.2.1 Covariance across influence functions

Consider a generic element of the matrix \(cov\{\varvec{m}[\varvec{ \varepsilon }_{t}^{*}(\varvec{\theta })],\varvec{m}[\varvec{ \varepsilon }_{t}^{*}(\varvec{\theta })]|\varvec{\varphi }\}\), say

$$\begin{aligned} cov\{m_{\varvec{h}}[\varvec{\varepsilon }_{t}^{*}(\varvec{\theta })],m_{ \varvec{h}^{\prime }}[\varvec{\varepsilon }_{t}^{*}(\varvec{\theta })]| \varvec{\varphi }\}= & {} E\{m_{\varvec{h}}[\varvec{\varepsilon }_{t}^{*}( \varvec{\theta })]m_{\varvec{h}^{\prime }}[\varvec{\varepsilon }_{t}^{*}( \varvec{\theta })]|\varvec{\varphi }\}\\&-E\{m_{\varvec{h}}[\varvec{\varepsilon }_{t}^{*}(\varvec{\theta })]|\varvec{\varphi }\}E\{m_{\varvec{h}^{\prime }}[ \varvec{\varepsilon }_{t}^{*}(\varvec{\theta })]|\varvec{\varphi }\}. \end{aligned}$$

If we exploit the cross-sectional independence of the shocks under the null hypothesis, which implies that at the true values

$$\begin{aligned} E\left( \prod \nolimits _{i=1}^{N}\varepsilon _{it}^{*h_{i}}\right) =\prod \nolimits _{i=1}^{N}E(\varepsilon _{it}^{*h_{i}}), \end{aligned}$$

we obtain

$$\begin{aligned} cov\{m_{\varvec{h}}[\varvec{\varepsilon }_{t}^{*}(\varvec{\theta }_{0})],m_{\varvec{h}^{\prime }}[\varvec{\varepsilon }_{t}^{*}( \varvec{\theta }_{0})]|\varvec{\varphi }_{0}\}=\prod \nolimits _{i=1}^{N}E\left[ \varepsilon _{it}^{*(h_{i}+h_{i}^{\prime })} \right] -\prod \nolimits _{i=1}^{N}E(\varepsilon _{it}^{*h_{i}}) E(\varepsilon _{it}^{*h_{i}^{\prime }}). \nonumber \\ \end{aligned}$$
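The covariance expression above only requires marginal moments of each shock, so it is easy to evaluate for any distribution whose moments are known. The following sketch uses hypothetical function names and takes standard normal shocks as the worked case, with \(E(z^{k})=(k-1)!!\) for even k and 0 for odd k.

```python
from math import prod

def normal_moment(k):
    """E[z^k] for z ~ N(0, 1): zero for odd k, (k - 1)!! for even k."""
    if k % 2 == 1:
        return 0.0
    return float(prod(range(1, k, 2)))  # empty product gives 1 when k = 0

def cov_influence(h, h_prime, moment_fns):
    """prod_i E[eps_i^(h_i + h'_i)] - prod_i E[eps_i^h_i] E[eps_i^h'_i]."""
    joint = prod(mu(a + b) for mu, a, b in zip(moment_fns, h, h_prime))
    separate = prod(mu(a) * mu(b) for mu, a, b in zip(moment_fns, h, h_prime))
    return joint - separate

gaussian = [normal_moment, normal_moment]           # N = 2 shocks, both Gaussian
var_covariance = cov_influence((1, 1), (1, 1), gaussian)
var_third = cov_influence((3, 0), (3, 0), gaussian)
var_fourth = cov_influence((4, 0), (4, 0), gaussian)
cov_third_fourth = cov_influence((3, 0), (4, 0), gaussian)
```

Supplying a different moment function per shock would cover the case in which the shocks follow different distributions.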

3.2.2 The expected Jacobian

Straightforward application of the chain rule implies that

$$\begin{aligned} \frac{\partial m_{\varvec{h}}[\varvec{\varepsilon }_{t}^{*}(\varvec{ \theta })]}{\partial \varvec{\phi }^{\prime }}=\frac{\partial m_{\varvec{h}}[\varvec{ \varepsilon }_{t}^{*}(\varvec{\theta })]}{\partial \varvec{\varepsilon } ^{*\prime }}\frac{\partial \varvec{\varepsilon }_{t}^{*}(\varvec{\theta })}{ \partial \varvec{\phi }^{\prime }}. \end{aligned}$$

On this basis, the following proposition characterises the expected Jacobian matrix for any \(\varvec{h}\):

Proposition 2

Suppose that model (2) satisfies Assumption 1. Then, the expected Jacobian matrix of \( m_{\varvec{h}}[\varvec{\varepsilon }_{t}^{*}(\varvec{\theta })]\) evaluated at the true values is given by


As for \(\partial m_{\varvec{h}}[\varvec{\varepsilon }_{t}^{*}(\varvec{ \theta })]/\partial \varvec{\varepsilon }^{*\prime }\), if we denote all the distinct second, third and fourth moments by


where \(\varvec{D}_{N}\), \(\varvec{T}_{N}\) and \(\varvec{Q}_{N}\) are the duplication, triplication and quadruplication matrices, respectively [see Meijer (2005) for details], the results we derive in Appendix B.1 provide an easy way to compute all those derivatives recursively.

3.2.3 The covariance with the score

Let \(\varvec{\ell }_{N}\) denote a vector of N ones and I(.) the usual indicator function. The following proposition provides the last ingredient of the adjusted covariance matrix in Proposition 1.

Proposition 3

Suppose that model (2) satisfies Assumption 1. Then, the covariance between the influence function \(m_{\varvec{h}}(\cdot )\) and the pseudo-log-likelihood scores evaluated at the (pseudo) true values is given by

$$\begin{aligned} cov\{m_{\varvec{h}}[\varvec{\varepsilon }_{t}^{*}(\varvec{\theta }_{0} {)}],\varvec{s}_{\varvec{\phi }t}(\varvec{\phi }_{\infty })|\varvec{ \varphi }_{0}\}=\mathcal {F}_{\varvec{h}}(\varvec{\phi }_{\infty },\varvec{ \varphi }_{0})=E[\mathcal {F}_{\varvec{h}t}(\varvec{\phi }_{\infty },\varvec{ \varphi }_{0})], \end{aligned}$$

where


$$\begin{aligned} \mathcal {F}_{\varvec{h}t}(\varvec{\phi }_{\infty },\varvec{\varphi }_{0})= \left[ \begin{array}{ccc} \mathcal {F}_{\varvec{h}l}(\varvec{\varrho }_{\infty },\varvec{\upsilon }_{0}) &{} \mathcal {F}_{\varvec{h}s}(\varvec{\varrho }_{\infty },\varvec{\upsilon }_{0}) &{} \mathcal {F}_{\varvec{h}r}(\varvec{\varrho }_{\infty },\varvec{\upsilon }_{0}) \end{array} \right] \left[ \begin{array}{cc} \varvec{Z}_{lt}^{\prime }(\varvec{\theta }_{0}) &{} \varvec{0} \\ \varvec{Z}_{st}^{\prime }(\varvec{\theta }_{0}) &{} \varvec{0} \\ \varvec{0} &{} \varvec{I}_{q} \end{array} \right] , \end{aligned}$$

\(\mathcal {F}_{\varvec{h}l}(\varvec{\varrho }_{\infty },\varvec{\varphi } _{0}) \) is a \(1\times N\) vector whose entries are such that for any i with \(h_{i}>0\),

and zero otherwise, \(\mathcal {F}_{\varvec{h}s}(\varvec{\varrho }_{\infty }, \varvec{\varphi }_{0})\) is a \(1\times N^{2}\) vector whose entries are such that for any i with \(h_{i}>0\) and \(i^{\prime }\) with \(h_{i^{\prime }}>0\)

and zero otherwise, and finally

$$\begin{aligned} \mathcal {F}_{\varvec{h}r}(\varvec{\varrho }_{\infty },\varvec{\varphi }_{0})= {\textsc {F}}_{\varvec{h}r}^{\prime }(\varvec{\phi }_{\infty },\varvec{ \varphi }_{0})\varvec{\ell }_{N}, \end{aligned}$$

with \({\textsc {F}}_{\varvec{h}r}(\varvec{\varrho }_{\infty },\varvec{\varphi } _{0})\) another block diagonal matrix of order \(N\times q\) with typical block of size \(1\times q_{i}\),

and zero otherwise.

4 Particular cases

4.1 Testing normality

As we have mentioned before, we can use (14) to test the null hypothesis that a single structural shock is Gaussian by comparing its third and fourth sample moments with 0 and 3, respectively, which are the population values of those moments under the null of normality. Nevertheless, many authors [see, e.g. Bontemps and Meddahi (2005) and the references therein] convincingly argue that it is generally more appropriate to look at the sample averages of the third and fourth Hermite polynomials instead. In particular, one should consider \( H_{3}(\varepsilon _{it}^{*})=\varepsilon _{it}^{*3}-3\varepsilon _{it}^{*}\) and \(H_{4}(\varepsilon _{it}^{*})=\varepsilon _{it}^{*4}-6\varepsilon _{it}^{*2}+3\) rather than \(\varepsilon _{it}^{*3}\) and \(\varepsilon _{it}^{*4}\) only. The reason is that Hermite polynomials have two main advantages. First, given that

$$\begin{aligned} \frac{\partial H_{3}(\varepsilon _{it}^{*})}{\partial \varepsilon _{i}^{*}}=3H_{2}(\varepsilon _{it}^{*})\quad \text {and}\quad \frac{ \partial H_{4}(\varepsilon _{it}^{*})}{\partial \varepsilon _{i}^{*}} =4H_{3}(\varepsilon _{it}^{*}), \end{aligned}$$

the results in Proposition 2 immediately imply that their expected Jacobians will be 0 under the null of normality, so they are immune to the sampling uncertainty resulting from using estimated shocks. Second, \(H_{3}(\varepsilon _{it}^{*})\) and \(H_{4}(\varepsilon _{it}^{*})\) are orthogonal under the Gaussian null, which means that the joint test is simply the sum of two asymptotically independent components: one for skewness and another one for kurtosis.
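Both properties of the Hermite polynomials can be verified numerically with NumPy's probabilists' Hermite module. The sketch below is only a check of the algebra, not part of the testing procedure: it uses Gauss-Hermite quadrature to compute expectations under a standard normal and confirms the derivative identity for \(H_{4}\).

```python
import numpy as np
from numpy.polynomial import hermite_e as He

H3 = [0, 0, 0, 1]       # coefficients of the third polynomial in the HermiteE basis
H4 = [0, 0, 0, 0, 1]    # coefficients of the fourth polynomial

# Nodes and weights for integrals against exp(-x^2 / 2); exact up to degree 19.
x, w = He.hermegauss(10)
w = w / np.sqrt(2.0 * np.pi)            # rescale so that E[g(z)] = sum(w * g(x))

h3 = He.hermeval(x, H3)                 # x^3 - 3x at the nodes
h4 = He.hermeval(x, H4)                 # x^4 - 6x^2 + 3 at the nodes

E_h3_h4 = np.sum(w * h3 * h4)           # orthogonality under the Gaussian null
E_h3_sq = np.sum(w * h3**2)             # variance of H3 under normality
E_h4_sq = np.sum(w * h4**2)             # variance of H4 under normality
dH4 = He.hermeder(H4)                   # derivative of H4 in the same basis
```

The variances 6 and 24 are the familiar scaling constants of the skewness and kurtosis components of the Jarque and Bera statistic, and the zero cross-moment is what makes the joint test the sum of the two components.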

The properties of the estimators that we use, though, mean that the usual implementation of the Jarque and Bera (1980) test, which simply looks at the sample averages of \(\varepsilon _{it}^{*3}(\varvec{{\hat{\theta }}}_{T})\) and \(\varepsilon _{it}^{*4}(\varvec{{\hat{\theta }}}_{T})\), yields numerically the same statistics as the tests based on the Hermite polynomials despite the fact that it ignores the terms involving \( \varepsilon _{it}^{*}\) and \(\varepsilon _{it}^{*2}\). The intuition is as follows. Proposition 1 in Fiorentini and Sentana (2020) states that the PMLEs of the unconditional mean and variance of a univariate finite mixture of normals numerically coincide with the sample mean and variance (with denominator T) of the observed series. Given that log-likelihood function (4) for any given values of \(\varvec{a}\) and \(\varvec{j}\) is effectively the sum of N such univariate log-likelihoods with parameters that are variation-free, the estimated shocks will be such that

$$\begin{aligned} \frac{1}{T}\sum \nolimits _{t=1}^{T}\varepsilon _{it}^{*}(\varvec{\hat{ \theta }}_{T})=0\quad \text {and}\quad \frac{1}{T}\sum \nolimits _{t=1}^{T}\varepsilon _{it}^{*2}(\varvec{{\hat{\theta }}}_{T})-1=0\quad \forall i \end{aligned}$$

regardless of the sample size. This property also has interesting implications for the independence tests that we will consider in the next section because, in effect, each estimated shock will be standardised in the sample.
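The numerical equivalence can be checked directly. In the sketch below we impose the zero sample mean and unit sample variance (with denominator T) by hand, as a stand-in for what the PMLEs deliver automatically. The exponential draws and the helper `standardise` are illustrative assumptions.

```python
import math
import random


def standardise(x):
    """Centre and scale so the sample mean is 0 and the sample variance (denominator T) is 1."""
    T = len(x)
    mean = sum(x) / T
    var = sum((v - mean) ** 2 for v in x) / T
    return [(v - mean) / math.sqrt(var) for v in x]


rng = random.Random(1)
raw = [rng.expovariate(1.0) for _ in range(500)]  # any non-degenerate series would do
eps = standardise(raw)
T = len(eps)

m3 = sum(e**3 for e in eps) / T                      # raw third moment
m4 = sum(e**4 for e in eps) / T                      # raw fourth moment
h3 = sum(e**3 - 3.0 * e for e in eps) / T            # sample mean of H3
h4 = sum(e**4 - 6.0 * e**2 + 3.0 for e in eps) / T   # sample mean of H4

# The linear and quadratic terms average out, so h3 = m3 and h4 = m4 - 3.
gap3 = h3 - m3
gap4 = h4 - (m4 - 3.0)
```

Both gaps are zero up to floating-point error, so a statistic based on \(m_{3}\) and \(m_{4}-3\) is the same as one based on the Hermite averages.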

Finally, it is important to emphasise that the non-normality of a single shock does not guarantee the identification of the model parameters, in the same way as its normality does not imply they are underidentified. As we shall see in the Monte Carlo section, though, researchers can get an informative guide to the validity of Assumption 1 by looking at the normality tests for all the individual shocks.

4.2 Testing independence

At first sight, the arguments in the previous section might suggest that the sample covariances between the estimated shocks will also be 0 by construction. However, this is not generally true. The finite normal mixture PMLEs guarantee the univariate standardisation of each shock, but they do not imply the orthogonality of the shocks in any given sample, unlike what would happen with a multivariate Gaussian likelihood function in which enough a priori restrictions were imposed on \(\varvec{C}\) to render the model exactly identified. Intuitively, the parameter values that maximise (4) are trying to make the estimated shocks stochastically independent, not merely orthogonal [see Herwartz (2018)].

For that reason, the first test for independence that we consider will be based on the second cross-moment condition

$$\begin{aligned} E(\varepsilon _{it}^{*}\varepsilon _{i^{\prime }t}^{*})=0,\text { } i\ne i^{\prime }. \end{aligned}$$

In other words, we are simply assessing if the sample correlation between the \(i^{th}\) and \(i^{\prime th}\) estimated shocks is significantly different from zero in the usual statistical sense.

Nevertheless, we can also go beyond linear dependence and look at moments that characterise the co-skewness across the structural shocks. These can be of two types:

$$\begin{aligned} E(\varepsilon _{it}^{*2}\varepsilon _{i^{\prime }t}^{*})-E(\varepsilon _{it}^{*2})E(\varepsilon _{i^{\prime }t}^{*})=E(\varepsilon _{it}^{*2}\varepsilon _{i^{\prime }t}^{*})=0,\text { }i\ne i^{\prime }, \end{aligned}$$


$$\begin{aligned} E(\varepsilon _{it}^{*}\varepsilon _{i^{\prime }t}^{*}\varepsilon _{i^{\prime \prime }t}^{*})-E(\varepsilon _{it}^{*})E(\varepsilon _{i^{\prime }t}^{*})E(\varepsilon _{i^{\prime \prime }t}^{*})=E(\varepsilon _{it}^{*}\varepsilon _{i^{\prime }t}^{*}\varepsilon _{i^{\prime \prime }t}^{*})=0,\text { }i\ne i^{\prime }\ne i^{\prime \prime }, \end{aligned}$$

depending on whether they involve two or three different shocks.

Finally, we can also look at the different co-kurtosis among the shocks, which may involve a pair of shocks, namely

$$\begin{aligned} E(\varepsilon _{it}^{*2}\varepsilon _{i^{\prime }t}^{*2})-E(\varepsilon _{it}^{*2})E(\varepsilon _{i^{\prime }t}^{*2})=E(\varepsilon _{it}^{*2}\varepsilon _{i^{\prime }t}^{*2})-1=0, \text { }i\ne i^{\prime }, \end{aligned}$$


$$\begin{aligned} E(\varepsilon _{it}^{*3}\varepsilon _{i^{\prime }t}^{*})-E(\varepsilon _{it}^{*3})E(\varepsilon _{i^{\prime }t}^{*})=E(\varepsilon _{it}^{*3}\varepsilon _{i^{\prime }t}^{*})=0,\text { }i\ne i^{\prime }, \end{aligned}$$

three shocks

$$\begin{aligned} E(\varepsilon _{it}^{*2}\varepsilon _{i^{\prime }t}^{*}\varepsilon _{i^{\prime \prime }t}^{*})-E(\varepsilon _{it}^{*2})E(\varepsilon _{i^{\prime }t}^{*})E(\varepsilon _{i^{\prime \prime }t}^{*})=E(\varepsilon _{it}^{*2}\varepsilon _{i^{\prime }t}^{*}\varepsilon _{i^{\prime \prime }t}^{*})=0,\text { }i\ne i^{\prime }\ne i^{\prime \prime }, \end{aligned}$$

and even four shocks

$$\begin{aligned} E(\varepsilon _{it}^{*}\varepsilon _{i^{\prime }t}^{*}\varepsilon _{i^{\prime \prime }t}^{*}\varepsilon _{i^{\prime \prime \prime }t}^{*})-E(\varepsilon _{it}^{*})E(\varepsilon _{i^{\prime }t}^{*})E(\varepsilon _{i^{\prime \prime }t}^{*})E(\varepsilon _{i^{\prime \prime \prime }t}^{*})= & {} E(\varepsilon _{it}^{*}\varepsilon _{i^{\prime }t}^{*}\varepsilon _{i^{\prime \prime }t}^{*}\varepsilon _{i^{\prime \prime \prime }t}^{*})=0,\nonumber \\&i\ne i^{\prime }\ne i^{\prime \prime }\ne i^{\prime \prime \prime }. \end{aligned}$$

Thus, we substantially expand the set of moments researchers can use to test for the independence of the components relative to Hyvärinen (2013), who only suggested looking at the co-kurtosis terms in (22). The above moment conditions also augment those considered by Lanne and Luoto (2021), who focus on (19), (22) and (23), together with \( E(\varepsilon _{it}^{*})=0\) and \(E(\varepsilon _{it}^{*2})=1\).
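For the bivariate case, the sample counterparts of these conditions reduce to one covariance, two co-skewness and three co-kurtosis terms. The sketch below only computes these raw averages; the dictionary keys and the function name `independence_moments` are illustrative. Turning the averages into test statistics requires the covariance matrix and Jacobian adjustments derived in the following subsections, which are not reproduced here.

```python
def independence_moments(e1, e2):
    """Sample cross-moments of two standardised shocks; each is 0 under independence."""
    T = len(e1)

    def sample_mean(power1, power2):
        return sum(a**power1 * b**power2 for a, b in zip(e1, e2)) / T

    return {
        "cov": sample_mean(1, 1),               # second cross-moment
        "coskew_21": sample_mean(2, 1),         # co-skewness, two shocks
        "coskew_12": sample_mean(1, 2),
        "cokurt_22": sample_mean(2, 2) - 1.0,   # co-kurtosis, centred at 1
        "cokurt_31": sample_mean(3, 1),
        "cokurt_13": sample_mean(1, 3),
    }


# A balanced two-by-two design in which every cross-moment is exactly zero.
balanced = independence_moments([1, -1, 1, -1], [1, 1, -1, -1])
# Perfectly dependent shocks: the covariance term equals 1.
dependent = independence_moments([1, -1], [1, -1])
```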

4.2.1 Covariance across influence functions

Next, we derive in detail the nonzero elements of the covariance matrix of the second, third and fourth moments in (16).

It is easy to see that under the null hypothesis of independence, the only nonzero elements of the covariance matrix of \(\varvec{m}^{cv}[\varvec{ \varepsilon }_{t}^{*}(\varvec{\theta })]\) are

$$\begin{aligned} V(\varepsilon _{it}^{*}\varepsilon _{i^{\prime }t}^{*})=1. \end{aligned}$$

In turn, in the case of \(\varvec{m}^{cs}[\varvec{\varepsilon }_{t}^{*}( \varvec{\theta })]\) and \(\varvec{m}^{ck}[\varvec{\varepsilon }_{t}^{*}( \varvec{\theta })]\), the nonzero elements are

$$\begin{aligned} V(\varepsilon _{it}^{*}\varepsilon _{i^{\prime }t}^{*}\varepsilon _{i^{\prime \prime }t}^{*})= & {} 1, \\ V(\varepsilon _{it}^{*2}\varepsilon _{i^{\prime }t}^{*})= & {} E(\varepsilon _{it}^{*4}), \\ cov(\varepsilon _{it}^{*2}\varepsilon _{i^{\prime }t}^{*},\varepsilon _{i^{\prime }t}^{*2}\varepsilon _{it}^{*})= & {} E(\varepsilon _{it}^{*3})E(\varepsilon _{i^{\prime }t}^{*3}), \end{aligned}$$


$$\begin{aligned} V(\varepsilon _{it}^{*}\varepsilon _{i^{\prime }t}^{*}\varepsilon _{i^{\prime \prime }t}^{*}\varepsilon _{i^{\prime \prime \prime }t}^{*})= & {} 1, \\ V(\varepsilon _{it}^{*2}\varepsilon _{i^{\prime }t}^{*}\varepsilon _{i^{\prime \prime }t}^{*})= & {} E(\varepsilon _{it}^{*4}), \\ V(\varepsilon _{it}^{*3}\varepsilon _{i^{\prime }t}^{*})= & {} E(\varepsilon _{it}^{*6}), \\ V(\varepsilon _{it}^{*2}\varepsilon _{i^{\prime }t}^{*2})= & {} E(\varepsilon _{it}^{*4})E(\varepsilon _{i^{\prime }t}^{*4})-1, \\ cov(\varepsilon _{it}^{*2}\varepsilon _{i^{\prime }t}^{*}\varepsilon _{i^{\prime \prime }t}^{*},\varepsilon _{i^{\prime }t}^{*2}\varepsilon _{it}^{*}\varepsilon _{i^{\prime \prime }t}^{*})= & {} E(\varepsilon _{it}^{*3})E(\varepsilon _{i^{\prime }t}^{*3}), \\ cov(\varepsilon _{it}^{*3}\varepsilon _{i^{\prime }t}^{*},\varepsilon _{it}^{*2}\varepsilon _{i^{\prime }t}^{*2})= & {} E(\varepsilon _{it}^{*5})E(\varepsilon _{i^{\prime }t}^{*3}), \\ cov(\varepsilon _{it}^{*2}\varepsilon _{i^{\prime }t}^{*2},\varepsilon _{it}^{*2}\varepsilon _{i^{\prime \prime }t}^{*2})= & {} E(\varepsilon _{it}^{*4})-1, \\ cov(\varepsilon _{it}^{*2}\varepsilon _{i^{\prime \prime }t}^{*},\varepsilon _{i^{\prime }t}^{*2}\varepsilon _{i^{\prime \prime }t}^{*})= & {} 1, \end{aligned}$$

respectively, which can be consistently estimated from \(\varvec{\varepsilon } _{t}^{*}(\varvec{{\hat{\theta }}}_{T})\) under standard regularity conditions.

Finally, the nonzero covariance terms across the different elements of \( \varvec{m}^{cv}(\varvec{\varepsilon }_{t}^{*})\), \(\varvec{m}^{cs}( \varvec{\varepsilon }_{t}^{*})\) and \(\varvec{m}^{ck}(\varvec{\varepsilon }_{t}^{*})\) are

$$\begin{aligned} cov(\varepsilon _{it}^{*}\varepsilon _{i^{\prime }t}^{*},\varepsilon _{it}^{*2}\varepsilon _{i^{\prime }t}^{*})= & {} E(\varepsilon _{it}^{*3}), \\ cov(\varepsilon _{it}^{*}\varepsilon _{i^{\prime }t}^{*},\varepsilon _{it}^{*3}\varepsilon _{i^{\prime }t}^{*})= & {} E(\varepsilon _{it}^{*4}), \\ cov(\varepsilon _{it}^{*}\varepsilon _{i^{\prime }t}^{*},\varepsilon _{it}^{*2}\varepsilon _{i^{\prime }t}^{*2})= & {} E(\varepsilon _{it}^{*3})E(\varepsilon _{i^{\prime }t}^{*3}), \\ cov(\varepsilon _{it}^{*2}\varepsilon _{i^{\prime }t}^{*},\varepsilon _{it}^{*3}\varepsilon _{i^{\prime }t}^{*})= & {} E(\varepsilon _{it}^{*5}), \\ cov(\varepsilon _{it}^{*2}\varepsilon _{i^{\prime }t}^{*},\varepsilon _{i^{\prime }t}^{*3}\varepsilon _{it}^{*})= & {} E(\varepsilon _{it}^{*3})E(\varepsilon _{i^{\prime }t}^{*4}), \text { and} \\ cov(\varepsilon _{it}^{*2}\varepsilon _{i^{\prime }t}^{*},\varepsilon _{it}^{*2}\varepsilon _{i^{\prime }t}^{*2})= & {} E(\varepsilon _{it}^{*4})E(\varepsilon _{i^{\prime }t}^{*3}). \end{aligned}$$
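Expressions of this type can be verified exactly, without simulation error, by enumerating the support of independent discrete shocks. The sketch below does so for \(V(\varepsilon _{it}^{*2}\varepsilon _{i^{\prime }t}^{*2})\) and for \(cov(\varepsilon _{it}^{*2}\varepsilon _{i^{\prime }t}^{*},\varepsilon _{i^{\prime }t}^{*3}\varepsilon _{it}^{*})\). The standardised two-point distributions are an assumption chosen purely for convenience, and all helper names are illustrative.

```python
import itertools
import math


def two_point(p):
    """Standardised two-point distribution as a list of (value, probability) pairs."""
    return [(math.sqrt((1 - p) / p), p), (-math.sqrt(p / (1 - p)), 1 - p)]


def moment(dist, k):
    """k-th raw moment of a discrete distribution."""
    return sum(prob * value**k for value, prob in dist)


def joint_expectation(dists, func):
    """E[func(e_1, ..., e_n)] for independent discrete shocks, by full enumeration."""
    total = 0.0
    for combo in itertools.product(*dists):
        values = [value for value, _ in combo]
        prob = math.prod(p for _, p in combo)
        total += prob * func(*values)
    return total


def cov(dists, f, g):
    """Covariance between f(e) and g(e) under independence of the shocks."""
    cross = joint_expectation(dists, lambda *e: f(*e) * g(*e))
    return cross - joint_expectation(dists, f) * joint_expectation(dists, g)


d1, d2 = two_point(0.2), two_point(0.6)
dists = [d1, d2]

# V(e1^2 e2^2) = E(e1^4) E(e2^4) - 1
lhs_var = cov(dists, lambda a, b: a**2 * b**2, lambda a, b: a**2 * b**2)
rhs_var = moment(d1, 4) * moment(d2, 4) - 1.0

# cov(e1^2 e2, e2^3 e1) = E(e1^3) E(e2^4)
lhs_cov = cov(dists, lambda a, b: a**2 * b, lambda a, b: b**3 * a)
rhs_cov = moment(d1, 3) * moment(d2, 4)
```

The same helpers can be pointed at any of the other entries, including those that involve three or four shocks, by passing a longer list of distributions.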

4.2.2 The expected Jacobian

Straightforward calculations allow us to show that the expected Jacobian of the covariances across shocks in (19) will be given by

where \(\varvec{e}_{i}\) is the \(i^{th}\) canonical vector and \(\varvec{c} ^{i.} \) denotes the \(i^{th}\) row of \(\varvec{C}^{-1}\).

Analogously, for the third cross-moments in (20), we will have

while for those in (21) we get

In turn, for the fourth moments in (22), we will have

while for (23) we get


Similarly, the expected Jacobian of (24) involves

Finally, when we look at (25), we unsurprisingly end up with

4.2.3 The covariance with the score

As we have seen before, we need to explicitly compute the expressions in Proposition 3 to obtain (17). Fortunately, some of those expressions simplify considerably for the cross-moments we use to test independence. Intuitively, the reason is that the independence of the shocks implies that when \(\varvec{h}\) is such that \( h_{i}=1\), we will have

$$\begin{aligned} E\left[ \frac{\partial \ln f(\varepsilon _{it}^{*};\varvec{\varrho } _{i\infty })}{\partial \varepsilon _{i}^{*}}\varepsilon _{i^{\prime }t}^{*h_{i^{\prime }}}\varepsilon _{i^{\prime \prime }t}^{*h_{i^{\prime \prime }}}\right] =0 \end{aligned}$$


$$\begin{aligned} E\left[ \frac{\partial \ln f(\varepsilon _{it}^{*};\varvec{\varrho } _{i\infty })}{\partial \varepsilon _{i}^{*}}\varepsilon _{it}^{*}\varepsilon _{i^{\prime }t}^{*h_{i^{\prime }}}\varepsilon _{i^{\prime \prime }t}^{*h_{i^{\prime \prime }}}\right] =-E(\varepsilon _{i^{\prime }t}^{*h_{i^{\prime }}})E(\varepsilon _{i^{\prime \prime }t}^{*h_{i^{\prime \prime }}}) \end{aligned}$$

for \(i\ne i^{\prime },i^{\prime \prime }\).

As a result, (17) will be zero for the second moments \( E(\varepsilon _{it}^{*}\varepsilon _{i^{\prime }t}^{*})\), except for f\(_{\varvec{h}s(i,i^{\prime })}(\varvec{\varrho }_{\infty }, \varvec{\varphi }_{0})\), which will be 1 when \(i^{\prime }\ne i\).

In addition, if we exploit the independence between i and \(i^{\prime }\) and the fact that \(E(\varepsilon _{i^{\prime }t}^{*2})=1\), we can easily prove that the only nonzero covariance elements for the co-skewness influence functions \(E(\varepsilon _{it}^{*2}\varepsilon _{i^{\prime }t}^{*})\) will be

while all of them are zero for \(E(\varepsilon _{it}^{*}\varepsilon _{i^{\prime }t}^{*}\varepsilon _{i^{\prime \prime }t}^{*})\).

Similarly, we can also prove that for the co-kurtosis influence functions \( E(\varepsilon _{it}^{*2}\varepsilon _{i^{\prime }t}^{*2})\), the only nonzero terms are

In turn, we end up with


for the covariances of the co-kurtosis terms \(E(\varepsilon _{it}^{*3}\varepsilon _{i^{\prime }t}^{*})\) with the scores.

In contrast, the only nonzero covariance of the co-kurtosis influence functions \(E(\varepsilon _{it}^{*}\varepsilon _{i^{\prime }t}^{*}\varepsilon _{i^{\prime \prime }t}^{*2})\) with the scores will be f\(_{\varvec{h}s(i,i^{\prime })}(\varvec{\varrho }_{\infty },\varvec{ \varphi }_{0})=1\) when \(i^{\prime }\ne i\).

Finally, all the covariances of the scores with \(E(\varepsilon _{it}^{*}\varepsilon _{i^{\prime }t}^{*}\varepsilon _{i^{\prime \prime }t}^{*}\varepsilon _{i^{\prime \prime \prime }t}^{*})\) will be 0 too.

4.3 Combining our tests

Interestingly, we can use the expressions previously derived to prove that under the joint null hypothesis of mutually independent shocks and the normality of one of them, the two separate tests that we have discussed in Sects. 4.1 and 4.2 are asymptotically independent, so effectively the joint test would simply be the sum of those two components.

In addition, we can also prove that a test that jointly assessed the independence and normality of all the shocks would be asymptotically equivalent under the null to a multivariate Hermite-based test of multivariate normality [see Amengual et al. (2021a)] applied to the reduced form residuals once one eliminates the moment condition related to the covariance of the shocks, whose asymptotic variance when evaluated at the PMLEs would be zero under the null.

5 Monte Carlo analysis

In this section, we assess the finite sample size and power of the normality and independence tests discussed in Sects. 4.1 and 4.2 by means of several Monte Carlo simulation exercises. In addition, we provide some evidence on the effects that dependence across shocks induces on the estimators of the impact multipliers.

5.1 Design and computational details

For the sake of brevity, we focus on the bivariate case in the main text.Footnote 7 Specifically, we generate samples of size T from the following bivariate static process

$$\begin{aligned} \left( \begin{array}{c} y_{1t} \\ y_{2t} \end{array} \right) =\left( \begin{array}{c} \tau _{1} \\ \tau _{2} \end{array} \right) +\left( \begin{array}{cc} c_{11} &{} c_{12} \\ c_{21} &{} c_{22} \end{array} \right) \left( \begin{array}{c} \varepsilon _{1t}^{*} \\ \varepsilon _{2t}^{*} \end{array} \right) \end{aligned}$$

with \(\tau _{1}=1\), \(\tau _{2}=-1\), \(c_{11}=1\), \(c_{12}=.5\), \(c_{21}=0\) and \( c_{22}=2\). However, our PML estimation procedure does not exploit the restriction that the loading matrix of the shocks is upper triangular. Importantly, given that we can easily prove from (4) that the estimated shocks are numerically invariant to affine transformations of the y’s, and that the same is true of the different test statistics, the results that we report below do not depend on our choice of \(\varvec{\tau }\) and \( \varvec{C}\).

We consider both \(T=250\), which is realistic in most macroeconomic applications with monthly or quarterly data, and \(T=1000\), which is representative of financial applications with daily data. The precise data generating processes (DGPs) that we consider for the shocks are described in Sect. 5.1.2.

5.1.1 Estimation details

To estimate the parameters of the model above, we assume that \( \varepsilon _{1t}^{*}\) and \(\varepsilon _{2t}^{*}\) follow two serially and cross-sectionally independent standardised discrete mixtures of two normals, or \(\varepsilon _{it}^{*}\sim DMN(\delta _{i},\varkappa _{i},\lambda _{i})\) for short, so that

$$\begin{aligned} \varepsilon _{it}^{*}=\left\{ \begin{array}{l} N[\mu _{1}^{*}(\varvec{\varrho }_{i}),\sigma _{1}^{*2}(\varvec{ \varrho }_{i})]\text { with probability }\lambda _{i} \\ N[\mu _{2}^{*}(\varvec{\varrho }_{i}),\sigma _{2}^{*2}(\varvec{ \varrho }_{i})]\text { with probability }1-\lambda _{i} \end{array} \right. \end{aligned}$$


$$\begin{aligned} \mu _{1}^{*}(\varvec{\varrho }_{i})= & {} \delta _{i}(1-\lambda _{i}), \\ \mu _{2}^{*}(\varvec{\varrho }_{i})= & {} -\delta _{i}\lambda _{i}, \\ \sigma _{1}^{*2}(\varvec{\varrho }_{i})= & {} \frac{1-\lambda _{i}(1-\lambda _{i})\delta _{i}^{2}}{\lambda _{i}+(1-\lambda _{i})\varkappa _{i}}, \\ \sigma _{2}^{*2}(\varvec{\varrho }_{i})= & {} \varkappa _{i}\sigma _{1}^{*2}(\varvec{\varrho }_{i}), \end{aligned}$$

and \(\varvec{\varrho }_{i}=(\delta _{i},\varkappa _{i},\lambda _{i})^{\prime }\). Hence, we can interpret \(\varkappa _{i}\) as the ratio of the two variances and \(\delta _{i}\) as the parameter that regulates the distance between the means of the two underlying components.Footnote 8
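To make the parametrisation concrete, the following sketch maps \((\delta _{i},\varkappa _{i},\lambda _{i})\) into the component means and variances given above and computes the first four moments of the mixture from the usual Gaussian component moments. It is evaluated at the values \(DMN(-.859,.386,1/5)\) used in the simulation designs. The function names are illustrative.

```python
def dmn_components(delta, kappa, lam):
    """Component (mean, variance, probability) triples of a standardised DMN."""
    mu1 = delta * (1.0 - lam)
    mu2 = -delta * lam
    var1 = (1.0 - lam * (1.0 - lam) * delta**2) / (lam + (1.0 - lam) * kappa)
    var2 = kappa * var1
    return (mu1, var1, lam), (mu2, var2, 1.0 - lam)


def dmn_raw_moment(delta, kappa, lam, order):
    """E[eps^order] for order 1 to 4, using the raw moments of each Gaussian component."""
    total = 0.0
    for mu, var, prob in dmn_components(delta, kappa, lam):
        component_moment = {
            1: mu,
            2: mu**2 + var,
            3: mu**3 + 3.0 * mu * var,
            4: mu**4 + 6.0 * mu**2 * var + 3.0 * var**2,
        }[order]
        total += prob * component_moment
    return total


params = (-0.859, 0.386, 0.2)
mean = dmn_raw_moment(*params, 1)
variance = dmn_raw_moment(*params, 2)
# With zero mean and unit variance, raw third and fourth moments are the
# skewness and kurtosis coefficients.
skewness = dmn_raw_moment(*params, 3)
kurtosis = dmn_raw_moment(*params, 4)
```

At these parameter values the mixture has zero mean and unit variance by construction, with skewness close to \(-.5\) and kurtosis close to 4 once the rounding of the reported parameters is taken into account.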

As a consequence, the contribution of observation t to pseudo-log-likelihood function (4) will be

$$\begin{aligned} l[\varepsilon _{it}^{*}(\varvec{\theta });\varvec{\varrho }_{i}]= & {} \ln \{\lambda _{i}\cdot \phi [\varepsilon _{it}^{*}(\varvec{\theta } );\mu _{1}^{*}(\varvec{\varrho }_{i}),\sigma _{1}^{*2}(\varvec{ \varrho }_{i})]\\&+(1-\lambda _{i})\cdot \phi [\varepsilon _{it}^{*}( \varvec{\theta }); \mu _{2}^{*}(\varvec{\varrho }_{i}),\sigma _{2}^{*2}(\varvec{\varrho }_{i})]\}, \end{aligned}$$

where \(\phi (\varepsilon ;\mu ,\sigma ^{2})\) denotes the probability density function of a Gaussian random variable with mean \(\mu \) and variance \(\sigma ^{2}\) evaluated at \(\varepsilon \). Importantly, we maximise the log-likelihood with respect to the two elements of \(\varvec{\tau }\), the four elements of \(\varvec{C}\) and the six shape parameters subject to the nonlinear constraint \(\delta _{i}^{2}<\lambda _{i}^{-1}(1-\lambda _{i})^{-1}\) , which we impose to guarantee the strict positivity of \(\sigma _{1}^{*2}(\varvec{\varrho }_{i})\). Without loss of generality, we also restrict \( \varkappa _{i}\in (0,1]\) as a way of labelling the components, which in turn ensures the strict positivity of \(\sigma _{2}^{*2}(\varvec{\varrho } _{i}) \). Finally, we impose \(\lambda _{i}\in (0,1)\) to avoid degenerate mixtures.Footnote 9
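A minimal sketch of the contribution of a single shock to the pseudo-log-likelihood, with the three constraints on the shape parameters checked explicitly, might look as follows. It covers only the term \(l[\varepsilon _{it}^{*};\varvec{\varrho }_{i}]\) displayed above, not the full objective function (4), and the function names are illustrative.

```python
import math


def normal_pdf(x, mu, var):
    """Gaussian density with mean mu and variance var evaluated at x."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def dmn_logpdf(eps, delta, kappa, lam):
    """ln{lam * phi(eps; mu1, var1) + (1 - lam) * phi(eps; mu2, var2)}."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie in (0, 1)")
    if not 0.0 < kappa <= 1.0:
        raise ValueError("kappa must lie in (0, 1]")
    if not delta**2 < 1.0 / (lam * (1.0 - lam)):
        raise ValueError("delta^2 must be below 1 / (lam * (1 - lam))")
    mu1 = delta * (1.0 - lam)
    mu2 = -delta * lam
    var1 = (1.0 - lam * (1.0 - lam) * delta**2) / (lam + (1.0 - lam) * kappa)
    var2 = kappa * var1
    density = lam * normal_pdf(eps, mu1, var1) + (1.0 - lam) * normal_pdf(eps, mu2, var2)
    return math.log(density)


# With delta = 0 and kappa = 1 both components are N(0, 1), so the mixture
# collapses to the standard normal log-density.
gaussian_case = dmn_logpdf(0.3, 0.0, 1.0, 0.5)
skewed_case = dmn_logpdf(0.3, -0.859, 0.386, 0.2)
```

In practice one would typically reparametrise the shape parameters so that the constraints hold automatically, rather than raise errors inside the optimiser; the explicit checks are only meant to mirror the restrictions stated in the text.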

We maximise the log-likelihood subject to these three constraints on the shape parameters using a derivative-based quasi-Newton algorithm, which converges superlinearly in the neighbourhood of the optimum. To exploit this property, we start the iterations by obtaining consistent initial estimators of \(\varvec{\tau }\) and \(\varvec{C}\), \(\overline{\varvec{\tau }}_{FICA}\) and \(\overline{\varvec{C}}_{FICA}\) say, using the FastICA algorithm of Gävert, Hurri, Särelä, and Hyvärinen.Footnote 10 In addition, we obtain initial values of the shape parameters of each shock by performing 20 iterationsFootnote 11 of the expectation maximisation (EM) algorithm in Dempster et al. (1977) on each of the elements of \(\overline{\varvec{\varepsilon }}_{t,FICA}^{*}=\overline{\varvec{C}}_{FICA}^{-1}\left( \varvec{y}_{t}-\varvec{\bar{\tau }} _{FICA}\right) \).
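The EM step for the shape parameters can be sketched as follows for a single series. This is an illustration under several assumptions that are ours rather than the paper's: the input series has been standardised in the sample, the mixture is fitted in its unrestricted (mean, variance, weight) form, the starting values come from a split at the median, and the result is mapped into \((\delta ,\varkappa ,\lambda )\) by labelling the component with the larger variance as the first one. The FastICA step is not reproduced, and the simulated data are purely illustrative.

```python
import math
import random


def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_two_normals(x, n_iter=20):
    """Run n_iter EM steps for an unrestricted two-component normal mixture."""
    T = len(x)
    xs = sorted(x)
    half = T // 2
    mean_all = sum(x) / T
    var_all = sum((v - mean_all) ** 2 for v in x) / T
    w = [0.5, 0.5]                                   # crude starting values
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (T - half)]
    var = [var_all, var_all]
    loglik_path = []
    for _ in range(n_iter):
        # E-step: posterior probability that each observation comes from component 0.
        resp, loglik = [], 0.0
        for v in x:
            d0 = w[0] * normal_pdf(v, mu[0], var[0])
            d1 = w[1] * normal_pdf(v, mu[1], var[1])
            loglik += math.log(d0 + d1)
            resp.append(d0 / (d0 + d1))
        loglik_path.append(loglik)
        # M-step: weighted proportions, means and variances.
        n0 = sum(resp)
        n1 = T - n0
        w = [n0 / T, n1 / T]
        mu = [sum(r * v for r, v in zip(resp, x)) / n0,
              sum((1.0 - r) * v for r, v in zip(resp, x)) / n1]
        var = [sum(r * (v - mu[0]) ** 2 for r, v in zip(resp, x)) / n0,
               sum((1.0 - r) * (v - mu[1]) ** 2 for r, v in zip(resp, x)) / n1]
    return w, mu, var, loglik_path


def to_shape_parameters(w, mu, var):
    """Map to (delta, kappa, lam); component 1 is the one with the larger variance."""
    first = 0 if var[0] >= var[1] else 1
    second = 1 - first
    return mu[first] - mu[second], var[second] / var[first], w[first]


rng = random.Random(2)
raw = [rng.gauss(-0.7, math.sqrt(1.7)) if rng.random() < 0.2
       else rng.gauss(0.17, math.sqrt(0.67)) for _ in range(500)]
m = sum(raw) / len(raw)
s = math.sqrt(sum((v - m) ** 2 for v in raw) / len(raw))
eps = [(v - m) / s for v in raw]                     # standardised in the sample

weights, means, variances, loglik_path = em_two_normals(eps)
delta0, kappa0, lam0 = to_shape_parameters(weights, means, variances)
```

Because each M-step reproduces the sample mean and variance of the series, the fitted mixture is itself standardised, which is what makes the mapping into \((\delta ,\varkappa ,\lambda )\) exact in this sketch.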

As we mentioned in Sect. 2.2, Assumption 1 only guarantees the identification of \(\varvec{C}\) up to sign changes and column permutations. Although in empirical applications a researcher would carefully choose the appropriate ordering and interpretation of the structural shocks, this leeway may have severe consequences when analysing Monte Carlo results. For that reason, we systematically choose a unique global maximum from the different observationally equivalent permutations and sign changes of the columns of the matrix \(\varvec{C}\) using the selection procedure suggested by Ilmonen and Paindaveine (2011) and adopted by Lanne et al. (2017). In addition, we impose that \(diag(\varvec{C})\) is positive by simply changing the sign of all the elements of the relevant columns. Naturally, we apply the same changes to the shape parameter estimates and the sign of \(\delta _{i}\).
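The sign normalisation in the last step can be sketched as below. The column selection rule of Ilmonen and Paindaveine (2011) is deliberately not reproduced; the sketch assumes the columns have already been ordered and only flips signs. The function name `normalise_signs` and the numerical inputs are illustrative.

```python
def normalise_signs(C, deltas):
    """Flip column j of C, and the sign of delta_j, whenever C[j][j] is negative."""
    n = len(C)
    signs = [-1.0 if C[j][j] < 0 else 1.0 for j in range(n)]
    C_new = [[C[i][j] * signs[j] for j in range(n)] for i in range(n)]
    deltas_new = [d * s for d, s in zip(deltas, signs)]
    return C_new, deltas_new, signs


# The first column has a negative diagonal element, so it is flipped together
# with the sign of the first delta; the second column is left untouched.
C_hat = [[-1.0, 0.5], [0.0, 2.0]]
delta_hat = [-0.859, 0.3]
C_norm, delta_norm, signs = normalise_signs(C_hat, delta_hat)
```

Flipping column j of \(\varvec{C}\) is observationally equivalent to changing the sign of the j-th shock, which is why the skewness-related parameter \(\delta _{j}\) changes sign while \(\varkappa _{j}\) and \(\lambda _{j}\) do not.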

5.1.2 DGPs under the null and the alternative

The four bivariate DGPs for the standardised shocks that we consider under the null of independence are:

dgp 1::

A normal distribution and a discrete mixture of two normals with kurtosis coefficient 4 and skewness coefficient equal to \( -.5 \), i.e. \(\varepsilon _{1t}^{*}\sim N(0,1)\) and \(\varepsilon _{2t}^{*}\sim DMN(-.859,.386,1/5)\).

dgp 1d::

The Var(1) model

$$\begin{aligned} \left( \begin{array}{c} y_{1t} \\ y_{2t} \end{array} \right) =\left( \begin{array}{c} \tau _{1} \\ \tau _{2} \end{array} \right) +\left( \begin{array}{cc} 1/2 &{} 1/4 \\ 0 &{} 1/3 \end{array} \right) \left( \begin{array}{c} y_{1t-1} \\ y_{2t-1} \end{array} \right) +\left( \begin{array}{cc} c_{11} &{} c_{12} \\ c_{21} &{} c_{22} \end{array} \right) \left( \begin{array}{c} \varepsilon _{1t}^{*} \\ \varepsilon _{2t}^{*} \end{array} \right) \end{aligned}$$

with exactly the same shocks and values of \(\varvec{\tau }\) and \(\varvec{C}\) as in dgp 1.Footnote 12

dgp 2::

Independent discrete mixtures of two normals with kurtosis coefficient 4 and skewness coefficients equal to \(-.5\) and .5, respectively. In other words, \(\varepsilon _{1t}^{*}\sim DMN(-.859,.386,1/5)\) and \(\varepsilon _{2t}^{*}\sim DMN(.859,.386,1/5)\).

dgp 3::

A Student t with 10 degrees of freedom (and kurtosis coefficient equal to 4), and an asymmetric t with kurtosis and skewness coefficients equal to 4 and \(-.5\), respectively, so that \(\beta =-1.354\) and \(\nu =18.718\) in the notation of Mencía and Sentana (2012).

Fig. 1 Univariate densities of the independent shocks. Notes: dashed lines represent the standard normal distribution. a Plots a standardised discrete mixture of two normals with skewness and kurtosis coefficients of \(-.5\) and 4, respectively (with parameters \( \delta =-.859\), \(\varkappa =.386\) and \(\lambda =1/5\)); b Plots a standardised symmetric Student t with the same kurtosis (i.e. 10 degrees of freedom), while c plots a standardised asymmetric t with skewness and kurtosis as the one in (a) [i.e. with \(\beta =-1.354\) and \(\nu =18.718\), see Mencía and Sentana (2012) for details]

Fig. 2 Densities and contours of the bivariate distributions under the alternative hypotheses. Notes: a, b plot a bivariate Student t with 6 degrees of freedom; c, d a standardised bivariate asymmetric t with \( \varvec{\beta }=-5\varvec{\ell }_{N}\) and \(\nu =16\) [see Mencía and Sentana (2012) for details], while e, f plot a standardised mixture of two bivariate normals with joint mixing Bernoulli with \(\lambda =1/5\) and scale parameters \(\varkappa _{1}=.1\) and \(\varkappa _{2}=.2\) [see Sect. 5.1.2 and Lanne and Lütkepohl (2010) for details]

The left panels of Fig. 1a–c display the density functions of these distributions over a range of \(\pm 4\) standard deviations with the standard normal as a benchmark, while the right panels zoom in on the left tail.

In turn, under the alternative of cross-sectionally dependent shocks we simulate from the following three standardised joint distributions:

dgp 4::

Bivariate Student t with 6 degrees of freedom.

dgp 5::

Bivariate asymmetric t with skewness vector \( \varvec{\beta }=-5\varvec{\ell }_{2}\) and degrees of freedom parameter \( \nu =16\) [see Mencía and Sentana (2012) for details].

dgp 6::

Bivariate mixture of two zero-mean normal vectors with covariance matrices

$$\begin{aligned} \varvec{\Omega }_{1}=\left( \begin{array}{cc} 1/[\lambda +\varkappa _{1}(1-\lambda )] &{} 0 \\ 0 &{} 1/[\lambda +\varkappa _{2}(1-\lambda )] \end{array} \right) , \\ \varvec{\Omega }_{2}=\left( \begin{array}{cc} \varkappa _{1}/[\lambda +\varkappa _{1}(1-\lambda )] &{} 0 \\ 0 &{} \varkappa _{2}/[\lambda +\varkappa _{2}(1-\lambda )] \end{array} \right) , \end{aligned}$$

which we denote by \(DMN_{LL}(\varkappa _{1},\varkappa _{2},\lambda )\) [see Lanne and Lütkepohl (2010) for details]. Specifically, we set \(\varkappa _{1}=0.1\), \(\varkappa _{2}=0.2\) and \(\lambda =1/5\).
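The sketch below builds the two covariance matrices for these parameter values and checks two population features of the design: the shocks have unit unconditional variances, yet \(E(\varepsilon _{1t}^{*2}\varepsilon _{2t}^{*2})-1\) is positive because both shocks share the same regime indicator. It assumes that the regime with covariance matrix \(\varvec{\Omega }_{1}\) occurs with probability \(\lambda \), which is the assignment implied by the standardisation. Function names are illustrative.

```python
import math
import random


def dmn_ll_covariances(kappa1, kappa2, lam):
    """Diagonal entries of Omega_1 and Omega_2 (both matrices are diagonal)."""
    d1 = lam + kappa1 * (1.0 - lam)
    d2 = lam + kappa2 * (1.0 - lam)
    return (1.0 / d1, 1.0 / d2), (kappa1 / d1, kappa2 / d2)


def simulate_dmn_ll(T, kappa1, kappa2, lam, rng):
    """Draw T pairs of shocks that share a common Bernoulli regime indicator."""
    omega1, omega2 = dmn_ll_covariances(kappa1, kappa2, lam)
    draws = []
    for _ in range(T):
        omega = omega1 if rng.random() < lam else omega2   # one regime draw per period
        draws.append((rng.gauss(0.0, math.sqrt(omega[0])),
                      rng.gauss(0.0, math.sqrt(omega[1]))))
    return draws


kappa1, kappa2, lam = 0.1, 0.2, 0.2
omega1, omega2 = dmn_ll_covariances(kappa1, kappa2, lam)

# Unconditional variances: lam * Omega_1 + (1 - lam) * Omega_2 should be the identity.
uncond_var = [lam * omega1[k] + (1.0 - lam) * omega2[k] for k in range(2)]

# Population co-kurtosis E(e1^2 e2^2) - 1, which would be 0 under independence.
cokurtosis = lam * omega1[0] * omega1[1] + (1.0 - lam) * omega2[0] * omega2[1] - 1.0

sample = simulate_dmn_ll(250, kappa1, kappa2, lam, random.Random(3))
```

Since both matrices are diagonal, the shocks are uncorrelated and symmetric, so in this design the dependence only shows up in the even-order cross-moments such as the co-kurtosis term above.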

The left panels of Fig. 2 display the joint densities for these distributions, while their contours are presented in the right panels.

To gauge the finite sample size and power of our proposed independence tests, we generate 20,000 samples for each of the designs under the null and 5000 for those under the alternative. Additionally, we evaluate the small sample size and power of the normality tests presented in Sect. 4.1 using the results from the simulation designs dgp 1 and 1d (null), and dgp 2 and dgp 3 (alternative).

5.1.3 Bootstrap procedures

The theoretical results in Beran (1988) imply that if the usual Gaussian asymptotic approximation provides a reliable guide to the finite sample distribution of the sample version of the moments being tested, the bootstrapped critical values should not only be valid, but also their errors should be of a lower order of magnitude under additional regularity conditions that guarantee the validity of a higher-order Edgeworth expansion.Footnote 13 For that reason, we also analyse the performance of applying the bootstrap to the testing procedures we have described in Sects. 4.1 and 4.2.

In the case of our tests for independence, for each Monte Carlo sample, we can easily generate another \(N_{boot}\) bootstrap samples of size T that impose the null with probability approaching 1 as T increases as follows.Footnote 14 First, we generate NT draws \(R_{is}\) from a discrete uniform distribution between 1 and T, which we then use to construct

$$\begin{aligned} \varvec{\tilde{y}}_{s}={\varvec{\hat{\tau }}}_{T}+\varvec{\hat{C}}_{T}\varvec{ \tilde{\varepsilon }}_{s}^{*}, \end{aligned}$$

where \(\tilde{\varepsilon }_{is}^{*}={\hat{\varepsilon }}_{iR_{is}}^{*}\) and \(\varvec{{\hat{\varepsilon }}}_{t}^{*}=\varvec{\varepsilon }_{t}^{*}(\varvec{{\hat{\theta }}}_{T})=\varvec{\hat{C}}_{T}^{-1}\left( \varvec{y}_{t}- \varvec{{\hat{\tau }}}_{T}\right) \) are the estimated residuals in any given sample.

As for the normality tests, whose null hypothesis is that a single shock \( \varepsilon _{it}^{*}\) is Gaussian, we adopt a partially parametric resampling scheme in which the draws of the \(i^{th}\) shock \(\tilde{ \varepsilon }_{is}^{*}\) are independently simulated from a N(0, 1) distribution, while the draws for the remaining shocks \(\tilde{\varepsilon } _{ks}^{*}\) \((k\ne i)\) are obtained nonparametrically as in the previous paragraph.
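Both resampling schemes can be sketched with a single function. The estimated shocks are resampled with a separate index draw \(R_{is}\) for each shock, which breaks any cross-sectional dependence. When an optional index is supplied, the corresponding shock is drawn from a N(0, 1) instead, as in the partially parametric scheme. The names `bootstrap_shocks`, `rebuild_observations` and `gaussian_index` are illustrative, and re-estimation of the model on each bootstrap sample is not shown.

```python
import random


def bootstrap_shocks(eps_hat, rng, gaussian_index=None):
    """eps_hat is a list of N series of length T; returns resampled shocks, same shape."""
    T = len(eps_hat[0])
    resampled = []
    for i, series in enumerate(eps_hat):
        if i == gaussian_index:
            # Parametric draws for the shock whose normality is being tested.
            resampled.append([rng.gauss(0.0, 1.0) for _ in range(T)])
        else:
            # Independent index draws R_is for each shock i and observation s.
            resampled.append([series[rng.randrange(T)] for _ in range(T)])
    return resampled


def rebuild_observations(tau_hat, C_hat, eps_tilde):
    """y_s = tau_hat + C_hat * eps_s for each bootstrap observation s."""
    N, T = len(tau_hat), len(eps_tilde[0])
    return [[tau_hat[i] + sum(C_hat[i][k] * eps_tilde[k][s] for k in range(N))
             for i in range(N)]
            for s in range(T)]


rng = random.Random(4)
eps_hat = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]      # toy residuals, N = 2 and T = 3
eps_indep = bootstrap_shocks(eps_hat, rng)                      # independence tests
eps_normal = bootstrap_shocks(eps_hat, rng, gaussian_index=0)   # normality of shock 1
y_boot = rebuild_observations([1.0, -1.0], [[1.0, 0.5], [0.0, 2.0]], eps_indep)
```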

Although these bootstrap procedures are simple and fast for any given sample, they quickly become prohibitively expensive in a Monte Carlo exercise as T increases. For this reason, for the designs with \(T=1000\) we rely on the warp-speed method of Giacomini et al. (2013).

5.2 Simulation results

5.2.1 Testing normality

Table 1 reports Monte Carlo rejection rates of the normality tests proposed in Sect. 4.1 for dgp 1, 1d, 2 and 3. As can be seen, the null of normality is correctly rejected a large number of times when it does not hold, even in samples of length 250. The only possible exception is the skewness component of the Jarque-Bera test when applied to the symmetric Student t shock in dgp 3. Given that the population third moment is zero in this case, the only source of power is the fact that the sample variability of \(H_{3}\) is larger for this shock than its theoretical value under Gaussianity.

On the other hand, the first three rows of the panels dgp 1 and 1d, which are the ones with a Gaussian shock, show that the normality tests tend to be oversized at the usual nominal levels, especially for samples of length 250.Footnote 15 For that reason, we generate \(N_{boot}=399\) bootstrap samples at each Monte Carlo replication, as described in Sect. 5.1.3. Table 2 shows that the standard bootstrap version of our tests is pretty accurate for both the third and fourth moment tests. Unlike what we observed in Table 1, though, the size-adjusted power is slightly lower for dgp 1d than for dgp 1.

However, as mentioned at the end of Sect. 4.1, researchers may only get a reliable guide to the validity of Assumption 1 by looking at the normality tests for all the individual shocks, the objective being to get at least \(N-1\) rejections. To shed some light on this issue, in Table 3 we report contingency tables which fully characterise the extent to which simultaneous rejections of the individual normality tests occur. As can be seen, our proposed normality tests tend to be rather informative when used in this way.

Table 1 Monte Carlo size and power of normality tests
Table 2 Monte Carlo size and power of normality tests with bootstrap: sample size \(T=250\)
Table 3 Contingency tables of the normality test based on \(H_{3}(\varepsilon ^{*}_{it})\) & \(H_{4}(\varepsilon ^{*}_{it})\)
Table 4 Monte Carlo size of independence moment tests: sample size \(T=250\)
Table 5 Monte Carlo size of independence moment tests: sample size \(T=1000\)
Table 6 Monte Carlo power of independence moment tests: sample size \(T=250\)
Table 7 Monte Carlo power of independence moment tests: sample size \(T=1000\)

5.2.2 Testing independence

In Tables 4 (\(T=250\)) and 5 (\(T=1000\)) we report the Monte Carlo rejection rates of the tests we have proposed in Sect. 4.2 under the null of independence. Specifically, we look at the second, third and fourth moment individual tests in \(\varvec{m} ^{cv}[\varvec{\varepsilon }_{t}^{*}(\varvec{\theta })]\), \(\varvec{m} ^{cs}[\varvec{\varepsilon }_{t}^{*}(\varvec{\theta })]\) and \(\varvec{m} ^{ck}[\varvec{\varepsilon }_{t}^{*}(\varvec{\theta })]\), and also at the joint tests for the two co-skewness moments, the three co-kurtosis moments and the combined six moments, including the correlation between the shocks. The left panels of those tables report rejection rates using asymptotic critical values, while the right panels show the bootstrap-based ones for \( T=250\) and the warp-speed bootstrap-based ones for \(T=1000\).Footnote 16

We can see in Table 4 some small to moderate finite sample size distortions when \(T=250\), although in several cases they are corrected by the bootstrap. The only exceptions seem to be dgp 1 and 1d, in which some small distortions remain even with this procedure. Given that in these designs there is only one non-Gaussian shock, a plausible explanation is that the identification of \(\varvec{C}\) may be weaker, a conjecture we will revisit in the next section. For the other DGPs, the results in Table 4 clearly show that the usual bootstrap version of the tests, which is the relevant one in empirical applications, has much better size properties.

As can be seen in Table 5, finite sample sizes improve considerably for \( T=1000\). Indeed, the bootstrap versions of the tests seem unnecessary for this sample size because the empirical rejection rates based on asymptotic critical values become generally very close to the nominal ones, though the warp-speed version performs comparably well.

Next, we assess the power of the independence tests for \(T=250\) and \(T=1000\) in Tables 6 and 7, respectively. In this respect, we find that the power of our tests against dgp 4 is disappointingly low. A possible explanation is that when the true joint distribution is a symmetric Student t, the dependence between the components is mostly visible in the tails of the distribution. On the other hand, power is mostly coming from the co-skewness component (20) in the case of the joint asymmetric t. Still, the test based on the covariance of shocks (19) is also very powerful. Finally, the co-kurtosis test based on (22) is the most powerful single moment test under the Lanne and Lütkepohl (2010) alternative in dgp 6, with the joint tests that include this moment inheriting its power. Nevertheless, the test based on second moment (19) also has non-negligible power for this design.

In summary, although the rejection rates naturally depend on the type of departure from the null and the specific influence function used for testing, the joint test that considers all moments at once seems to be a winner regardless of the sample size.

Table 8 Monte Carlo distribution of parameter estimators

5.3 Structural parameters estimates

Table 8 reports summary statistics for the Monte Carlo distribution of the PMLEs of the structural parameters. The first thing we would like to highlight is that when one of the shocks is Gaussian, the sampling variability and the finite sample bias are noticeably larger than when both shocks are non-Gaussian but independent, which is in line with the conjecture we expressed in the previous section. Still, even in that case the biases are usually small and often negligible. In addition, the Monte Carlo standard deviations of the estimators in Panel B are roughly half those in Panel A, as one would expect.
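As a back-of-the-envelope check of this halving, assume that Panels A and B correspond to \(T=250\) and \(T=1000\), respectively, and that the estimators are \(\sqrt{T}\)-consistent. Writing \(\text {sd}_{T}\) for the Monte Carlo standard deviation at sample size T (our notation), the usual asymptotic approximation suggests

$$\begin{aligned} \frac{\text {sd}_{1000}}{\text {sd}_{250}}\approx \sqrt{\frac{250}{1000}}=\frac{1}{2}. \end{aligned}$$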

The situation is completely different when the true shocks are cross-sectionally dependent. Failure of condition 1 in Assumption 1 results in significant biases, mostly in the off-diagonal terms of the impact multiplier matrix. In fact, the Monte Carlo variance of these estimators seems to increase with the sample size. In this respect, it is important to remember that the elements of the \(\varvec{C}\) matrix are no longer point identified when the joint distribution of the true shocks is either a symmetric or asymmetric Student t. This is confirmed by the fact that the bias of the estimators is lower for dgp 6, in which the rotations of the shocks are not observationally equivalent [see Lanne and Lütkepohl (2010)].

6 Conclusions and directions for further research

Given that the parametric identification of the structural shocks and their impact coefficients \(\varvec{C}\) in the Svar model (2) critically hinges on the validity of the identifying restrictions in Assumption 1, it would be desirable that empirical researchers estimating those models reported specification tests that checked those assumptions to increase the empirical credibility of their findings. For that reason, in this paper we propose simple specification tests for independent component analysis and structural vector autoregressions with non-Gaussian shocks that check the normality of a single shock and the potential cross-sectional dependence among several of them. Our tests compare the integer (product) moments of the shocks in the sample with their population counterparts. Importantly, we explicitly consider the sampling variability resulting from using shocks computed with consistent parameter estimators. We study the finite sample size of our tests in several simulation exercises and discuss some bootstrap procedures. We also show that our tests have non-negligible power against a variety of empirically plausible alternatives.

As we mentioned in the introduction, there are many estimators for the parameters of the static Ica model (1) in addition to the discrete mixture of normals-based PMLEs we have considered in this paper. For example, even within the same likelihood framework, Fiorentini and Sentana (2020) discuss two other consistent estimators of the conditional mean and variance parameters of the Svar in (2):

  1. 1.

    The two-step procedure of Gouriéroux et al. (2017), which first estimates the reduced form parameters \(\varvec{\tau }\), \(\varvec{ a}\) and \(\varvec{\sigma }_{L}=vec(\varvec{\Sigma }_{L})\) by equation-by-equation OLS, and then the \(N(N-1)/2\) free elements \(\varvec{ \omega }\) of the orthogonal rotation matrix \(\varvec{Q}\) in (3) mapping structural shocks and reduced form innovations by non-Gaussian PML.

  2. 2.

    The two-step estimator à la Fiorentini and Sentana (2019), which replaces the inconsistent non-Gaussian PMLEs of \(\varvec{\tau }\) and \( \varvec{\psi }\) by the sample means and standard deviations of pseudo-standardised shocks computed using \(\varvec{\hat{a}}_{T}\) and \(\varvec{\hat{ \jmath }}_{T}\).
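To see why the second step of the first procedure requires non-Gaussian information, the following minimal sketch works through the bivariate case, in which the orthogonal matrix has \(N(N-1)/2=1\) free element. We assume, for illustration only, that the impact matrix can be written as \(\varvec{C}=\varvec{\Sigma }_{L}\varvec{Q}\) with \(\varvec{\Sigma }_{L}\) a lower triangular factor of the reduced form covariance matrix and \(\varvec{Q}\) a rotation by an angle \(\omega \); the numerical values and function names are hypothetical.

```python
import math


def rotation(omega):
    """2x2 orthogonal rotation matrix indexed by a single free angle."""
    c, s = math.cos(omega), math.sin(omega)
    return [[c, -s], [s, c]]


def matmul(a, b):
    """Product of two small matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def impact_matrix(sigma_l, omega):
    """Assumed mapping C = Sigma_L * Q(omega)."""
    return matmul(sigma_l, rotation(omega))


def implied_covariance(c):
    """Second moments implied by an impact matrix: C C'."""
    return matmul(c, transpose(c))


# Hypothetical lower triangular factor of the reduced form covariance matrix.
SIGMA_L = [[1.0, 0.0], [0.5, 0.8]]
```

Since `implied_covariance(impact_matrix(SIGMA_L, omega))` is the same for every `omega`, the first-step OLS estimates cannot pin down the rotation on their own, which is why the angle has to be estimated from the higher-order features of the shocks.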

Although the specification tests that we have proposed in this paper could also be applied to shocks computed on the basis of these alternative estimators, the asymptotic covariance matrices that take into account their sampling variability will differ from the ones we have derived in this paper. Given that some researchers may prefer to use one of those two-step estimation methods, obtaining computationally simple expressions for the adjusted covariance matrix would provide a valuable addition to our results.

In fact, the moment conditions that we consider for testing independence could form the basis of a GMM estimation procedure for the model parameters \( \varvec{\theta }\) along the lines of Lanne and Luoto (2021), although with a larger set of third and fourth cross-moments. The overidentifying restrictions test obtained as a by-product of this procedure could be used as a specification test of the assumed independence-like restrictions.

Our tests for normality tackle a single shock at a time. Although we could in principle simultaneously test the normality of two or more shocks by combining the corresponding normality tests, the implicit joint null hypothesis would violate the second identification condition in Assumption 1. The asymptotic distribution of such joint tests constitutes a very interesting topic for further research. In addition, we could formally study the limiting probability of finding \(N-1\) rejections of the univariate normality tests in those circumstances.

Another important research topic would be the limiting behaviour of the PMLEs of \(\varvec{\theta }\) when Assumption 1 does not hold, either because two or more of the shocks are Gaussian or because they are not independent.

Finally, while the integer product moment tests for independence that we have considered are very intuitive, they may have little power against alternatives in which the dependence is mostly visible in certain regions of the domain of the random shocks. With this in mind, in Amengual et al. (2021b) we study moment tests that look at the product of nonlinear transformations of the shocks, such as \(I(q_{\alpha i}\le \varepsilon _{it}\le q_{\omega i})\), where \(q_{\alpha i}\) and \(q_{\omega i}\) are the \(\alpha \) and \(\omega \) quantiles of the marginal distribution of the \(i^{th}\) shock (with \(0\le \alpha <\omega \le 1\)), or \(I(k_{li}\le \varepsilon _{it}\le k_{ui})\), where \(k_{li}<k_{ui}\) are some fixed values, or indeed \(\varepsilon _{it}I(k_{li}\le \varepsilon _{it}\le k_{ui})\). Extending this approach in such a way that it leads to a consistent test of independence constitutes another promising research avenue.
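As a rough illustration of the first of these transformations, the sketch below computes the sample covariance between the indicators \(I(q_{\alpha i}\le \varepsilon _{it}\le q_{\omega i})\) of two shocks, which should be close to zero under independence. It is only a sketch under our own assumptions: the quantiles are replaced by simple empirical order statistics, no test statistic or adjustment for parameter estimation is computed, and the function names are hypothetical.

```python
import random


def empirical_quantile(sorted_x, p):
    """Crude empirical p-quantile taken from an already sorted sample."""
    idx = min(int(p * len(sorted_x)), len(sorted_x) - 1)
    return sorted_x[idx]


def quantile_indicator(x, alpha, omega):
    """1.0 when an observation lies between its alpha and omega quantiles."""
    s = sorted(x)
    lo, hi = empirical_quantile(s, alpha), empirical_quantile(s, omega)
    return [1.0 if lo <= v <= hi else 0.0 for v in x]


def indicator_product_moment(e1, e2, alpha, omega):
    """Sample mean of the product of indicators minus the product of means."""
    i1 = quantile_indicator(e1, alpha, omega)
    i2 = quantile_indicator(e2, alpha, omega)
    n = len(e1)
    mean_prod = sum(a * b for a, b in zip(i1, i2)) / n
    return mean_prod - (sum(i1) / n) * (sum(i2) / n)


def simulate_independent(n=20_000, seed=2):
    """Two independent standard normal series, used only as a sanity check."""
    rng = random.Random(seed)
    e1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    e2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return e1, e2
```

For instance, `indicator_product_moment(e1, e2, 0.0, 0.1)` focuses on joint realisations in the lower tail, a region in which integer product moments may contain little information about the dependence between the shocks.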