Moment tests of independent components

We propose simple specification tests for independent component analysis and structural vector autoregressions with non-Gaussian shocks that check the normality of a single shock and the potential cross-sectional dependence among several of them. Our tests compare the integer (product) moments of the shocks in the sample with their population counterparts. Importantly, we explicitly consider the sampling variability resulting from using shocks computed with consistent parameter estimators. We study the finite sample size of our tests in several simulation exercises and discuss some bootstrap procedures. We also show that our tests have non-negligible power against a variety of empirically plausible alternatives.


Introduction
The literature on structural vector autoregressions (Svar) is vast. Popular identification schemes include short- and long-run homogeneous restrictions [see, e.g. Sims (1980), Blanchard and Quah (1989)], sign restrictions [see, e.g. Faust (1998), Uhlig (2005)], time-varying heteroskedasticity (Sentana and Fiorentini 2001) or external instruments [see, e.g. Mertens and Ravn (2012), Stock and Watson (2018) or Dolado et al. (2020)]. Recently, identification through independent non-Gaussian shocks has become increasingly popular after Lanne et al. (2017) and Gouriéroux et al. (2017). The signal processing literature on Independent Component Analysis (Ica) popularised by Comon (1994) shares the same identification scheme. Specifically, suppose that in a static model the N × 1 observed random vector y (the so-called signals or sensors) is the result of an affine combination of N unobserved shocks ε* (the so-called components or sources), whose mean and variance we can set to 0 and I_N without loss of generality, namely

y = μ + Cε*.    (1)

Then the matrix C of loadings of the observed variables on the latent ones can be identified (up to column permutations and sign changes) from an i.i.d. sample of observations on y provided the following assumption holds: 1
Failure of any of the three conditions in Assumption 1 results in an underidentified model. The best known counterexample is a multivariate Gaussian model for ε*, in which we can identify V(y) = CC′ but not C without additional structural restrictions, despite the fact that the elements of ε* are cross-sectionally independent. Intuitively, the problem is that any rotation of the structural shocks, ε** = Qε*, where Q is an orthogonal matrix, generates another set of N observationally equivalent, cross-sectionally independent shocks with standard normal marginal distributions. A less well-known counterexample would be a non-Gaussian spherical distribution for ε*, such as the standardised multivariate Student t. In this case, the lack of identifiability of C is due to the fact that ε* and ε** share not only their mean vector (0) and covariance matrix (I_N), but also the same nonlinear dependence structure.
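To see the Gaussian counterexample concretely, here is a minimal numerical sketch (using numpy, with an arbitrary rotation angle chosen purely for illustration): rotating a pair of independent standard normal shocks by any orthogonal Q leaves every moment of the data unchanged, so C and CQ′ are observationally equivalent.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200_000
eps = rng.standard_normal((T, 2))            # independent N(0, 1) shocks

theta = 0.7                                  # illustrative rotation angle
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
eps_rot = eps @ Q.T                          # rotated shocks eps** = Q eps*

# both sets have mean 0, identity covariance, zero third moments and
# fourth moments equal to 3, so no moment can distinguish C from C Q'
cov_orig = np.cov(eps, rowvar=False)
cov_rot = np.cov(eps_rot, rowvar=False)
```

With non-Gaussian independent shocks, by contrast, the rotated shocks would no longer be cross-sectionally independent, which is precisely what the tests below exploit.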
The purpose of our paper is to propose simple to implement and interpret specification tests that check the normality of a single element of ε * and the potential cross-sectional dependence among several of them. In very simple terms, our tests compare the integer (product) moments of the shocks in the sample with their population counterparts. Specifically, in the Gaussian tests we compare the marginal third and fourth moments of a single shock to 0 and 3, respectively. In turn, in the case of two or more shocks, we assess the statistical significance of their second, third and fourth cross-moments, which should be equal to the product of the corresponding marginal moments under independence. Many of these moments tests can be formally justified as Lagrange multiplier tests against specific parametric alternatives [see, e.g. Mencía and Sentana (2012)], but in this paper we do not pursue this interpretation. Like Almuzara et al. (2019), though, we focus on the latent shocks rather than the observed variables in view of the fact that identifying Assumption 1 is written in terms of ε * rather than y.
If we knew the true values of μ and C, μ 0 and C 0 say, with rank(C 0 ) = N , our tests would be straightforward, as we could trivially recover the latent shocks from the observed signals without error. In practice, though, both μ and C are unknown, so we need to estimate them before computing our tests.
Although many estimation procedures for those parameters have been proposed in the literature [see, e.g. Moneta and Pallante (2020) and the references therein], in this paper we consider the discrete mixtures of normals-based pseudo-maximum likelihood estimators (PMLEs) in Fiorentini and Sentana (2020) for three main reasons. First, they are consistent for the model parameters under standard regularity conditions provided that Assumption 1 holds regardless of the true marginal distributions of the shocks. Second, they seem to be rather efficient, the rationale being that finite normal mixtures can provide good approximations to many univariate distributions. And third, the influence functions on which they are based are the scores of the pseudo-log-likelihood, which we can easily compute in closed form. As we shall see, these influence functions play a very important role in adjusting the asymptotic variances of the different tests we propose so that they reflect the sampling variability resulting from computing the shocks with consistent but noisy parameter estimators.
In this respect, we derive computationally simple closed-form expressions for the asymptotic covariance matrices of the sample moments underlying our tests under the relevant null adjusted for parameter uncertainty. Importantly, we do so not only for static Ica model (1) but also for a Svar, which is far more relevant in economics.
In many empirical finance applications of Svars, the number of observations is sufficiently large for asymptotic approximations to be reliable. In contrast, the limiting distributions of our tests may be a poor guide for the smaller samples typically used in macroeconomic applications. For that reason, we thoroughly study the finite sample size of our tests in several Monte Carlo exercises. We also discuss some bootstrap procedures that seem to improve their reliability. Finally, we show that our tests have non-negligible power against a variety of empirically plausible alternatives in which the cross-sectional independence of the shocks no longer holds.
The rest of the paper is organised as follows. Section 2 discusses the model and the estimation procedure. Then, we present our general moment tests in Sect. 3 and particularise them to assess normality and independence in Sect. 4. Next, Sect. 5 contains the results of our Monte Carlo experiments. We present our conclusions and suggestions for further research in Sect. 6 and relegate some technical material and additional simulations to several appendices.

Model specification
Consider the following N-variate Svar process of order p:

y_t = τ + A_1 y_{t−1} + · · · + A_p y_{t−p} + Cε*_t,  ε*_t | I_{t−1} ∼ i.i.d. (0, I_N),    (2)

where I_{t−1} is the information set, C the matrix of impact multipliers and ε*_t the "structural" shocks, which are normalised to have zero means, unit variances and zero covariances.
Let ε_t = Cε*_t denote the reduced form innovations, so that ε_t | I_{t−1} ∼ i.i.d. (0, Σ) with Σ = CC′. As we mentioned in the introduction, a Gaussian (pseudo) log-likelihood is only able to identify Σ, which means the structural shocks ε*_t and their loadings in C are only identified up to an orthogonal transformation. Specifically, we can use the so-called LQ matrix decomposition 2 to relate the matrix C to the Cholesky decomposition Σ = LL′ as

C = LQ,

where Q is an N × N orthogonal matrix, which we can model as a function of N(N − 1)/2 parameters ω by assuming that |Q| = 1. 3 Notice that if |Q| = −1 instead, we can change the sign of the ith structural shock and its impact multipliers in the ith column of the matrix C without loss of generality as long as we also modify the shape parameters of the distribution of ε*_it to alter the sign of all its nonzero odd moments.
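A quick numerical illustration of this decomposition (a sketch with a hypothetical 2 × 2 impact matrix of our choosing): given Σ = CC′, the Cholesky factor L and the matrix Q = L^{−1}C satisfy C = LQ with Q exactly orthogonal.

```python
import numpy as np

C = np.array([[1.0, 0.5],
              [0.3, 2.0]])                  # hypothetical impact matrix
Sigma = C @ C.T                             # reduced-form covariance CC'

L = np.linalg.cholesky(Sigma)               # lower triangular, Sigma = LL'
Q = np.linalg.solve(L, C)                   # C = LQ  =>  Q = L^{-1} C
```

Since Q Q′ = L^{−1}(CC′)L^{−T} = L^{−1}(LL′)L^{−T} = I, orthogonality holds by construction, and here |Q| = 1 because |C| > 0.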
In this context, Lanne et al. (2017) show that statistical identification of both the structural shocks and C (up to column permutations and sign changes) is possible under Ica identification Assumption 1, which we maintain in what follows. Popular examples of univariate non-normal distributions are the Student t and the generalised error (or Gaussian) distribution, which includes the normal, Laplace and uniform as special cases, as well as symmetric and asymmetric finite normal mixtures. Let θ = (τ′, a′, c′)′ denote the structural parameters characterising the first two conditional moments of y_t. In addition, we assume ε*_it | I_{t−1} ∼ i.i.d. D(0, 1, η_i), where η_i is a q_i × 1 vector of variation-free shape parameters, so that in principle different shocks could follow different distributions. For simplicity of notation, though, we maintain that the univariate distributions of the shocks belong to the same family. We can then collect all the shape parameters in the q × 1 vector η = (η′_1, . . . , η′_N)′, with q = Σ_{i=1}^N q_i, so that φ = (θ′, η′)′ is the [N + (p + 1)N² + q] × 1 vector containing all the model parameters.

2 The LQ decomposition is intimately related to the QR decomposition. Specifically, Q′L′ provides the QR decomposition of the matrix C′, which is uniquely defined if we restrict the diagonal elements of L to be positive [see, e.g. Golub and van Loan (2013) for further details].

3 See section 10 of Magnus et al. (2021) for a detailed discussion of three ways of explicitly parametrising a rotation (or special orthogonal) matrix: (i) as the product of Givens matrices that depend on N(N − 1)/2 Tait-Bryan angles, one for each of the strict upper diagonal elements; (ii) by using the so-called Cayley transform of a skew-symmetric matrix; and (iii) by exponentiating a skew-symmetric matrix.
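As an illustration of parametrisation (i) in footnote 3, the following sketch builds an N × N rotation matrix as a product of N(N − 1)/2 Givens rotations; the angles and the ordering of the planes here are illustrative, not necessarily the exact convention used in the paper.

```python
import numpy as np

def givens(N, i, j, angle):
    """N x N Givens rotation acting on the (i, j) coordinate plane."""
    G = np.eye(N)
    c, s = np.cos(angle), np.sin(angle)
    G[i, i] = c
    G[j, j] = c
    G[i, j] = -s
    G[j, i] = s
    return G

def rotation_from_angles(angles, N):
    """Special orthogonal matrix as a product of N(N-1)/2 Givens rotations,
    one per strict upper-diagonal element."""
    Q = np.eye(N)
    k = 0
    for i in range(N - 1):
        for j in range(i + 1, N):
            Q = Q @ givens(N, i, j, angles[k])
            k += 1
    return Q

omega = [0.3, -1.1, 0.45]                   # N(N-1)/2 = 3 angles for N = 3
Q = rotation_from_angles(omega, 3)
```

Each Givens factor has determinant 1, so the product is automatically a rotation with |Q| = 1.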

The criterion function
Given the linear mapping between structural shocks and reduced form innovations, the contribution to the conditional log-likelihood function from observation y_t (t = 1, . . . , T) for those parameter configurations for which C has full rank will be given by

l_t(φ) = Σ_{i=1}^N f[ε*_it(θ); η_i] − ln |det C|,

where f[ε*_it(θ); η_i] is the univariate log-likelihood function for the ith structural shock and ε*_t(θ) = C^{−1}(y_t − τ − A_1 y_{t−1} − · · · − A_p y_{t−p}).
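In code, the contribution l_t(φ) can be sketched as follows for the static case, with standard normal marginals standing in for the finite normal mixtures actually used in the paper (the function and variable names are ours):

```python
import numpy as np

def norm_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def loglik_contribution(y_t, tau, C, densities):
    """l_t = sum_i ln f_i[eps*_it(theta)] - ln|det C| for the static case."""
    eps = np.linalg.solve(C, y_t - tau)     # recover the shocks from the signals
    return sum(np.log(f(e)) for f, e in zip(densities, eps)) \
        - np.log(abs(np.linalg.det(C)))

f_std = lambda e: norm_pdf(e, 0.0, 1.0)     # placeholder marginal densities
C = np.array([[1.0, 0.5],
              [0.0, 2.0]])
tau = np.array([1.0, -1.0])
lt = loglik_contribution(np.array([1.5, 1.0]), tau, C, [f_std, f_std])
```

With Gaussian placeholder marginals this reduces, by the change-of-variables formula, to the bivariate normal log-density of y_t with mean τ and covariance CC′.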

The score vector
Let s_t(φ) denote the score function ∂l_t(φ)/∂φ and partition it into two blocks, s_θt(φ) and s_ηt(φ), whose dimensions conform to those of θ and η, respectively. Fiorentini and Sentana (2021) show that the scores can be written in closed form and that, by virtue of the cross-sectional independence of the shocks, the derivatives involved correspond to the underlying univariate densities.

The asymptotic distribution
For simplicity, we assume henceforth that Svar model (2) generates a covariance stationary process. 4 Consider the reparametrisation C = JΨ, where Ψ is a diagonal matrix whose elements contain the scales of the structural shocks, while the columns of J, whose diagonal elements are normalised to 1, measure the relative impact of each of the structural shocks on all the remaining variables. Proposition 3 in Fiorentini and Sentana (2020) shows that the parameters a_i = vec(A_i) and j = veco(J) are consistently estimated regardless of the true distribution. 5 As a result, the pseudo-true values of those parameters will coincide with the true ones, i.e. a_i∞ = a_i0 and j_∞ = j_0. In contrast, τ and ψ = vecd(Ψ) will generally be inconsistently estimated, so τ_∞ ≠ τ_0 and ψ_∞ ≠ ψ_0. Nevertheless, Fiorentini and Sentana (2020) prove that the unrestricted PMLEs of τ and ψ, which simultaneously estimate the shape parameters, will be consistent too when the univariate distributions used for estimation purposes are discrete mixtures of normals, in which case θ_∞ = θ_0 and ε*_t(θ_0) = ε*_t. For that reason, in what follows we focus on the finite normal mixtures-based PMLEs of the original parameters θ = (τ′, a′, c′)′.
Still, the potential misspecification of this distributional assumption implies that the asymptotic covariance matrix of the corresponding PMLEs must be based on the usual sandwich formula. Let A(φ∞; ϕ0) and B(φ∞; ϕ0) denote the (−) expected value of the log-likelihood Hessian and the variance of the score, respectively, where η∞ are the pseudo-true values of the shape parameters of the distributions of the shocks assumed for estimation purposes, υ contains the potentially infinite-dimensional shape parameters of the true distributions of the shocks, and ϕ = (θ′, υ′)′. The asymptotic distribution of the pseudo-ML estimators of φ, φ̂_T, under standard regularity conditions will be given by

√T(φ̂_T − φ∞) → N[0, A^{−1}(φ∞; ϕ0) B(φ∞; ϕ0) A^{−1}(φ∞; ϕ0)].

In what follows, we shall make extensive use of the detailed expressions for the conditional expected value of the Hessian and covariance matrix of the score for finite normal mixtures-based PMLEs in Amengual et al. (2021b).
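The sandwich formula itself is straightforward to compute from score and Hessian contributions. The following sketch illustrates it in the simplest possible pseudo-ML setting (all names are ours): the mean is estimated under a working unit-variance Gaussian likelihood while the true variance differs, so the sandwich recovers the true sampling variance of the estimator.

```python
import numpy as np

def sandwich_cov(scores, hessians):
    """A^{-1} B A^{-1}, with A = -(average Hessian) and B = V(score),
    both estimated by their sample analogues."""
    A = np.atleast_2d(-np.mean(hessians, axis=0))
    B = np.atleast_2d(np.cov(scores, rowvar=False, bias=True))
    A_inv = np.linalg.inv(A)
    return A_inv @ B @ A_inv

# pseudo-ML of a mean under a working N(mu, 1) likelihood; true variance is 4
rng = np.random.default_rng(6)
x = 1.0 + 2.0 * rng.standard_normal(10_000)
mu_hat = x.mean()
scores = (x - mu_hat)[:, None]              # score of the working likelihood
hessians = -np.ones((x.size, 1, 1))         # its Hessian is identically -1
V = sandwich_cov(scores, hessians)          # approx Avar of sqrt(T)(mu_hat - mu)
```

Here A = 1 while B equals the sample variance of x, so V correctly approximates 4 rather than the working likelihood's information bound of 1.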

The influence functions
As we have stressed earlier, the parametric identification of the structural shocks ε * t (θ) and their impact coefficients C that appear in the Svar model (2) critically hinges on the validity of identifying Assumption 1. As a consequence, it would be desirable that empirical researchers estimating those models reported specification tests that would check those assumptions. Given that rank failures in C will be inextricably linked to singular dynamic systems, 6 we focus on testing that at most one of the structural shocks is Gaussian and that all the structural shocks are indeed independent of each other.
As is well known, stochastic independence between the elements of a random vector is equivalent to the joint distribution being the product of the marginal ones. In turn, this factorisation implies lack of correlation between not only the levels but also any set of single-variable measurable transformations of those elements. Thus, a rather intuitive way of testing for independence without considering any specific parametric alternative can be based on individual moment conditions of the form

m_h[ε*_t(θ)] = ∏_{i=1}^N ε*_it^{h_i}(θ) − ∏_{i=1}^N E[ε*_it^{h_i}(θ_0)],    (14)

where h = (h_1, . . . , h_N), with h_i ∈ Z_{0+}, denotes the index vector characterising a specific product moment. While the influence function in (14) will generally require the estimation of E[ε*_it^{h_i}(θ_0)] for some of the shocks, the constant term ∏_{i=1}^N E[ε*_it^{h_i}(θ_0)] is either 0 or 1 for the second, third and fourth cross-moments we study in this paper in view of the standardised nature of the shocks, so we do not need to worry about it. Amengual et al. (2021b) discuss in detail how to deal with the estimation of the required E[ε*_it^{h_i}(θ_0)] in the general case. Although we have motivated (14) as the basis for our tests of independence, by setting all the elements of h but one to 0, we can also use this expression to look at the marginal moments of a single shock. In this paper, we focus on h_i = 3 and 4 because most common departures from normality of the shocks will be reflected in coefficients of skewness or kurtosis different from 0 and 3, respectively.

Let m[ε*_t(θ)] denote the K × 1 vector that stacks the influence functions in (14) for different index vectors h_1, . . . , h_k, . . . , h_K. The following result, which specialises the general expressions in Newey (1985) and Tauchen (1985) to our context, derives the asymptotic distribution of the scaled sample average of m[ε*_t(θ)] when we evaluate the structural shocks at the PMLEs θ̂_T rather than at θ_0:
In the next subsections, we provide detailed expressions for V(φ; ϕ), J (φ; ϕ) and F(φ; ϕ) which exploit that the true shocks are cross-sectionally and serially independent under the null hypothesis of correct specification of the static Ica model (1) or the dynamic Svar model (2).
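For known θ_0, the influence functions in (14) are trivial to evaluate; the following sketch computes two of them for simulated independent shocks (the population moments are supplied by hand, since they equal 0 or 1 for the cross-moments we use):

```python
import numpy as np

def influence(eps, h, pop_moments):
    """m_h[eps*_t] = prod_i eps_it^{h_i} - prod_i E[eps_it^{h_i}]."""
    h = np.asarray(h)
    return np.prod(eps ** h, axis=1) - np.prod(pop_moments)

rng = np.random.default_rng(1)
eps = rng.standard_normal((50_000, 2))      # independent shocks under the null

m_cov = influence(eps, [1, 1], [0.0, 0.0])      # E[e1 e2] = 0
m_cokurt = influence(eps, [2, 2], [1.0, 1.0])   # E[e1^2 e2^2] = 1
```

Under the null, the sample averages of both influence functions should be statistically indistinguishable from zero; the adjustments in Proposition 1 account for evaluating them at θ̂_T instead of θ_0.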

Covariance across influence functions
If we exploit the cross-sectional independence of the shocks under the null hypothesis, the expected values of products of powers of different shocks factorise into products of the corresponding marginal moments at the true values, which delivers closed-form expressions for all the elements of V(φ; ϕ).

The expected Jacobian
Straightforward application of the chain rule implies that

∂m_h[ε*_t(θ)]/∂θ′ = (∂m_h[ε*_t(θ)]/∂ε*′)(∂ε*_t(θ)/∂θ′).

On this basis, the following proposition characterises the expected Jacobian matrix for any h:

Proposition 2 Suppose that model (2) satisfies Assumption 1. Then, the expected Jacobian matrix of m_h[ε*_t(θ)] evaluated at the true values is available in closed form.

As for ∂m_h[ε*_t(θ)]/∂ε*, if we denote all the distinct second, third and fourth moments by means of the duplication, triplication and quadruplication matrices D_N, T_N and Q_N, respectively [see Meijer (2005) for details], the results we derive in Appendix B.1 provide an easy way to compute all those derivatives recursively.

The covariance with the score
Let ℓ_N denote a vector of N ones and I(·) the usual indicator function. The following proposition provides the last ingredient of the adjusted covariance matrix in Proposition 1.
Proposition 3 Suppose that model (2) satisfies Assumption 1. Then, the covariance between the influence function m_h(·) and the pseudo-log-likelihood scores evaluated at the (pseudo) true values is given by expressions in which F_hl(η∞, ϕ0) is a 1 × N vector with nonzero entries only for those i with h_i > 0, F_hs(η∞, ϕ0) is a 1 × N² vector with nonzero entries only for those pairs (i, i′) with h_i > 0 and h_{i′} > 0, and all the remaining entries are zero.

Testing normality
As we have mentioned before, we can use (14) to test the null hypothesis that a single structural shock is Gaussian by comparing its third and fourth sample moments with 0 and 3, respectively, which are the population values of those moments under the null of normality. Nevertheless, many authors [see, e.g. Bontemps and Meddahi (2005) and the references therein] convincingly argue that it is generally more appropriate to look at the sample averages of the third and fourth Hermite polynomials, H_3(ε*_it) = ε*³_it − 3ε*_it and H_4(ε*_it) = ε*⁴_it − 6ε*²_it + 3, instead of ε*³_it and ε*⁴_it only. The reason is that Hermite polynomials have two main advantages. First, the results in Proposition 2 immediately imply that their expected Jacobians will be 0 under the null of normality, so they are immune to the sampling uncertainty resulting from using estimated shocks. Second, H_3(ε*_it) and H_4(ε*_it) are orthogonal under the Gaussian null, which means that the joint test is simply the sum of two asymptotically independent components: one for skewness and another one for kurtosis.
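A minimal sketch of the resulting joint statistic, which is asymptotically chi-squared with 2 degrees of freedom under the Gaussian null because E(H_3) = E(H_4) = 0, V(H_3) = 6, V(H_4) = 24 and the two components are orthogonal (the standardised Student t alternative below is our illustrative choice):

```python
import numpy as np

def hermite_normality_stat(e):
    """T * [mean(H3)^2 / 6 + mean(H4)^2 / 24], asymptotically chi2(2)
    under normality, with H3 = e^3 - 3e and H4 = e^4 - 6e^2 + 3."""
    T = e.shape[0]
    h3 = np.mean(e ** 3 - 3.0 * e)
    h4 = np.mean(e ** 4 - 6.0 * e ** 2 + 3.0)
    return T * (h3 ** 2 / 6.0 + h4 ** 2 / 24.0)

rng = np.random.default_rng(2)
stat_null = hermite_normality_stat(rng.standard_normal(100_000))
# standardised Student t with 5 degrees of freedom (kurtosis 9)
t5 = rng.standard_t(df=5, size=100_000) / np.sqrt(5.0 / 3.0)
stat_alt = hermite_normality_stat(t5)
```

The statistic stays small for Gaussian draws and explodes for the heavy-tailed alternative, whose excess kurtosis loads on the H_4 component.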
The properties of the estimators that we use, though, mean that the usual implementation of the Jarque and Bera (1980) test, which simply looks at the sample averages of ε*³_it(θ̂_T) and ε*⁴_it(θ̂_T), yields numerically the same statistics as the tests based on the Hermite polynomials despite the fact that it ignores the terms involving ε*_it and ε*²_it.
The intuition is as follows. Proposition 1 in Fiorentini and Sentana (2020) states that the PMLEs of the unconditional mean and variance of a univariate finite mixture of normals numerically coincide with the sample mean and variance (with denominator T) of the observed series. Given that log-likelihood function (4) for any given values of a and j is effectively the sum of N such univariate log-likelihoods with parameters that are variation-free, the estimated shocks will be such that

T^{−1} Σ_{t=1}^T ε*_it(θ̂_T) = 0 and T^{−1} Σ_{t=1}^T ε*²_it(θ̂_T) = 1, i = 1, . . . , N,

regardless of the sample size. This property also has interesting implications for the independence tests that we will consider in the next section because, in effect, each estimated shock will be standardised in the sample. Finally, it is important to emphasise that the non-normality of a single shock does not guarantee the identification of the model parameters, in the same way as its normality does not imply that they are underidentified. As we shall see in the Monte Carlo section, though, researchers can get an informative guide to the validity of Assumption 1 by looking at the normality tests for all the individual shocks.

Testing independence
At first sight, the arguments in the previous section might suggest that the sample covariances between the estimated shocks will also be 0 by construction. However, this is not generally true. The finite normal mixture PMLEs guarantee the univariate standardisation of each shock, but this does not imply their orthogonality in any given sample, unlike what would happen with a multivariate Gaussian likelihood function in which enough a priori restrictions were imposed on C to render the model exactly identified. Intuitively, the parameter values that maximise (4) are trying to make the estimated shocks stochastically independent, not merely orthogonal [see Herwartz (2018)].
For that reason, the first test for independence that we consider will be based on the second cross-moment condition

E(ε*_it ε*_i′t) = 0, i ≠ i′.    (19)

In other words, we are simply assessing if the sample correlation between the ith and i′th estimated shocks is significantly different from zero in the usual statistical sense. Nevertheless, we can also go beyond linear dependence and look at moments that characterise the co-skewness across the structural shocks. These can be of two types:

E(ε*²_it ε*_i′t) = 0    (20)

and

E(ε*_it ε*_i′t ε*_i″t) = 0,    (21)

depending on whether they involve two or three different shocks. Finally, we can also look at the different co-kurtosis among the shocks, which may involve a pair of shocks, namely

E(ε*²_it ε*²_i′t) = 1    (22)

and

E(ε*³_it ε*_i′t) = 0,    (23)

three shocks,

E(ε*²_it ε*_i′t ε*_i″t) = 0,    (24)

and even four shocks,

E(ε*_it ε*_i′t ε*_i″t ε*_i‴t) = 0.    (25)

Thus, we substantially expand the set of moments researchers can use to test for the independence of the components relative to Hyvärinen (2013), who only suggested looking at the co-kurtosis terms in (22). The above moment conditions also augment those considered by Lanne and Luoto (2021), who focus on (19), (22) and (23), together with E(ε*_it) = 0 and E(ε*²_it) = 1.
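The sample counterparts of these cross-moments are immediate to compute. The sketch below evaluates four of them in two simulated cases: independent shocks, and shocks that are uncorrelated but dependent (our illustrative choice e_2 = (e_1² − 1)/√2, which has zero mean and unit variance), a situation that only the higher-order cross-moments detect.

```python
import numpy as np

def cross_moment_stats(e1, e2):
    """Sample analogues of selected cross-moments; their population values
    under independence of two standardised shocks are [0, 0, 1, 0]."""
    return np.array([np.mean(e1 * e2),             # covariance
                     np.mean(e1 ** 2 * e2),        # co-skewness
                     np.mean(e1 ** 2 * e2 ** 2),   # pairwise co-kurtosis
                     np.mean(e1 ** 3 * e2)])       # pairwise co-kurtosis

rng = np.random.default_rng(3)
T = 200_000
e1 = rng.standard_normal(T)
indep = cross_moment_stats(e1, rng.standard_normal(T))

# uncorrelated but dependent: mean 0, variance 1, yet E[e1^2 e2] = sqrt(2)
e2_dep = (e1 ** 2 - 1.0) / np.sqrt(2.0)
dep = cross_moment_stats(e1, e2_dep)
```

In the dependent case, the covariance stays at zero while the co-skewness moment jumps to √2 and the pairwise co-kurtosis to 5, which is exactly why the higher-order conditions add power.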

Covariance across influence functions
Next, we derive in detail the nonzero elements of the covariance matrix of the second, third and fourth cross-moments in (16). It is easy to see that under the null hypothesis of independence, the only nonzero elements of the covariance matrix of these influence functions are products of marginal moments of the shocks, which can be consistently estimated from ε*_t(θ̂_T) under standard regularity conditions. The nonzero covariance terms across the different elements of m[ε*_t(θ)] can be characterised in the same manner.

The expected Jacobian
Straightforward calculations allow us to obtain the expected Jacobians of the influence functions associated with the covariances across shocks in (19), the third cross-moments in (20) and (21), and the fourth cross-moments in (22)-(25). In those expressions, e_i denotes the ith canonical vector and c^i the ith row of C^{−1}.

The covariance with the score
As we have seen before, we need to explicitly compute the expressions in Proposition 3 to obtain (17). Fortunately, some of those expressions simplify considerably for the cross-moments we use to test independence. Intuitively, the reason is that the independence of the shocks implies that the required expectations factorise when h is such that h_i = 1. As a result, (17) will be zero for the second moments E(ε*_it ε*_i′t), except for the entry f_{hs(i,i′)}(η∞, ϕ0) corresponding to the pair of shocks involved, which will be 1.
In addition, if we exploit the independence between ε*_it and ε*_i′t and the fact that E(ε*²_i′t) = 1, we can easily characterise the only nonzero covariance elements of the co-skewness influence functions with the scores, and we can do the same for the co-kurtosis influence functions E(ε*²_it ε*²_i′t) and for the covariances of the co-kurtosis terms E(ε*³_it ε*_i′t) with the scores. In contrast, the only nonzero covariance of the remaining co-kurtosis influence functions with the scores will be f_{hs(i,i′)}(η∞, ϕ0) = 1. Finally, all the covariances of the scores with the influence functions associated with E(ε*_it ε*_i′t ε*_i″t ε*_i‴t) will be 0 too.

Combining our tests
Interestingly, we can use the expressions previously derived to prove that under the joint null hypothesis of mutually independent shocks and the normality of one of them, the two separate tests that we have discussed in Sects. 4.1 and 4.2 are asymptotically independent, so effectively the joint test would simply be the sum of those two components.
In addition, we can also prove that a test that jointly assessed the independence and normality of all the shocks would be asymptotically equivalent under the null to a multivariate Hermite-based test of multivariate normality [see Amengual et al. (2021a)] applied to the reduced form residuals once one eliminates the moment condition related to the covariance of the shocks, whose asymptotic variance when evaluated at the PMLEs would be zero under the null.

Monte Carlo analysis
In this section, we assess the finite sample size and power of the normality and independence tests discussed in Sects. 4.1 and 4.2 by means of several Monte Carlo simulation exercises. In addition, we provide some evidence on the effects that dependence across shocks induces on the estimators of the impact multipliers.

Design and computational details
For the sake of brevity, we focus on the bivariate case in the main text. 7 Specifically, we generate samples of size T from the following bivariate static process:

y_t = τ + Cε*_t,

with τ_1 = 1, τ_2 = −1, c_11 = 1, c_12 = .5, c_21 = 0 and c_22 = 2. However, our PML estimation procedure does not exploit the restriction that the loading matrix of the shocks is upper triangular. Importantly, given that we can easily prove from (4) that the estimated shocks are numerically invariant to affine transformations of the y's, and that the same is true of the different test statistics, the results that we report below do not depend on our choice of τ and C.
We consider both T = 250, which is realistic in most macroeconomic applications with monthly or quarterly data, and T = 1000, which is representative of financial applications with daily data. The precise data generating processes (DGPs) that we consider for the shocks are described in Sect. 5.1.2.

Estimation details
To estimate the parameters of the model above, we assume that ε*_1t and ε*_2t follow two serially and cross-sectionally independent standardised discrete mixtures of two normals, ε*_it ∼ DMN(δ_i, κ_i, λ_i). Hence, we can interpret κ_i as the ratio of the two variances and δ_i as the parameter that regulates the distance between the means of the two underlying components. 8 As a consequence, the contribution of observation t to pseudo-log-likelihood function (4) will be the sum over i of the logs of the mixture densities

λ_i φ[ε*_it(θ); μ*_1(η_i), σ*²_1(η_i)] + (1 − λ_i) φ[ε*_it(θ); μ*_2(η_i), σ*²_2(η_i)]

minus ln |det C|, where φ(ε; μ, σ²) denotes the probability density function of a Gaussian random variable with mean μ and variance σ² evaluated at ε. Importantly, we maximise the log-likelihood with respect to the two elements of τ, the four elements of C and the six shape parameters subject to the nonlinear constraint δ²_i < λ_i^{−1}(1 − λ_i)^{−1}, which we impose to guarantee the strict positivity of σ*²_1(η_i). Without loss of generality, we also restrict κ_i ∈ (0, 1] as a way of labelling the components, which in turn ensures the strict positivity of σ*²_2(η_i). Finally, we impose λ_i ∈ (0, 1) to avoid degenerate mixtures. 9

We maximise the log-likelihood subject to these three constraints on the shape parameters using a derivative-based quasi-Newton algorithm, which converges quadratically in the neighbourhood of the optimum. To exploit this property, we start the iterations by obtaining consistent initial estimators of τ and C, τ̂_FICA and Ĉ_FICA say, using the FastICA algorithm of Gävert, Hurri, Särelä, and Hyvärinen. 10 In addition, we obtain initial values of the shape parameters of each shock by performing 20 iterations 11 of the expectation maximisation (EM) algorithm in Dempster et al. (1977) on each of the elements of ε*_{t,FICA} = Ĉ^{−1}_FICA(y_t − τ̂_FICA).

As we mentioned in Sect. 2.2, Assumption 1 only guarantees the identification of C up to sign changes and column permutations. Although in empirical applications a researcher would carefully choose the appropriate ordering and interpretation of the structural shocks, this leeway may have severe consequences when analysing Monte Carlo results. For that reason, we systematically choose a unique global maximum from the different observationally equivalent permutations and sign changes of the columns of the matrix C using the selection procedure suggested by Ilmonen and Paindaveine (2011) and adopted by Lanne et al. (2017). In addition, we impose that diag(C) is positive by simply changing the sign of all the elements of the relevant columns. Naturally, we apply the same changes to the shape parameter estimates and the sign of δ_i.

8 We can trivially extend this procedure to three or more components if we replace the normal random variable in the first branch of (27) by a k-component normal mixture with mean and variance given by μ*_1(η) and σ*²_1(η), respectively, so that the resulting random variable will be a (k + 1)-component Gaussian mixture with zero mean and unit variance.

9 Specifically, we impose κ_i ∈ [κ_min, 1] with κ_min = .0001, and λ_i ∈ [λ_min, λ_max] with λ_min = 2/T and λ_max = 1 − 2/T.

10 See Hyvärinen (1999) and https://research.ics.aalto.fi/ica/fastica/ for details on the FastICA package.
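To make the mixture parametrisation concrete, the following sketch simulates a standardised two-component mixture under one plausible (δ, κ, λ) mapping consistent with the constraints above; the paper's exact parametrisation may differ in details, but with δ = −.859, κ = .386 and λ = 1/5 this mapping reproduces the skewness of −.5 and kurtosis of 4 reported for the mixture design in Fig. 1.

```python
import numpy as np

def simulate_std_mixture(delta, kappa, lam, T, rng):
    """Standardised two-component normal mixture: delta governs the distance
    between the component means, kappa the ratio of their variances and lam
    the mixing probability (one plausible mapping; assumptions ours)."""
    assert delta ** 2 < 1.0 / (lam * (1.0 - lam))   # positivity of var1
    mu1 = (1.0 - lam) * delta                       # lam*mu1 + (1-lam)*mu2 = 0
    mu2 = -lam * delta
    var1 = (1.0 - lam * (1.0 - lam) * delta ** 2) / (lam + (1.0 - lam) * kappa)
    var2 = kappa * var1                             # kappa in (0, 1]
    comp1 = rng.random(T) < lam
    return np.where(comp1,
                    rng.normal(mu1, np.sqrt(var1), T),
                    rng.normal(mu2, np.sqrt(var2), T))

rng = np.random.default_rng(4)
x = simulate_std_mixture(delta=-0.859, kappa=0.386, lam=0.2, T=500_000, rng=rng)
```

The mean and variance restrictions pin down the component means and variances analytically, which is why δ² < λ^{−1}(1 − λ)^{−1} is exactly the condition for var1 to be strictly positive.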
The left panels of Fig. 1a-c display the density functions of these distributions over a range of ±4 standard deviations with the standard normal as a benchmark, while the right panels zoom in on the left-tail.
Fig. 1 Univariate densities of the independent shocks. Notes: dashed lines represent the standard normal distribution. a plots a standardised discrete mixture of two normals with skewness and kurtosis coefficients of −.5 and 4, respectively (with parameters δ = −.859, κ = .386 and λ = 1/5); b plots a standardised symmetric Student t with the same kurtosis (i.e. 10 degrees of freedom), while c plots a standardised asymmetric t with the same skewness and kurtosis as the one in a [i.e. with β = −1.354 and ν = 18.718; see Mencía and Sentana (2012) for details].

In turn, under the alternative of cross-sectionally dependent shocks we simulate from the following three standardised joint distributions:

dgp 4: Bivariate Student t with 6 degrees of freedom.

dgp 5: Bivariate asymmetric t with skewness vector β = (−5, 2)′ and degrees of freedom parameter ν = 16 [see Mencía and Sentana (2012) for details].

dgp 6: Bivariate mixture of two zero-mean normal vectors with different covariance matrices, which we denote by DMN_LL(κ_1, κ_2, λ) [see Lanne and Lütkepohl (2010) for details]. Specifically, we set κ_1 = 0.1, κ_2 = 0.2 and λ = 1/5.
The left panels of Fig. 2 display the joint densities for these distributions, while their contours are presented in the right panels.
To gauge the finite sample size and power of our proposed independence tests, we generate 20,000 samples for each of the designs under the null and 5,000 for those under the alternative. Additionally, we evaluate the small sample size and power of the normality tests presented in Sect. 4.1 using the results from simulation designs dgp 1 and 1d (null) and dgp 2 and dgp 3 (alternative).

Bootstrap procedures
The theoretical results in Beran (1988) imply that if the usual Gaussian asymptotic approximation provides a reliable guide to the finite sample distribution of the sample version of the moments being tested, the bootstrapped critical values should not only be valid, but also their errors should be of a lower order of magnitude under additional regularity conditions that guarantee the validity of a higher-order Edgeworth expansion. 13 For that reason, we also analyse the performance of applying the bootstrap to the testing procedures we have described in Sects. 4.1 and 4.2.
In the case of our tests for independence, for each Monte Carlo sample, we can easily generate another N_boot bootstrap samples of size T that impose the null with probability approaching 1 as T increases as follows. 14 First, we generate N T draws R_is from a discrete uniform distribution between 1 and T, one for each shock i and bootstrap observation s. We then use them to construct the bootstrap observations y*_s = τ̂_T + Ĉ_T ε̂*_s, where the ith element of ε̂*_s is ε*_{i R_is}(θ̂_T), and ε*_t(θ̂_T) = Ĉ^{−1}_T(y_t − τ̂_T) are the estimated residuals in any given sample.
As for the normality tests, whose null hypothesis is that a single shock ε * it is Gaussian, we adopt a partially parametric resampling scheme in which the draws of the i th shockε * is are independently simulated from a N (0, 1) distribution, while the draws for the remaining shocksε * ks (k = i) are obtained nonparametrically as in the previous paragraph.
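A sketch of the independence-imposing resampling scheme just described (function and variable names are ours): each estimated shock series is resampled with replacement separately, so any contemporaneous dependence among the shocks is destroyed in the bootstrap samples.

```python
import numpy as np

def independence_bootstrap(eps_hat, tau_hat, C_hat, rng):
    """One bootstrap sample imposing the null of independence: each shock
    series is resampled with replacement using its own index draws R_is."""
    T, N = eps_hat.shape
    eps_boot = np.empty_like(eps_hat)
    for i in range(N):
        idx = rng.integers(0, T, size=T)    # draws R_is ~ U{1, ..., T}
        eps_boot[:, i] = eps_hat[idx, i]
    return tau_hat + eps_boot @ C_hat.T     # y*_s = tau_hat + C_hat eps*_s

rng = np.random.default_rng(5)
T = 100_000
e1 = rng.standard_normal(T)                 # strongly dependent "estimated" shocks
e2 = 0.9 * e1 + np.sqrt(1.0 - 0.81) * rng.standard_normal(T)
eps_hat = np.column_stack([e1, e2])
y_boot = independence_bootstrap(eps_hat, np.zeros(2), np.eye(2), rng)
```

Even though the input shocks are highly correlated, the resampled series are essentially uncorrelated, because the probability of drawing contemporaneous residuals for two shocks vanishes as T grows.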
Although these bootstrap procedures are simple and fast for any given sample, they quickly become prohibitively expensive in a Monte Carlo exercise as T increases.

13 Therefore, if the true shocks had unbounded variance, the bootstrap would not work, but neither would the asymptotic approximation.

14 To see this, notice that under the null the resampled shocks are cross-sectionally independent by construction, while under the alternative their joint distribution contains a second term that accounts for the probability of sampling contemporaneous residuals in a sample of size T. Clearly, the second expression converges to the first one as T goes to infinity.
For this reason, for the designs with T = 1000 we rely on the warp-speed method of Giacomini et al. (2013).

Table 1 reports Monte Carlo rejection rates of the normality tests proposed in Sect. 4.1 for dgp 1, 1d, 2 and 3. As can be seen, the null of normality is correctly rejected a large number of times when it does not hold, even in samples of length 250. The only possible exception is the skewness component of the Jarque-Bera test when applied to the symmetric Student t shock in dgp 3. Given that the population third moment is zero in this case, the only source of power is the fact that the sampling variability of H3 is larger for this shock than its theoretical value under Gaussianity. On the other hand, the first three rows of the panels for dgp 1 and 1d, which are the ones with a Gaussian shock, show that the normality tests tend to be oversized at the usual nominal levels, especially for samples of length 250. 15 For that reason, we generate Nboot = 399 bootstrap samples at each Monte Carlo replication, as described in Sect. 5.1.3. Table 2 shows that the standard bootstrap version of our tests is pretty accurate for both the third and fourth moment tests. Unlike what we observed in Table 1, though, the size-adjusted power is slightly lower for dgp 1d than for dgp 1.
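Ignoring the adjustment for estimated parameters that the paper derives, the building blocks of these normality tests are the sample means of the third- and fourth-order Hermite polynomials, whose variances under Gaussianity are 3! = 6 and 4! = 24. A stylised version (function name is ours) is:

```python
import numpy as np
from scipy.stats import chi2

def normality_moment_tests(eps):
    """Jarque-Bera-type moment tests for a single standardised shock,
    treating it as observed (the paper adjusts for estimated parameters).
    Returns (statistic, p-value) pairs."""
    T = len(eps)
    h3 = eps**3 - 3 * eps         # third Hermite polynomial
    h4 = eps**4 - 6 * eps**2 + 3  # fourth Hermite polynomial
    s3 = T * h3.mean()**2 / 6     # skewness component, chi2(1) under H0
    s4 = T * h4.mean()**2 / 24    # kurtosis component, chi2(1) under H0
    joint = s3 + s4               # chi2(2) under H0
    return {"H3": (s3, chi2.sf(s3, 1)),
            "H4": (s4, chi2.sf(s4, 1)),
            "H3&H4": (joint, chi2.sf(joint, 2))}
```

Applied to a standardised symmetric Student t sample, the H4 component rejects strongly while H3 has little power, matching the pattern observed for dgp 3 in Table 1.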

Testing normality
However, as mentioned at the end of Sect. 4.1, researchers may only get a reliable guide to the validity of Assumption 1 by looking at the normality tests for all the individual shocks, the objective being to find at least N − 1 rejections. To shed some light on this issue, in Table 3 we report contingency tables which fully characterise the extent to which simultaneous rejections of the individual normality tests occur. As can be seen, our proposed normality tests tend to be rather informative when used in this way.

Testing independence
In Tables 4 (T = 250) and 5 (T = 1000) we report the Monte Carlo rejection rates of the tests we have proposed in Sect. 4.2 under the null of independence. Specifically, we look at the second, third and fourth moment individual tests based on m_cv[ε*_t(θ)], m_cs[ε*_t(θ)] and m_ck[ε*_t(θ)], and also at the joint tests for the two co-skewness moments, the three co-kurtosis moments and the combined six moments, including the correlation between the shocks. The left panels of those tables report rejection rates using asymptotic critical values, while the right panels show the bootstrap-based ones for T = 250 and the warp-speed bootstrap-based ones for T = 1000. 16

We can see in Table 4 some small to moderate finite sample size distortions when T = 250, although in several cases they are corrected by the bootstrap. The only exceptions seem to be dgp 1 and 1d, in which some small distortions remain even with this procedure. Given that in these designs there is only one non-Gaussian shock, a plausible explanation is that the identification of C may be weaker, a conjecture we will revisit in the next section. For the other DGPs, the results in Table 4 clearly show that the usual bootstrap version of the tests, which is the relevant one in empirical applications, has much better size properties.

15 Given 20,000 Monte Carlo replications, the 95% asymptotic confidence intervals for the Monte Carlo rejection probabilities under the null are (.86, 1.14), (4.70, 5.30) and (9.58, 10.42) at the 1, 5 and 10% levels, respectively.

16 All our i.i.d. designs are such that the individual moment tests converge in distribution to a chi-square random variable with 1 degree of freedom, and the joint ones to chi-square variables with 2, 3 and 6 degrees of freedom, respectively.

[Tables 4 and 5 notes: Monte Carlo empirical rejection rates of independence tests; 5000 replications. Data generating processes: dgp 4, (ε*_1t, ε*_2t) ∼ t(0, I2, 6); dgp 5, (ε*_1t, ε*_2t) ∼ At(0, I2, (−5, 2)′, 16) [see Mencía and Sentana (2012) for details]; and dgp 6, (ε*_1t, ε*_2t) ∼ DMN_LL(.1, .2, 1/5) (see Sect. 5.1.2 for details). We present the asymptotic distribution of the test statistics in Sect. 5.2.2 and describe the sampling procedures we use to implement the standard and warp-speed bootstraps in Sect. 5.1.3.]
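In stylised form, and again ignoring the parameter-estimation adjustment the paper derives, the influence functions behind the individual tests for N = 2 are simple products of the two shocks (the function name and the ordering of the moments are ours):

```python
import numpy as np
from scipy.stats import chi2

def independence_moment_tests(eps):
    """Joint co-moment test for N = 2 standardised shocks, treating them
    as observed (the paper adjusts the variances for estimated parameters)."""
    e1, e2 = eps[:, 0], eps[:, 1]
    T = len(e1)
    # Influence functions with zero mean under independence of
    # standardised (zero-mean, unit-variance) shocks:
    m = np.column_stack([e1 * e2,            # covariance
                         e1**2 * e2,         # co-skewness
                         e1 * e2**2,         # co-skewness
                         e1**3 * e2,         # co-kurtosis
                         e1**2 * e2**2 - 1,  # co-kurtosis
                         e1 * e2**3])        # co-kurtosis
    mbar = m.mean(axis=0)
    S = np.cov(m, rowvar=False)                 # sample variance of moments
    stat = T * mbar @ np.linalg.solve(S, mbar)  # joint chi2(6) statistic
    return stat, chi2.sf(stat, 6)
```

Each column can also be tested on its own against a chi-square with 1 degree of freedom, which corresponds to the individual tests reported in the left blocks of Tables 4 and 5.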
As can be seen in Table 5, finite sample sizes improve considerably for T = 1000. Indeed, the bootstrap versions of the tests seem unnecessary for this sample size because the empirical rejection rates based on asymptotic critical values become generally very close to the nominal ones, though the warp-speed version performs comparably well.
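The warp-speed method, which keeps the Monte Carlo cost linear in the number of replications, can be summarised as follows (a schematic sketch; the helper names and the mean-based example statistic are ours, not the paper's):

```python
import numpy as np

def warp_speed_pvalues(samples, test_stat, resample, rng):
    """Giacomini et al. (2013): compute a single bootstrap statistic per
    Monte Carlo replication and pool them across replications into one
    common reference distribution."""
    stats = np.array([test_stat(s) for s in samples])
    boot = np.array([test_stat(resample(s, rng)) for s in samples])
    # each replication's p-value against the pooled bootstrap distribution
    return np.array([(boot >= t).mean() for t in stats])

rng = np.random.default_rng(0)
samples = [rng.standard_normal(200) for _ in range(500)]
pvals = warp_speed_pvalues(
    samples,
    test_stat=lambda s: abs(s.mean()) * np.sqrt(len(s)),
    # recentre the resampled data so the null holds in the bootstrap world
    resample=lambda s, r: r.choice(s, size=len(s)) - s.mean(),
    rng=rng)
```

With one bootstrap draw per replication instead of Nboot, the total cost is roughly Nboot times smaller, at the price of a common (pooled) critical-value distribution.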
Next, we assess the power of the independence tests for T = 250 and T = 1000 in Tables 6 and 7, respectively. In this respect, we find that the power of our tests against dgp 4 is disappointingly low. A possible explanation is that when the true joint distribution is a symmetric Student t, the dependence between the components is mostly visible in the tails of the distribution. On the other hand, power mostly comes from the co-skewness component (20) in the case of the joint asymmetric t. Still, the test based on the covariance of the shocks (19) is also very powerful. Finally, the co-kurtosis test based on (22) is the most powerful single moment test under the Lanne and Lütkepohl (2010) alternative in dgp 6, with the joint tests that include this moment inheriting its power. Nevertheless, the test based on the second moment (19) also has non-negligible power for this design.
In summary, although the rejection rates naturally depend on the type of departure from the null and the specific influence function used for testing, the joint test that considers all moments at once seems to be a winner regardless of the sample size.

Structural parameter estimates

Table 8 reports summary statistics for the Monte Carlo distribution of the PMLEs of the structural parameters. The first thing we would like to highlight is that when one of the shocks is Gaussian, the sampling variability and the finite sample bias are noticeably larger than when both shocks are non-Gaussian but independent, which is in line with the conjecture we expressed in the previous section. Still, even in that case the biases are usually small and often negligible. In addition, the Monte Carlo standard deviations of the estimators in Panel B are roughly half those in Panel A, as one would expect.

The situation is completely different when the true shocks are cross-sectionally dependent. Failure of condition 2 in Assumption 1 results in significant biases, mostly in the off-diagonal terms of the impact multiplier matrix. In fact, the Monte Carlo variance of these estimators seems to increase with the sample size. In this respect, it is important to remember that the elements of the C matrix are no longer point identified when the joint distribution of the true shocks is either a symmetric or an asymmetric Student t. This is confirmed by the fact that the bias of the estimators is lower for dgp 6, in which the rotations of the shocks are not observationally equivalent [see Lanne and Lütkepohl (2010)].

Conclusions and directions for further research
Given that the parametric identification of the structural shocks and their impact coefficients C in the Svar model (2) critically hinges on the validity of the identifying restrictions in Assumption 1, it would be desirable that empirical researchers estimating those models reported specification tests that checked those assumptions to increase the empirical credibility of their findings. For that reason, in this paper we propose simple specification tests for independent component analysis and structural vector autoregressions with non-Gaussian shocks that check the normality of a single shock and the potential cross-sectional dependence among several of them. Our tests compare the integer (product) moments of the shocks in the sample with their population counterparts. Importantly, we explicitly consider the sampling variability resulting from using shocks computed with consistent parameter estimators. We study the finite sample size of our tests in several simulation exercises and discuss some bootstrap procedures. We also show that our tests have non-negligible power against a variety of empirically plausible alternatives.

As we mentioned in the introduction, there are many estimators for the parameters of the static Ica model (1) in addition to the discrete mixture of normals-based PMLEs we have considered in this paper. For example, even within the same likelihood framework, Fiorentini and Sentana (2020) discuss two other consistent estimators of the conditional mean and variance parameters of the Svar in (2). Although the specification tests that we have proposed in this paper could also be applied to shocks computed on the basis of these alternative estimators, the asymptotic covariance matrices that take into account their sampling variability will differ from the ones we have derived in this paper.
Given that some researchers may prefer to use one of those two-step estimation methods, obtaining computationally simple expressions for the adjusted covariance matrix would provide a valuable addition to our results.
In fact, the moment conditions that we consider for testing independence could form the basis of a GMM estimation procedure for the model parameters θ along the lines of Lanne and Luoto (2021), although with a larger set of third and fourth cross-moments. The overidentifying restrictions test obtained as a by-product of this procedure could be used as a specification test of the assumed independence-like restrictions.
Our tests for normality tackle a single shock at a time. Although we could in principle simultaneously test the normality of two or more shocks by combining the corresponding normality tests, the implicit joint null hypothesis would violate the second identification condition in Assumption 1. The asymptotic distribution of such joint tests constitutes a very interesting topic for further research. In addition, we could formally study the limiting probability of finding N − 1 rejections of the univariate normality tests in those circumstances.
Another important research topic would be the limiting behaviour of the PMLEs of θ when Assumption 1 does not hold, either because two or more of the shocks are Gaussian or because they are not independent.
Finally, while the integer product moment tests for independence that we have considered are very intuitive, they may have little power against alternatives in which the dependence is mostly visible in certain regions of the domain of the random shocks. With this in mind, in Amengual et al. (2021b) we study moment tests that look at the product of nonlinear transformations of the shocks, such as I(q_αi ≤ ε_it ≤ q_ωi), where q_αi and q_ωi are the α and ω quantiles of the marginal distribution of the i-th shock (with 0 ≤ α < ω ≤ 1), or I(k_li ≤ ε_it ≤ k_ui), where k_li < k_ui are some fixed values, or indeed ε_it I(k_li ≤ ε_it ≤ k_ui). Extending this approach in such a way that it leads to a consistent test of independence constitutes another promising research avenue.
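The simplest such indicator-based moment compares the joint frequency of two shocks falling in their central quantile ranges with the product of the marginal frequencies, which coincide under independence. A sketch (hypothetical helper, following the description above only loosely):

```python
import numpy as np

def quantile_indicator_moment(e1, e2, alpha=0.1, omega=0.9):
    """Sample analogue of E[I1 * I2] - E[I1] * E[I2], where I_j indicates
    that shock j lies between its empirical alpha and omega quantiles.
    Zero in population under independence of the two shocks."""
    in1 = (np.quantile(e1, alpha) <= e1) & (e1 <= np.quantile(e1, omega))
    in2 = (np.quantile(e2, alpha) <= e2) & (e2 <= np.quantile(e2, omega))
    return np.mean(in1 & in2) - np.mean(in1) * np.mean(in2)
```

Unlike the integer product moments, this statistic reacts to dependence concentrated in specific regions, such as the joint tails, even when the relevant cross-moments are close to their independence values.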
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Proposition 1
Under standard regularity conditions [see, e.g. Newey and McFadden (1994)], we can linearise the vector of influence functions underlying our tests around θ 0. Combining the resulting expansions with the asymptotic linear representation of the parameter estimators, we obtain the desired results.

Proposition 3
Expression (17) follows directly from the definition of the scores for θ and the shape parameters in (5) and (6) and the law of iterated expectations, after exploiting the fact that m_h[ε*_t(θ 0)], e_lt(φ ∞) and e_rt(φ ∞) are i.i.d. processes with zero mean under our assumptions.
In turn, the more detailed expressions exploit the cross-sectional independence of the shocks. For example, given that E(ε*_t) = 0 and E(ε*²_t) = 1 by construction, straightforward algebra delivers the skewness and kurtosis coefficients.

B.2.2 Scores with respect to " and %
Regarding the specific elements that appear in (9) and (10), they involve the posterior probabilities of shock i being drawn from component k at time t, which, to shorten the expressions, we define as w_kit = φ[ε*_it(θ); μ*_k, σ*²_k]/f[ε*_it(θ)] [see Boldea and Magnus (2009)].
As for the first derivatives with respect to the shape parameters in (11), analogous expressions apply. The second derivatives of the log-density with respect to the shape parameters can be derived using the chain rule for second derivatives from the expressions in Boldea and Magnus (2009), who obtain them in terms of λ, μ*_k and σ*²_k (k = 1, 2). The precise expressions are available on request.

C Monte Carlo results for a trivariate static model
In this appendix, we report finite sample results for a trivariate extension of our benchmark dgp 1, which we denote by dgp 1t. Specifically, we generate samples of size T = 250 and T = 1000. As for the shocks, we choose ε*_1t ∼ N(0, 1), ε*_2t ∼ DMN(−.859, .386, 1/5) and ε*_3t ∼ DMN(.859, .386, 1/5), so that ε*_2t and ε*_3t follow discrete mixtures of two normals with kurtosis coefficients 4 and skewness coefficients equal to −.5 and .5, respectively.

[Table 9 notes: Monte Carlo empirical rejection rates of normality tests; 20,000 replications. dgp 1t shocks: ε*_1t normal, and ε*_2t and ε*_3t discrete mixtures of two normals. See Appendix C for details on the data generating process. Asymptotic critical values: H3(·) and H4(·) follow chi-square distributions with 1 degree of freedom, and H3(·) & H4(·) one with 2. We describe the sampling procedure we use to implement the bootstrap in Sect. 5.1.3.]

Table 9 reports Monte Carlo rejection rates of the normality tests proposed in Sect. 4.1 for samples of size T = 250 (top panel) and T = 1000 (bottom panel). The first three columns of those panels report rejection rates using asymptotic critical values, while the last three columns show the bootstrap-based ones for T = 250 and the warp-speed bootstrap-based ones for T = 1000. Once again, the normality tests tend to be oversized at the usual nominal levels, especially for samples of length 250, while the standard bootstrap version of our tests is much more reliable for both the third and fourth moment tests. More importantly, the null of normality is correctly rejected a large number of times when it does not hold, even in samples of length 250. Nevertheless, there is a moderate loss of power relative to Table 2, which may reflect the need to estimate almost twice as many parameters as in the bivariate case.
In larger dimensions, one might expect a similar pattern, although in general, the main determinants of the power of our normality test will be the non-normality of the structural shock under consideration and how precisely identified it is.
Finally, in Table 10 we report the Monte Carlo rejection rates of the tests we have proposed in Sect. 4.2 under the null of independence for samples of size T = 250 (left panel) and T = 1000 (right panel). As in Table 9, the first (last) three columns of those panels report rejection rates using asymptotic (bootstrapped) critical values. As in the bivariate case (cf. Table 4), we can see some small to moderate finite sample size distortions when T = 250, although in almost all cases they are corrected by the bootstrap. Finite sample sizes improve considerably for samples of length 1000, as expected.