Modified Schwarz and Hannan–Quinn information criteria for weak VARMA models

Statistical Inference for Stochastic Processes

Abstract

Numerous multivariate time series admit weak vector autoregressive moving-average (VARMA) representations, in which the errors are uncorrelated but not necessarily independent or martingale differences. These models are called weak VARMA models, in contrast to the standard VARMA models, also called strong VARMA models, in which the error terms are assumed to be independent and identically distributed (iid). This article considers the problem of order selection for weak VARMA models using information criteria. It is shown that the use of the standard information criteria is often not justified when the iid assumption on the noise is relaxed. Consequently, we propose modified versions of the Schwarz (or Bayesian) information criterion and of the Hannan–Quinn criterion for identifying the orders of weak VARMA models. Monte Carlo experiments show that the proposed modified criteria estimate the model orders more accurately than the standard ones. An illustrative application using the squared daily returns of financial series is presented.
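To make the notion of a weak noise concrete, the following minimal Python sketch simulates a noise that is uncorrelated but not independent. The construction \(\epsilon_t=\eta_t\eta_{t-1}\) with iid standard normal \(\eta_t\), as well as the function names, are illustrative assumptions and are not taken from the article.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a one-dimensional series."""
    x = x - x.mean()
    return float(np.dot(x[1:], x[:-1]) / np.dot(x, x))

def simulate_weak_noise(n, seed=0):
    """Hypothetical weak noise: eps_t = eta_t * eta_{t-1}, eta_t iid N(0, 1).

    The eps_t are uncorrelated, but eps_t and eps_{t-1} share the factor
    eta_{t-1}, so they are not independent.
    """
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(n + 1)
    return eta[1:] * eta[:-1]

eps = simulate_weak_noise(100_000)
rho_level = lag1_autocorr(eps)        # close to zero: no linear dependence
rho_square = lag1_autocorr(eps ** 2)  # clearly positive: the squares are dependent
print(f"lag-1 autocorrelation of eps:   {rho_level:.3f}")
print(f"lag-1 autocorrelation of eps^2: {rho_square:.3f}")
```

The series itself shows no linear dependence, while its squares do, which is the kind of behaviour that the iid assumption of strong VARMA models rules out.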

Notes

  1. To cite a few examples of nonlinear multivariate processes, let us mention the self-exciting threshold vector autoregressive model (Tsay 1998), the smooth transition vector autoregressive model (Camacho 2004) and the random coefficient VARMA model (Alj et al. 2014).

References

  • Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csáki F (eds) 2nd international symposium on information theory. Akadémiai Kiadó, Budapest, pp 267–281

  • Alj A, Jónasson K, Mélard G (2014) The exact Gaussian likelihood estimation of time-dependent VARMA models. Comput Stat Data Anal. doi:10.1016/j.csda.2014.07.006

  • Bauwens L, Laurent S, Rombouts JVK (2006) Multivariate GARCH models: a survey. J Appl Econ 21:79–109

  • Boubacar Maïnassara Y (2011) Multivariate portmanteau test for structural VARMA models with uncorrelated but non-independent error terms. J Stat Plan Inference 141:2961–2975

  • Boubacar Maïnassara Y (2012) Selection of weak VARMA models by modified Akaike’s information criteria. J Time Ser Anal 33:121–130

  • Boubacar Maïnassara Y, Carbon M, Francq C (2012) Computing and estimating information matrices of weak ARMA models. Comput Stat Data Anal 56:345–361

  • Boubacar Maïnassara Y, Francq C (2011) Estimating structural VARMA models with uncorrelated but non-independent error terms. J Multivar Anal 102:496–505

  • Brockwell PJ, Davis RA (1991) Time series: theory and methods. Springer, New York

  • Brüggemann R, Lütkepohl H (2001) Lag selection in subset VAR models with an application to a U.S. monetary system. In: Friedmann R, Knüppel L, Lütkepohl H (eds) Econometric studies–a festschrift in honour of Joachim Frohn Münster. LIT, Berlin, pp 107–128

  • Camacho M (2004) Vector smooth transition regression models for US GDP and the composite index of leading indicators. J Forecast 23:173–196

  • Dufour J-M, Pelletier D (2005) Practical methods for modelling weak VARMA processes: identification, estimation and specification with a macroeconomic application. Technical report, Département de sciences économiques and CIREQ, Université de Montréal, Montréal, Canada

  • Francq C, Zakoïan J-M (2005) Recent results for linear time series models with non independent innovations. In: Duchesne P, Rémillard B (eds) Statistical modeling and analysis for complex data problems, Chap. 12. Springer, New York, pp 241–265

  • Francq C, Zakoïan J-M (2010) GARCH models: structure, statistical inference and financial applications. Wiley, Chichester

  • Hannan EJ, Quinn BG (1979) The determination of the order of an autoregression. J R Stat Soc B 41:190–195

  • Hannan EJ, Rissanen J (1982) Recursive estimation of mixed autoregressive-moving average order. Biometrika 69:81–94

  • Hurvich CM, Tsai C-L (1989) Regression and time series model selection in small samples. Biometrika 76:297–307

  • Hurvich CM, Tsai C-L (1993) A corrected Akaike information criterion for vector autoregressive model selection. J Time Ser Anal 14:271–279

  • Jeantheau T (1998) Strong consistency of estimators for multivariate ARCH models. Econom Theory 14:70–86

  • Katayama N (2012) Chi-squared portmanteau tests for structural VARMA models with uncorrelated errors. J Time Ser Anal 33:863–872

  • Lütkepohl H (2005) New introduction to multiple time series analysis. Springer, Berlin

  • Neath AA, Cavanaugh JE (1997) Regression and time series model selection using variants of the Schwarz information criterion. Commun Stat Theory Methods 26:559–580

  • R Development Core Team (2015) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. http://www.R-project.org

  • Raftery AE (1995) Bayesian model selection in social research. Sociol Methodol 25:111–163

  • Reinsel GC (1997) Elements of multivariate time series analysis, 2nd edn. Springer, New York

  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464

  • Tsay RS (1998) Testing and modeling multivariate threshold models. J Am Stat Assoc 93:1188–1202

  • van der Vaart AW (1998) Asymptotic statistics. Cambridge University Press, Cambridge

Acknowledgments

The research of the first author is supported by a BQR (Bonus Qualité Recherche) of the Université de Franche-Comté. We sincerely thank the associate editor and the anonymous reviewers for their helpful remarks. Their detailed comments greatly improved the presentation.

Author information

Corresponding author

Correspondence to Yacouba Boubacar Maïnassara.

Appendix

Proof of Proposition 1

Since the selected model with \(k_1^{\prime }\) parameters is overfitted, it contains \(s_0\) superfluous parameters whose true values are zero.

To test \(s_0\) linear constraints on the elements of \(\theta _0\), we thus consider a null hypothesis of the form

$$\begin{aligned} H_0:R_0\theta _0={\mathfrak {r}}_0 \end{aligned}$$

where \(R_0\) is a known \(s_0\times k_0\) matrix of rank \(s_0\) and \({\mathfrak {r}}_0\) is a known \(s_0\)-dimensional vector. The Wald, Lagrange multiplier (LM) and likelihood ratio (LR) principles are frequently employed for testing \(H_0\). For instance, consider the LM test. Let \(\hat{\theta }_n^c\) be the restricted QMLE of the parameter under \(H_0\). Define the Lagrangian

$$\begin{aligned} {\mathscr {L}}(\theta ,\lambda )=\tilde{\ell }_n(\theta )-\lambda ^{\prime }(R_0\theta -{\mathfrak {r}}_0), \end{aligned}$$

where \(\lambda \) denotes an \(s_0\)-dimensional vector of Lagrange multipliers. The first-order conditions yield

$$\begin{aligned} \frac{\partial \tilde{\ell }_n}{\partial \theta }(\hat{\theta }_n^c)=R_0^{\prime }\hat{\lambda },\quad R_0\hat{\theta }_n^c={\mathfrak {r}}_0. \end{aligned}$$

It will be convenient to write \(a\mathop {=}\limits ^{c}b\) to signify \(a=b+c\). A Taylor expansion gives under \(H_0\)

$$\begin{aligned} 0= & {} \sqrt{n}\frac{\partial \tilde{\ell }_n(\hat{\theta }_n)}{\partial \theta }\mathop {=}\limits ^{o_P(1)}\sqrt{n}\frac{\partial \tilde{\ell }_n(\hat{\theta }_n^c)}{\partial \theta }-J\sqrt{n}\left( \hat{\theta }_n-\hat{\theta }_n^c\right) . \end{aligned}$$

We deduce that

$$\begin{aligned} \sqrt{n}(R_0\hat{\theta }_n-{\mathfrak {r}}_0)= & {} R_0\sqrt{n}(\hat{\theta }_n-\hat{\theta }_n^c)\mathop {=}\limits ^{o_P(1)}R_0J^{-1}\sqrt{n}\frac{\partial \tilde{\ell }_n(\hat{\theta }_n^c)}{\partial \theta } = R_0J^{-1}R_0^{\prime }\sqrt{n}\hat{\lambda }. \end{aligned}$$

Thus under \(H_0\) and the previous assumptions,

$$\begin{aligned} \sqrt{n}\hat{\lambda }\mathop {\rightarrow }\limits ^{{\mathscr {L}}}{\mathscr {N}}\left\{ 0,(R_0J^{-1}R_0^{\prime })^{-1}R_0\varOmega R_0^{\prime }(R_0J^{-1}R_0^{\prime })^{-1}\right\} . \end{aligned}$$
(20)
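As a numerical illustration of the sandwich form of the asymptotic covariance in (20), the following sketch evaluates \((R_0J^{-1}R_0^{\prime })^{-1}R_0\varOmega R_0^{\prime }(R_0J^{-1}R_0^{\prime })^{-1}\) with NumPy. The function name and the numerical values of `J`, `Omega` and `R0` are hypothetical and chosen only so that the result can be checked by hand.

```python
import numpy as np

def lm_multiplier_cov(J, Omega, R0):
    """Sandwich covariance (R0 J^{-1} R0')^{-1} R0 Omega R0' (R0 J^{-1} R0')^{-1}."""
    A = R0 @ np.linalg.solve(J, R0.T)   # R0 J^{-1} R0'
    A_inv = np.linalg.inv(A)
    return A_inv @ R0 @ Omega @ R0.T @ A_inv

# Hypothetical inputs: k0 = 3 parameters, s0 = 2 constraints on the last two.
J = np.diag([2.0, 4.0, 5.0])
Omega = np.diag([1.0, 0.5, 0.2])
R0 = np.array([[0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0]])

cov_lambda = lm_multiplier_cov(J, Omega, R0)
print(cov_lambda)
```

With these diagonal inputs, \(R_0J^{-1}R_0^{\prime }=\mathrm{diag}(1/4,1/5)\) and \(R_0\varOmega R_0^{\prime }=\mathrm{diag}(0.5,0.2)\), so the covariance is \(\mathrm{diag}(8,5)\).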

Standard Taylor expansions show that

$$\begin{aligned} \sqrt{n}(\hat{\theta }_n-\hat{\theta }_n^c)\mathop {=}\limits ^{o_P(1)}-\sqrt{n}J^{-1}R_0^{\prime }\hat{\lambda }, \end{aligned}$$

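and that the LR statistic satisfies

$$\begin{aligned} \mathrm {\mathbf {LR}}_n:=2\left\{ \log \tilde{\mathrm {L}}_n(\hat{\theta }_n)-\log \tilde{\mathrm {L}}_n(\hat{\theta }_n^c)\right\} \mathop {=}\limits ^{o_P(1)}\frac{n}{2}(\hat{\theta }_n-\hat{\theta }_n^c)^{\prime }J(\hat{\theta }_n-\hat{\theta }_n^c). \end{aligned}$$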

Using the previous computations and standard results on quadratic forms of normal vectors [see e.g. Lemma 17.1 in van der Vaart (1998)], we find that the \(\mathrm {\mathbf {LR}}_n\) statistic is asymptotically distributed as \(\sum _{i=1}^{s_0}\mathrm {\lambda }_i Z_i^2\), where the \(Z_i\)'s are iid \({\mathscr {N}}(0,1)\) and \(\mathrm {\lambda }_1,\dots , \mathrm {\lambda }_{s_0}\) are the eigenvalues of

$$\begin{aligned} \varSigma _{\mathrm {\mathbf {LR}}}=J^{-1/2}S_{\mathrm {\mathbf {LR}}}J^{-1/2},\quad S_{\mathrm {\mathbf {LR}}}=\frac{1}{2}R_0^{\prime }(R_0J^{-1}R_0^{\prime })^{-1}R_0\varOmega R_0^{\prime }(R_0J^{-1}R_0^{\prime })^{-1}R_0. \end{aligned}$$
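The weights \(\mathrm {\lambda }_i\) can be computed numerically from \(J\), \(\varOmega \) and \(R_0\). The sketch below is one possible implementation, not the authors' code. The helper name `lr_weights` and the test matrices are assumptions. As a sanity check it uses the illustrative input \(\varOmega =2J^{-1}\), for which \(\varSigma _{\mathrm {\mathbf {LR}}}\) reduces algebraically to an orthogonal projection of rank \(s_0\), so that all \(s_0\) weights equal one.

```python
import numpy as np

def lr_weights(J, Omega, R0):
    """Return the s0 largest eigenvalues of J^{-1/2} S_LR J^{-1/2}."""
    A_inv = np.linalg.inv(R0 @ np.linalg.solve(J, R0.T))  # (R0 J^{-1} R0')^{-1}
    S = 0.5 * R0.T @ A_inv @ R0 @ Omega @ R0.T @ A_inv @ R0
    # Symmetric inverse square root of J through its eigendecomposition.
    w, V = np.linalg.eigh(J)
    J_inv_half = V @ np.diag(w ** -0.5) @ V.T
    Sigma = J_inv_half @ S @ J_inv_half
    eig = np.linalg.eigvalsh((Sigma + Sigma.T) / 2.0)  # symmetrise against rounding
    return np.sort(eig)[::-1][: R0.shape[0]]

# Hypothetical inputs: a random symmetric positive definite J with k0 = 4,
# and s0 = 2 constraints on the last two parameters.
rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
J = M @ M.T + 4.0 * np.eye(4)
R0 = np.array([[0.0, 0.0, 1.0, 0.0],
               [0.0, 0.0, 0.0, 1.0]])

weights_check = lr_weights(J, 2.0 * np.linalg.inv(J), R0)
print(weights_check)
```

With unit weights the limit law is a chi-squared distribution with \(s_0\) degrees of freedom. For a general \(\varOmega \) the weights differ from one and the limit is a weighted sum of independent chi-squared variables.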

We consider the case where \(p^{\prime }\ge p_0\) and \(q^{\prime }\ge q_0\) with either \(p^{\prime }\) or \(q^{\prime }\) greater than its true value, even though the model might not be identified in this case. Then, eventually

$$\begin{aligned} \hbox {BIC}_c(k_1)-\hbox {BIC}_c(k^{\prime }_1)= & {} 2\left\{ \log \tilde{\mathrm {L}}_n(\hat{\theta }_{n,k^{\prime }_1})-\log \tilde{\mathrm {L}}_n(\hat{\theta }_{n,k_1})\right\} -(c_{k^{\prime }_1}-c_{k_1})\log (n)\\ {}= & {} \mathrm {\mathbf {LR}}_n- (c_{k^{\prime }_1}-c_{k_1})\log (n), \end{aligned}$$

where \(\hat{\theta }_{n,k}\) is the QMLE of the true k-dimensional parameter \(\theta _{0,k}\). Combining these results, the asymptotic probability that the \(\hbox {BIC}_c\) criterion selects the overfitted model is

$$\begin{aligned} \lim _{n\rightarrow \infty }{\mathbb {P}}\left\{ \hbox {BIC}_c(k_1)\ge \hbox {BIC}_c(k^{\prime }_1)\right\}= & {} \lim _{n\rightarrow \infty }{\mathbb {P}}\left\{ \sum _{i=1}^{s_0}\mathrm {\lambda }_i Z_i^2\ge (c_{k^{\prime }_1}-c_{k_1})\log (n) \right\} =0. \end{aligned}$$

The proof is complete. \(\square \)
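The final limit can be illustrated by Monte Carlo: for fixed weights, the probability that \(\sum _i\mathrm {\lambda }_iZ_i^2\) exceeds a threshold growing like \(\log (n)\) shrinks as \(n\) increases. In the sketch below the weights, the penalty gap (standing for \(c_{k^{\prime }_1}-c_{k_1}>0\)) and the sample sizes are hypothetical values chosen for illustration only.

```python
import numpy as np

def overfit_prob(weights, penalty_gap, n, draws=100_000, seed=0):
    """Monte Carlo estimate of P{ sum_i w_i Z_i^2 >= penalty_gap * log(n) }."""
    rng = np.random.default_rng(seed)
    z2 = rng.standard_normal((draws, len(weights))) ** 2
    stat = z2 @ np.asarray(weights, dtype=float)
    return float(np.mean(stat >= penalty_gap * np.log(n)))

sample_sizes = (50, 500, 5_000, 50_000)
probs = [overfit_prob([1.5, 0.7], penalty_gap=2.0, n=n) for n in sample_sizes]
for n, p in zip(sample_sizes, probs):
    print(f"n = {n:>6d}: estimated overfitting probability = {p:.4f}")
```

Because the same seed is reused for every \(n\), the simulated statistic is identical across sample sizes and only the threshold changes, so the estimated probabilities are non-increasing in \(n\) by construction.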

Proof of Proposition 2

It is similar to that given for Proposition 1 and we omit it. \(\square \)

Proof of Proposition 3

Let \(\hat{\theta }_{n,k_1}\) be the QMLE of the true parameter \(\theta _{0,k_1}\) of dimension \(k_1\). The QMLE satisfies \(\log {\mathrm {L}}_n(\hat{\theta }_{n,k_1})\ge \log {\mathrm {L}}_n(\theta _{0,k_1})\) almost surely.

The \(\hbox {BIC}_c\) criterion underfits if \(\hbox {BIC}_c(k_1^{\prime \prime })\le \hbox {BIC}_c(k_1)\). In this case, since the selected model with \(k_1^{\prime \prime }\) parameters is misspecified, as n grows to infinity, eventually

$$\begin{aligned} \log \tilde{\mathrm {L}}_n(\hat{\theta }_{n,k^{\prime \prime }_1})< \log \tilde{\mathrm {L}}_n(\hat{\theta }_{n,k_1}). \end{aligned}$$
(21)

Thus, the asymptotic probability that the \(\hbox {BIC}_c\) criterion selects the underfitted model is

$$\begin{aligned} \lim _{n\rightarrow \infty }{\mathbb {P}}\left\{ \hbox {BIC}_c(k^{\prime \prime }_1)\le \hbox {BIC}_c(k_1)\right\}= & {} \lim _{n\rightarrow \infty }{\mathbb {P}}\left\{ -2\log \tilde{\mathrm {L}}_n(\hat{\theta }_{n,k^{\prime \prime }_1})+c_{k^{\prime \prime }_1}\log (n)\right. \\\le & {} \left. -2 \log \tilde{\mathrm {L}}_n(\hat{\theta }_{n,k_1})+c_{k_1}\log (n)\right\} \\= & {} \lim _{n\rightarrow \infty }{\mathbb {P}}\left\{ -2\left\{ \log \tilde{\mathrm {L}}_n(\hat{\theta }_{n,k^{\prime \prime }_1})- \log \tilde{\mathrm {L}}_n(\hat{\theta }_{n,k_1}) \right\} \right. \\\le & {} \left. (c_{k_1}-c_{k^{\prime \prime }_1})\log (n)\right\} . \end{aligned}$$

From (21), the fact that \(\log (n)/n\rightarrow 0\) and \(\mathrm {\mathbf {LR}}_n/n\) tends to a strictly positive constant, the conclusion follows. \(\square \)
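The last step can be spelled out by dividing the event by \(n\). In the display below, \(\mathrm {\mathbf {LR}}_n\) is taken to be \(2\{\log \tilde{\mathrm {L}}_n(\hat{\theta }_{n,k_1})-\log \tilde{\mathrm {L}}_n(\hat{\theta }_{n,k^{\prime \prime }_1})\}\), and \(\kappa \) denotes the strictly positive limit of \(\mathrm {\mathbf {LR}}_n/n\). The symbol \(\kappa \) is introduced here for illustration, and the convergence is assumed to hold at least in probability.

$$\begin{aligned} {\mathbb {P}}\left\{ \hbox {BIC}_c(k^{\prime \prime }_1)\le \hbox {BIC}_c(k_1)\right\}= & {} {\mathbb {P}}\left\{ \frac{\mathrm {\mathbf {LR}}_n}{n}\le (c_{k_1}-c_{k^{\prime \prime }_1})\frac{\log (n)}{n}\right\} ,\\ \frac{\mathrm {\mathbf {LR}}_n}{n}\rightarrow & {} \kappa >0,\qquad (c_{k_1}-c_{k^{\prime \prime }_1})\frac{\log (n)}{n}\rightarrow 0. \end{aligned}$$

For any \(0<\delta <\kappa \), the right-hand threshold is eventually smaller than \(\delta \), so the probability is bounded by \({\mathbb {P}}\{\mathrm {\mathbf {LR}}_n/n\le \delta \}\), which tends to zero.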

Proof of Proposition 4

It is similar to that given for Proposition 3 and we omit it. \(\square \)

About this article

Cite this article

Maïnassara, Y.B., Kokonendji, C.C. Modified Schwarz and Hannan–Quinn information criteria for weak VARMA models. Stat Inference Stoch Process 19, 199–217 (2016). https://doi.org/10.1007/s11203-015-9123-z

  • DOI: https://doi.org/10.1007/s11203-015-9123-z
