
Abstract

In this chapter we analyze asymptotic properties of the simulated out-of-sample predictive mean squared error (PMSE) criterion based on a rolling window when selecting among nested forecasting models. When the window size is a fixed fraction of the sample size, Inoue and Kilian (J Econ 130: 273–306, 2006) show that the PMSE criterion is inconsistent. We consider alternative schemes under which the rolling PMSE criterion is consistent. When the window size diverges slower than the sample size at a suitable rate, we show that the rolling PMSE criterion selects the correct model with probability approaching one when parameters are constant or when they are time varying. We provide Monte Carlo evidence and illustrate the usefulness of the proposed methods in forecasting inflation.


Notes

  1.

    The out-of-sample PMSE criteria are based on simulated out-of-sample predictions in which parameters are estimated from a subsample and used to predict an observation outside that subsample. When the subsamples always start with the first observation and consist of consecutive observations whose number increases, we call the simulated quadratic loss the recursive PMSE criterion. When the subsamples consist of the same fixed number of consecutive observations and move forward through the sample, we call the simulated quadratic loss the rolling PMSE criterion, and the number of observations in each subsample is the window size. See Inoue and Kilian (2006) for more technical definitions of these criteria. A minimal code sketch of both criteria follows these notes.

  2.

    When \(T=100\), \(W=T^{1/3}\) is too small to compute a rolling estimator.

  3.

    Technically, the window size \(W=T^{2/3}\) does not satisfy our sufficient condition but yields good results.
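
To make Note 1 concrete, the following minimal sketch computes both criteria for a pair of nested linear models. It is illustrative only: the data-generating process, the function name `pmse`, and the window choice \(W=\lfloor T^{2/3}\rfloor \) are our assumptions rather than the authors' code.

```python
# Minimal sketch of the recursive and rolling PMSE criteria described in Note 1.
import numpy as np

def pmse(y, X, W, h=1, scheme="rolling"):
    """Simulated out-of-sample PMSE for predicting y[t + h] from X[t].

    scheme="rolling":   OLS on the W observations ending at t (window size W).
    scheme="recursive": OLS on all observations from the start through t.
    """
    T = len(y)
    errors = []
    for t in range(W, T - h):
        lo = t - W if scheme == "rolling" else 0
        beta = np.linalg.lstsq(X[lo:t], y[lo + h:t + h], rcond=None)[0]
        errors.append(y[t + h] - X[t] @ beta)  # simulated out-of-sample error
    return np.mean(np.square(errors))

# Illustrative nested design: model 1 uses x_t only; model 2 adds w_t.
rng = np.random.default_rng(0)
T = 500
x, w = rng.standard_normal(T), rng.standard_normal(T)
y = np.zeros(T)
y[1:] = 0.5 * x[:-1] + rng.standard_normal(T - 1)  # gamma = 0, so model 1 is true
X1 = np.column_stack([np.ones(T), x])
X2 = np.column_stack([np.ones(T), x, w])
W = int(T ** (2 / 3))  # a window that diverges more slowly than T (cf. Note 3)
print(pmse(y, X1, W), pmse(y, X2, W))  # the true model should have the smaller PMSE
```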

References

  • Andrews, D.W.K. (1993), “Tests for Parameter Instability and Structural Change with Unknown Change Point”, Econometrica, 61, 821–856.

  • Cai, Z. (2007), “Trending Time-Varying Coefficient Time Series Models with Serially Correlated Errors”, Journal of Econometrics, 136, 163–188.

  • Clark, T.E., and M.W. McCracken (2000), “Not-for-Publication Appendix to ‘Tests of Equal Forecast Accuracy and Encompassing for Nested Models’”, unpublished manuscript, Federal Reserve Bank of Kansas City and Louisiana State University.

  • Clark, T.E., and M.W. McCracken (2001), “Tests of Equal Forecast Accuracy and Encompassing for Nested Models”, Journal of Econometrics, 105, 85–110.

  • Clark, T.E., and M.W. McCracken (2005), “Evaluating Direct Multistep Forecasts”, Econometric Reviews, 24, 369–404.

  • D’Agostino, A., D. Giannone, and P. Surico (2006), “(Un)Predictability and Macroeconomic Stability”, ECB Working Paper 605.

  • Gallant, A.R., and H. White (1988), A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models, Basil Blackwell: New York, NY.

  • Giacomini, R., and B. Rossi (2010), “Forecast Comparisons in Unstable Environments”, Journal of Applied Econometrics, 25(4), 595–620.

  • Giacomini, R., and H. White (2006), “Tests of Conditional Predictive Ability”, Econometrica, 74(6), 1545–1578.

  • Giraitis, L., G. Kapetanios, and T. Yates (2011), “Inference on Stochastic Time-Varying Coefficient Models”, unpublished manuscript, Queen Mary, University of London, and the Bank of England.

  • Hall, P., and C.C. Heyde (1980), Martingale Limit Theory and Its Application, Academic Press: San Diego, CA.

  • Inoue, A., and L. Kilian (2006), “On the Selection of Forecasting Models”, Journal of Econometrics, 130, 273–306.

  • Marcellino, M., J.H. Stock, and M.W. Watson (2003), “Macroeconomic Forecasting in the Euro Area: Country-Specific versus Area-Wide Information”, European Economic Review, 47(1), 1–18.

  • McConnell, M.M., and G. Perez-Quiros (2000), “Output Fluctuations in the United States: What Has Changed Since the Early 1980s?”, American Economic Review, 90(5), 1464–1476.

  • Meese, R., and K.S. Rogoff (1983a), “Empirical Exchange Rate Models of the Seventies: Do They Fit Out of Sample?”, Journal of International Economics, 14, 3–24.

  • Meese, R., and K.S. Rogoff (1983b), “The Out-of-Sample Failure of Empirical Exchange Rate Models”, in J.A. Frenkel (ed.), Exchange Rates and International Macroeconomics, University of Chicago Press for NBER: Chicago.

  • Rossi, B., and A. Inoue (2011), “Out-of-Sample Forecast Tests Robust to the Choice of Window Size”, mimeo.

  • Rossi, B., and T. Sekhposyan (2010), “Have Models’ Forecasting Performance Changed Over Time, and When?”, International Journal of Forecasting, 26(4).

  • Sin, C.-Y., and H. White (1996), “Information Criteria for Selecting Possibly Misspecified Parametric Models”, Journal of Econometrics, 71, 207–225.

  • Stock, J.H., and M.W. Watson (1999a), “Business Cycle Fluctuations in U.S. Macroeconomic Time Series”, in J.B. Taylor and M. Woodford (eds.), Handbook of Macroeconomics, Vol. 1, Elsevier, 3–64.

  • Stock, J.H., and M.W. Watson (1999b), “Forecasting Inflation”, Journal of Monetary Economics, 44, 293–335.

  • Swanson, N.R., and H. White (1997), “A Model Selection Approach to Real-Time Macroeconomic Forecasting Using Linear Models and Artificial Neural Networks”, Review of Economics and Statistics.

  • Wei, C.Z. (1992), “On Predictive Least Squares Principles”, Annals of Statistics, 20, 1–42.

  • West, K.D. (1996), “Asymptotic Inference about Predictive Ability”, Econometrica, 64, 1067–1084.

  • Wooldridge, J.M., and H. White (1988), “Some Invariance Principles and Central Limit Theorems for Dependent Heterogeneous Processes”, Econometric Theory, 4, 210–230.

  • Wooldridge, J.M. (1994), “Estimation and Inference for Dependent Processes”, in R.F. Engle and D.L. McFadden (eds.), Handbook of Econometrics, Vol. IV, Chapter 45, 2639–2738.


Appendix

A.1 Lemmas

We first present a lemma similar to Lemma A2 of Clark and McCracken (2000).

Lemma 1

Suppose that Assumptions 1 and 2 hold and that \(\gamma =0\). Then:

  1. (a)

    \(\frac{1}{T-h-W} \sum _{t=W+1}^{T-h}u_{t+h}x_{t}^{\prime }B_{1}(t)H_{1}(t) =o_{p}\left(\frac{1}{W} \right) \).

  2. (b)

    \(\frac{1}{T-h-W} \sum _{t=W+1}^{T-h}v_{t+h}z_{t}^{\prime }B_{2}(t)H_{2}(t) =o_{p}\left(\frac{1}{W} \right) \).

  3. (c)

    \(\frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}H_1^{\prime }(t)B_1(t) x_tx_t^{\prime }B_1(t)H_1(t)=\frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}H_1^{\prime }(t)B_1H_1(t)+o_p\left( \frac{1}{W}\right)\).

  4. (d)

    \(\frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}H_2^{\prime }(t)B_2(t) z_tz_t^{\prime }B_2(t)H_2(t) = \frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}H_2^{\prime }(t)B_2H_2(t)+o_{p}\left( \frac{1}{W}\right)\).

Proof of Lemma 1: The proofs for (a) and (c) are very similar to those for (b) and (d), respectively. For brevity, we only provide the proofs of (b) and (d). The results for (a) and (c) can be easily derived by replacing \(z_{t}\) and \( \beta \) by \(x_{t}\) and \(\alpha \), respectively.

Note that

$$\begin{aligned} \frac{1}{T-h-W}\sum _{t=W+1}^{T-h}v_{t+h}z_{t}^{\prime }B_{2}(t)H_{2}(t)&= \frac{1}{T-h-W}\sum _{t=W+1}^{T-h}v_{t+h}z_{t}^{\prime }B_{2}H_{2}(t) \\&\quad +\frac{1}{T-h-W}\sum _{t=W+1}^{T-h}v_{t+h}z_{t}^{\prime }(B_{2}(t)-B_{2})H_{2}(t). \end{aligned}$$

By Assumption 2(b) and Hölder’s inequality, the second moments of the summands on the right-hand side are of order \(O(W^{-1})\) and \(O(W^{-2})\), respectively. Thus, it follows from Assumption 2(c) that the variance of the left-hand side is of order \(O(T^{-1}W^{-1})\). By the Chebyshev inequality and Assumption 1, the left-hand side is \(o_{p}(W^{-1})\).

The proof of (d) is composed of two stages. In the first stage, we show that \(B_2(t)\) can be approximated by its expectation \(B_2\), that is,

$$\begin{aligned}&\frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}H_2^{\prime }(t)B_2(t) z_tz_t^{\prime }B_{2}(t)H_2(t) \nonumber \\&\quad =\frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}H_2^{\prime }(t)B_{2} z_tz_t^{\prime }B_{2}H_2(t)+o_p\left(\frac{1}{W}\right) \end{aligned}$$
(A.1)

To see this, expand the left-hand side of Eq. (A.1) into four terms:

$$\begin{aligned}&\frac{1}{T-h-W} \sum \limits _{t=W+1}^{T-h}H_2^{\prime }(t)B_2(t) z_tz_t^{\prime }B_2(t)H_2(t) \nonumber \\&\quad \qquad = \frac{1}{T-h-W} \sum \limits _{t=W+1}^{T-h}H_2^{\prime }(t)B_2z_tz_t^{\prime }B_2H_2(t) \nonumber \nonumber \\&\quad \qquad \quad \quad + \frac{1}{T-h-W} \sum \limits _{t=W+1}^{T-h}H_2^{\prime }(t)(B_2(t)-B_2)z_tz_t^{\prime }(B_2(t)-B_2)H_2(t) \nonumber \nonumber \\&\quad \qquad \quad \quad + \frac{1}{T-h-W} \sum \limits _{t=W+1}^{T-h}H_2^{\prime }(t)B_2z_tz_t^{\prime }(B_2(t)-B_2)H_2(t) \nonumber \nonumber \\&\quad \qquad \quad \quad + \frac{1}{T-h-W} \sum \limits _{t=W+1}^{T-h}H_2^{\prime }(t)(B_2(t)-B_2)z_tz_t^{\prime }B_2H_2(t), \end{aligned}$$
(A.2)

the first of which is the first term on the right-hand side of Eq. (A.1).

By Assumption 2(b) and Hölder’s inequality, the second moments of the summands in the last three terms are of order \( O(W^{-4})\), \(O(W^{-3})\), and \(O(W^{-3})\), respectively. Thus, their first moments are at most \(O(W^{-3})=o(W^{-1})\). Using these and Assumption 2(e), the second moments of the last three terms are of order \( O(T^{-1}W^{-4})\), \(O(T^{-1}W^{-3})\), and \(O(T^{-1}W^{-3})\), respectively. By the Chebyshev inequality and Assumption 1, these last three terms are of order \(o_{p}(W^{-1})\), proving (A.1).

The second stage of the proof of (d) is to show that we can further approximate \(z_tz_t^{\prime }\) in the first term on the right-hand side of Eq. (A.2) by its expectation \( E(z_tz_t^{\prime }) \). Adding and subtracting \(E(z_t z_t^{\prime })\), we obtain

$$\begin{aligned}&\frac{1}{T-h-W} \sum \limits _{t=W+1}^{T-h}H_2^{\prime }(t)B_2z_tz_t^{\prime }B_2H_2(t)\nonumber \\&\qquad =\frac{1}{T-h-W} \sum \limits _{t=W+1}^{T-h}H_2^{\prime }(t)B_2E(z_tz_t^{\prime })B_2H_2(t) \nonumber \nonumber \\&\qquad \quad \quad + \frac{1}{T-h-W} \sum \limits _{t=W+1}^{T-h}H_2^{\prime }(t)B_2(z_tz_t^{\prime }-E(z_tz_t^{\prime }))B_2 H_2(t) \end{aligned}$$
(A.3)

The mean of the second term is \(o(W^{-1})\) by Assumption 2(d). The second moment of its summands is \(O(W^{-2})\) by Assumption 2(b). Using these and Assumption 2(e), the second moment of the second term is of order \(o(W^{-2})\). By the Chebyshev inequality, the second term of Eq. (A.3) is therefore \(o_{p}(W^{-1})\), which completes the proof of (d).

Lemma 2

Suppose that Assumptions 3 and 4 hold and that \(\gamma (\cdot )=0\). Then:

  1. (a)

    \(\frac{1}{T-h-W} \sum _{t=W+1}^{T-h}u_{T,t+h}x_{T,t}^{\prime }B_{1}(t)H_{1}(t) =o_{p}\left(\frac{1}{W} \right)\).

  2. (b)

    \(\frac{1}{T-h-W} \sum _{t=W+1}^{T-h}v_{T,t+h}z_{T,t}^{\prime }B_{2}(t)H_{2}(t) =o_{p}\left(\frac{1}{W} \right)\).

  3. (c)

    \(\frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}H_1^{\prime }(t)B_1(t) x_{T,t}x_{T,t}^{\prime }B_1(t)H_1(t) = \frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}H_1^{\prime }(t)\bar{ B}_1\left(\frac{t}{T}\right)H_1(t)+o_p\left(\frac{1}{W}\right)\).

  4. (d)

    \(\frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}H_2^{\prime }(t)B_2(t) z_{T,t}z_{T,t}^{\prime }B_2(t)H_2(t) = \frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}H_2^{\prime }(t)\bar{B}_2\left(\frac{t}{T}\right)H_2(t)+o_{p}\left(\frac{1}{W}\right)\).

Proof of Lemma 2: Under Assumptions 3 and 4, the proof of Lemma 2 takes exactly the same steps as the proof of Lemma 1, except that \(B_{i}\), \(u_{t}\), and \(v_{t}\) are replaced by \(\bar{B}_{i}\left(\frac{t}{T}\right)\), \(u_{T,t}\), and \(v_{T,t}\), respectively. This is because Lemma 2 is stated in terms of \(u_{T,t}\) and \( v_{T,t}\) rather than in terms of \(\hat{\alpha }_{t,W}-\alpha \left(\frac{t}{T} \right)\) and \(\hat{\beta }_{t,W}-\beta \left(\frac{t}{T}\right)\), which we deal with in the proof of Theorem 2.

A.2 Proofs of Theorems

Proof of Theorem 1: Note that the PMSEs \(\hat{\sigma }_{1,W}^{2}\) and \(\hat{\sigma }_{2,W}^{2}\) can be expanded as

$$\begin{aligned} \hat{\sigma }_{1,W}^{2}&=\frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}{\left( { y_{t+h}-\widehat{\alpha }_{t}^{\prime }x_{t}}\right) }^{2} \nonumber \\&=\frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}{\left( y_{t+h}-{\alpha ^{*}} ^{\prime }x_{t}-\left( \widehat{\alpha }_{t}^{\prime }x_{t}-{\alpha ^{*}}^{\prime }x_{t}\right) \right) ^{2}} \nonumber \\&=\frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}{\left( y_{t+h}-{\alpha ^{*}} ^{\prime }x_{t}\right) ^{2}} \nonumber \\&\qquad -\frac{2}{T-h-W}\sum \limits _{t=W+1}^{T-h}{ \left( y_{t+h}-{\alpha ^{*}}^{\prime }x_{t}\right) x_{t}^{\prime }\left( \widehat{\alpha }_{t}-{\alpha ^{*}}\right) } \nonumber \\&\qquad +\frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}{\left( \widehat{\alpha } _{t}^{\prime }-{\alpha ^{*}}^{\prime }\right) x_{t}x_{t}^{\prime }\left( \widehat{\alpha }_{t}-{\alpha ^{*}}\right) } \end{aligned}$$
(A.4)

and

$$\begin{aligned} \hat{\sigma }_{2,W}^{2}&=\frac{1}{T-h-W} \sum \limits _{t=W+1}^{T-h}{\left( { y_{t+h}-\widehat{\beta }_{t}^{\prime }z_{t}}\right) }^{2} \nonumber \\&=\frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}{\left( {y_{t+h}-\beta ^{\prime }z_{t}-\left( {\widehat{\beta }_{t}^{\prime }z_{t}-\beta ^{\prime }z_{t}} \right) }\right) }^{2} \nonumber \\&=\frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}{\left( {y_{t+h}-\beta ^{\prime }z_{t}}\right) }^{2} \nonumber \\&\qquad -\frac{2}{T-h-W} \sum \limits _{t=W+1}^{T-h}{\left( { y_{t+h}-\beta ^{\prime }z_{t}}\right) }z_{t}^{\prime }\left( {\widehat{\beta }_{t}-\beta }\right) \nonumber \\&\qquad +\frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}{\left( {\widehat{\beta } _{t}^{\prime }-\beta ^{\prime }}\right) }z_{t}z_{t}^{\prime }\left( { \widehat{\beta }_{t}-\beta }\right) , \end{aligned}$$
(A.5)

respectively, where \(\alpha ^{*}=[E(x_{t}x_{t}^{\prime })]^{-1}E(x_{t}y_{t+h})\). There are two cases: the case in which the data are generated from model 1, i.e., \(\gamma =0\) (case 1) and the case in which the data are generated from model 2, i.e., \(\gamma \ne 0\) (case 2).

In case 1, the true model is \(y_{t+h}=\alpha ^{\prime }x_{t}+v_{t+h}\). The first component of \(\hat{\sigma }_{2,W}^{2}\) in Eq. (A.5) is numerically identical to the first component of \(\hat{\sigma }_{1,W}^{2}\) in Eq. (A.4) because \(\gamma =0\) and \(\alpha -\alpha ^{*}=0\), and all the other components converge to zero because all parameters are consistently estimated. Hence, when Model 1 is true, the probability limits of \(\hat{\sigma }_{1,W}^{2}\) and \(\hat{\sigma }_{2,W}^{2}\) coincide, so comparing probability limits alone, as \(T\) and \(W\) go to infinity with \(W\) diverging more slowly than \(T\), cannot establish that \(\lim _{T\rightarrow \infty ,\ W\rightarrow \infty }P(\hat{\sigma }_{1,W}^{2}<\hat{\sigma } _{2,W}^{2})=1\). However, if \(\hat{\sigma }_{1,W}^{2}\) is smaller than \(\hat{\sigma }_{2,W}^{2}\) along the path of convergence of \(T\) and \(W\) toward infinity, the true model can still be identified. Since the models are nested, \(u_{t+h}=v_{t+h}\), and it follows from Eqs. (A.4) and (A.5) that

$$\begin{aligned} \hat{\sigma }_{2,W}^{2}-\hat{\sigma }_{1,W}^{2}&=\frac{2}{T-h-W} \sum _{t=W+1}^{T-h}\left[ v_{t+h}z_{t}^{\prime }(\widehat{\beta }_{t}-\beta )-v_{t+h}x_{t}^{\prime }(\widehat{\alpha }_{t}-\alpha )\right] \nonumber \\&\quad +\frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h} \left[ {\left( {\widehat{\beta } _{t}^{\prime }-\beta ^{\prime }}\right) }z_{t}z_{t}^{\prime }\left( { \widehat{\beta }_{t}-\beta }\right) \right. \nonumber \\&\qquad \qquad \qquad \qquad \qquad \qquad \left. -{\left( {\widehat{\alpha }_{t}^{\prime }-{\alpha }^{\prime }}\right) }x_{t}x_{t}^{\prime }\left( {\widehat{\alpha } _{t}-{\alpha }}\right) \right] \nonumber \\&=\frac{2}{T-h-W}\sum _{t=W+1}^{T-h}\left[ v_{t+h}z_{t}^{\prime }B_{2}(t)H_{2}(t)-v_{t+h}x_{t}^{\prime }B_{1}(t)H_{1}(t)\right] \nonumber \\&\quad +\frac{1}{T-h-W}\sum _{t=W+1}^{T-h}\left[ H_{2}(t)^{\prime }B_{2}(t)z_{t}z_{t}^{\prime }B_{2}(t)H_{2}(t)\right. \nonumber \\&\qquad \qquad \qquad \qquad \qquad \qquad \left.-H_{1}(t)^{\prime }B_{1}(t)x_{t}x_{t}^{\prime }B_{1}(t)H_{1}(t)\right] \nonumber \\&=\frac{1}{T-h-W}\sum _{t=W+1}^{T-h}\left[ H_{2}(t)^{\prime }B_{2}H_{2}(t)-H_{1}(t)^{\prime }B_{1}H_{1}(t)\right] +o_{p}\left( \frac{1}{W }\right) \end{aligned}$$
(A.6)

where the last equality follows from Lemma 1(a)–(d).

To determine the sign of Eq. (A.6), we first define \(Q\) by

$$\begin{aligned} Q\;=\;[E(z_{t}z_{t}^{\prime })]^{\frac{1}{2}}\left\{ [E(z_{t}z_{t}^{\prime })]^{-1}-\left[ \begin{array}{cc} [E(x_{t}x_{t}^{\prime })]^{-1}&\mathbf 0 _{l\times (k-l)} \\ \mathbf 0 _{(k-l)\times l}&\mathbf 0 _{(k-l)\times (k-l)} \end{array} \right] \right\} [E(z_{t}z_{t}^{\prime })]^{\frac{1}{2}} \end{aligned}$$
(A.7)

as in Lemma A.4 of Clark and McCracken (2000). Clark and McCracken (2000) show that the \(Q\) matrix is symmetric and idempotent. A symmetric idempotent matrix is positive semidefinite; that is, \(a^{\prime }Qa\ge 0\) for all \(a\in \mathfrak R ^{k}\). It implies that

$$\begin{aligned}&\quad \left[ \frac{1}{W_{h}^{\frac{1}{2}}}\sum \limits _{s=t-W}^{t-h}z_{s}v_{s+h} \right] ^{\prime }[E(z_{t}z_{t}^{\prime })]^{-1}\left[ \frac{1}{W_{h}^{\frac{ 1}{2}}}\sum \limits _{s=t-W}^{t-h}z_{s}v_{s+h}\right] \nonumber \\&\quad -\left[ \frac{1}{W_{h}^{\frac{1}{2}}}\sum \limits _{s=t-W}^{t-h}x_{s}v_{s+h} \right] ^{\prime }[E(x_{t}x_{t}^{\prime })]^{-1}\left[ \frac{1}{W_{h}^{\frac{ 1}{2}}}\sum \limits _{s=t-W}^{t-h}x_{s}v_{s+h}\right] \nonumber \\&=\left[ \frac{1}{W_{h}^{\frac{1}{2}}}\sum \limits _{s=t-W}^{t-h}z_{s}v_{s+h} \right] ^{\prime }\left\{ [E(z_{t}z_{t}^{\prime })]^{-1}-\left[ \begin{array}{cc} [E(x_{t}x_{t}^{\prime })]^{-1}&\mathbf 0 _{l\times (k-l)} \\ \mathbf 0 _{(k-l)\times l}&\mathbf 0 _{(k-l)\times (k-l)} \end{array} \right] \right\} \nonumber \\&\qquad \times \left[ \frac{1}{W_{h}^{\frac{1}{2}}}\sum \limits _{s=t-W}^{t-h}z_{s}v_{s+h}\right] \nonumber \\&=\left[ \frac{1}{W_{h}^{\frac{1}{2}}}\sum \limits _{s=t-W}^{t-h}z_{s}v_{s+h} \right] ^{\prime }[E(z_{t}z_{t}^{\prime })]^{-\frac{1}{2}}\cdot Q\cdot [E(z_{t}z_{t}^{\prime })]^{-\frac{1}{2}} \nonumber \\&\qquad \times \left[ \frac{1}{W_{h}^{\frac{1}{2}}}\sum \limits _{s=t-W}^{t-h}z_{s}v_{s+h}\right]\;\;\ge \;\;0 \end{aligned}$$
(A.8)

Note that the probability that \([E(z_{t}z_{t}^{\prime })]^{-1/2}W_{h}^{-1/2}\sum _{s=t-W}^{t-h}z_{s}v_{s+h}\) lies in the null space of \(Q\) for infinitely many \(t\) approaches zero because the dimension of the null space is \(l<k\). Thus, the average of (A.8) over \(t\) is positive with probability approaching one. Combining Eqs. (A.6) and (A.8), we find that \(W(\hat{\sigma }_{2,W}^{2}-\hat{\sigma }_{1,W}^{2})\) is positive with probability approaching one. Therefore, when \(\gamma =0\), \(\hat{\sigma } _{1,W}^{2}<\hat{\sigma }_{2,W}^{2}\) with probability approaching one.
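
As a numerical sanity check of this argument (not part of the original proof), the sketch below builds \(Q\) from simulated moment matrices and verifies that it is symmetric and idempotent, and that the difference of quadratic forms in (A.8) is nonnegative. The dimensions \(l=2\), \(k=4\) and the random moment matrix are illustrative assumptions.

```python
# Numerical check of the algebra behind Eqs. (A.7)-(A.8); all inputs are simulated.
import numpy as np

rng = np.random.default_rng(0)
l, k = 2, 4
A = rng.standard_normal((k, 3 * k))
Ezz = A @ A.T / (3 * k)          # positive definite stand-in for E(z z')
Exx = Ezz[:l, :l]                # E(x x') is the upper-left l-by-l block of E(z z')

S = np.zeros((k, k))             # block matrix [[E(x x')^{-1}, 0], [0, 0]]
S[:l, :l] = np.linalg.inv(Exx)

# Symmetric square root [E(z z')]^{1/2} via the eigendecomposition, as in (A.7).
evals, evecs = np.linalg.eigh(Ezz)
half = evecs @ np.diag(np.sqrt(evals)) @ evecs.T
Q = half @ (np.linalg.inv(Ezz) - S) @ half

print(np.allclose(Q, Q.T), np.allclose(Q @ Q, Q))  # symmetric, idempotent: True True

# The difference of the two quadratic forms in (A.8) equals a'Qa with
# a = [E(z z')]^{-1/2} times the score vector, hence is nonnegative.
score = rng.standard_normal(k)   # stand-in for W_h^{-1/2} * sum_s z_s v_{s+h}
diff = score @ np.linalg.inv(Ezz) @ score - score[:l] @ np.linalg.inv(Exx) @ score[:l]
print(diff >= -1e-12)            # True up to floating-point rounding
```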

In case 2, that is, when Model 2 is the true model, we have \(y_{t+h}=\beta ^{\prime }z_{t}+v_{t+h}=\alpha ^{\prime }x_{t}+\gamma ^{\prime }w_{t}+v_{t+h}\). By Assumptions 2(a)(b), the second and third terms on the right-hand sides of Eqs. (A.4) and (A.5) are \(o_{p}(T^{1/2}/W)\) and \(o_{p}(T/W^{2})\), respectively; thus, they are \(o_{p}(1)\) by Assumption 1. The first term on the right-hand side of Eq. (A.5) converges in probability to the variance of \(v_{t+h}\) as the sample size \(T\) goes to infinity:

$$\begin{aligned} \frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}{\left( {y_{t+h}-\beta ^{\prime }z_{t}}\right) }^{2}=\frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}{v_{t+h}^{2}} \overset{p}{\rightarrow }\sigma _{2}^{2}. \end{aligned}$$
(A.9)

Similarly, the first term on the right-hand side of Eq. (A.4) converges in probability to the variance of \(u_{t+h}\equiv y_{t+h}-\alpha ^{*\prime }x_{t}\):

$$\begin{aligned} \hat{\sigma }_{1,W}^{2}&=\frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}{\left( y_{t+h}-{\alpha ^{*}}^{\prime }x_{t}\right) ^{2}}+o_{p}(1) \nonumber \\&\overset{p}{\rightarrow } E\left[ (y_{t+h}-{\alpha ^{*}}^{\prime }x_{t})^{2}\right] \nonumber \\&=E\left[ (\alpha ^{\prime }x_{t}+\gamma ^{\prime }w_{t}+v_{t+h}-{\alpha ^{*}}^{\prime }x_{t})^{2}\right] \nonumber \\&=E\left[ (v_{t+h}+(\alpha ^{\prime }-{\alpha ^{*}}^{\prime })x_{t}+\gamma ^{\prime }w_{t})^{2}\right] \nonumber \\&=\sigma _{2}^{2}+\left[ \begin{array}{c} \alpha -\alpha ^{*} \\ \gamma \end{array} \right] ^{\prime }\left[ \begin{array}{cc} E(x_{t}x_{t}^{\prime })&E(x_{t}w_{t}^{\prime }) \\ E(w_{t}x_{t}^{\prime })&E(w_{t}w_{t}^{\prime }) \end{array} \right] \left[ \begin{array}{c} \alpha -\alpha ^{*} \\ \gamma \end{array} \right]\;\;>\;\;\sigma _{2}^{2}. \end{aligned}$$
(A.10)

Therefore, when Model 2 is true, \(P(\hat{\sigma }_{1,W}^{2}>\hat{\sigma }_{2,W}^{2})\rightarrow 1\) as \(T\rightarrow \infty \) and \(W\rightarrow \infty \) with \(W\) diverging more slowly than \(T\).
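
The conclusion of Theorem 1 lends itself to a small Monte Carlo illustration. The sketch below repeats the rolling PMSE function from the sketch after the Notes so that it runs standalone; the data-generating process, replication count, and grid of sample sizes are illustrative assumptions. With \(\gamma =0\) and \(W=\lfloor T^{2/3}\rfloor \), the frequency with which the rolling criterion selects the true smaller model should approach one as \(T\) grows.

```python
# Monte Carlo sketch of Theorem 1 (case 1): selection frequency should approach 1.
import numpy as np

def rolling_pmse(y, X, W, h=1):
    T = len(y)
    errs = [y[t + h] - X[t] @ np.linalg.lstsq(X[t - W:t], y[t - W + h:t + h], rcond=None)[0]
            for t in range(W, T - h)]
    return np.mean(np.square(errs))

rng = np.random.default_rng(1)
for T in (100, 400, 1600):
    wins = 0
    for _ in range(100):
        x, w = rng.standard_normal(T), rng.standard_normal(T)
        y = np.zeros(T)
        y[1:] = 0.5 * x[:-1] + rng.standard_normal(T - 1)  # gamma = 0: model 1 true
        X1 = np.column_stack([np.ones(T), x])
        X2 = np.column_stack([np.ones(T), x, w])
        W = int(T ** (2 / 3))
        wins += rolling_pmse(y, X1, W) < rolling_pmse(y, X2, W)
    print(T, wins / 100)  # frequency of selecting the true model
```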

Proof of Theorem 2: Note that the PMSEs \(\hat{\sigma }_{1,W}^{2}\) and \(\hat{\sigma }_{2,W}^{2}\) can be expanded as

$$\begin{aligned} \hat{\sigma }_{1,W}^{2}&=\frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}\left( y_{T,t+h}-\alpha ^{*}\left(\frac{t}{T}\right) ^{\prime }x_{T,t}\right)^{2} \nonumber \\&\quad -\frac{2}{T-h-W}\sum \limits _{t=W+1}^{T-h}\left( y_{T,t+h}-\alpha ^{*}\left(\frac{t}{T}\right)^{\prime }x_{T,t}\right) x_{T,t}^{\prime } \left( \widehat{\alpha }_{t}-\alpha ^{*}\left(\frac{t}{T}\right)\right)\nonumber \\&\quad +\frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}\left(\widehat{\alpha }_{t}-\alpha ^{*}\left(\frac{t}{T}\right)\right)^{\prime } x_{T,t}x_{T,t}^{\prime }\left( \widehat{\alpha }_{t}-\alpha ^{*}\left( \frac{t}{T}\right)\right) \end{aligned}$$
(A.11)

and

$$\begin{aligned} \hat{\sigma }_{2,W}^{2}&= \frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}\left( y_{T,t+h}-\beta ^{\prime }\left(\frac{t}{T}\right)z_{T,t}\right)^{2} \nonumber \\&\quad -\frac{2}{T-h-W}\sum \limits _{t=W+1}^{T-h}\left( y_{T,t+h}-\beta ^{\prime }\left(\frac{t}{T}\right)z_{T,t}\right) z_{T,t}^{\prime }\left( \widehat{\beta }_{t}-\beta \left(\frac{t}{T}\right) \right) \nonumber \\&\quad +\frac{1}{T-h-W} \sum \limits _{t=W+1}^{T-h} \left(\widehat{\beta }_{t}-\beta \left(\frac{t}{T}\right)\right)^{\prime } z_{T,t}z_{T,t}^{\prime }\left( \widehat{\beta }_{t}-\beta \left(\frac{t}{T} \right) \right) , \end{aligned}$$
(A.12)

respectively. If we show that each of

$$\begin{aligned}&\frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}\left( y_{T,t+h}-\alpha ^{*}\left(\frac{t}{T}\right)^{\prime }x_{T,t}\right) x_{T,t}^{\prime }\left( \widehat{\alpha }_{t}-{\alpha ^{*}\left(\frac{t}{T}\right)}\right)\nonumber \\&\quad - \frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}u_{T,t+h}x_{T,t}^{\prime } B_{1}(t)H_{1}(t), \end{aligned}$$
(A.13)
$$\begin{aligned}&\frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}\left( \widehat{\alpha }_{t}-\alpha ^{*}\left(\frac{t}{T}\right)\right)^{\prime }x_{T,t}x_{T,t}^{\prime }\left( \widehat{\alpha }_{t}-\alpha ^{*}\left(\frac{t}{T}\right)\right) \nonumber \\&\quad - \frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}H_{1}(t)^{\prime }B_{1}(t)x_{T,t}x_{T,t}^{\prime }B_{1}(t)H_{1}(t), \end{aligned}$$
(A.14)
$$\begin{aligned}&\frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}\left(y_{T,t+h}-\beta \left( \frac{t}{T}\right)^{\prime }z_{T,t}\right)z_{T,t}^{\prime }\left( \widehat{ \beta }_{t}-\beta \left(\frac{t}{T}\right)\right) \nonumber \\&\quad - \frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}v_{T,t+h}z_{T,t}^{\prime } B_{2}(t)H_{2}(t), \end{aligned}$$
(A.15)
$$\begin{aligned}&\frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}\left( \widehat{\beta }_{t}-\beta \left(\frac{t}{T}\right)\right)^{\prime } z_{T,t}z_{T,t}^{\prime }\left( \widehat{\beta }_{t}-\beta \left(\frac{t}{T} \right)\right) \nonumber \\&\quad - \frac{1}{T-h-W}\sum \limits _{t=W+1}^{T-h}H_{2}(t)^{\prime }B_{2}(t)z_{T,t}z_{T,t}^{\prime }B_{2}(t)H_{2}(t), \end{aligned}$$
(A.16)

is \(o_{p}(1/W)\) when the data are generated from Model 1 (case 1) and \(o_{p}(1)\) when the data are generated from Model 2 (case 2), then the proof of Theorem 2 takes exactly the same steps as the proof of Theorem 1. Thus, it remains to show that (A.13)–(A.16) are \( o_{p}(W^{-1})\) in case 1 and \(o_{p}(1)\) in case 2. Note that the deviation of the rolling regression estimator from the time-varying parameter can be written as:

$$\begin{aligned} \hat{\beta }_{W,t}-\beta \left(\frac{t}{T}\right)&= B_{2}(t)\frac{1}{W_{h}} \sum _{s=t-W}^{t-h}z_{s} \left[v_{s+h}+z_{s}^{\prime }\left(\beta \left(\frac{s }{T}\right)-\beta \left(\frac{t}{T}\right)\right) \right] \nonumber \\&= B_{2}(t)H_{2}(t)+\frac{B_{2}(t)}{W_{h}}\sum _{s=t-W}^{t-h}z_{s}z_{s}^{ \prime }\left(\beta \left(\frac{s}{T}\right)-\beta \left(\frac{t}{T} \right)\right) \end{aligned}$$
(A.17)

Thus, the difference (A.15) is

$$\begin{aligned}&\frac{1}{T-h-W}\sum _{t=W+1}^{T-h}v_{T,t+h}z_{T,t}^{\prime }B_{2}(t) \frac{1}{W_{h}}\sum _{s=t-W}^{t-h}z_{s}z_{s}^{\prime }\left(\beta \left(\frac{s }{T}\right)-\beta \left(\frac{t}{T}\right)\right) \nonumber \\&\quad = \frac{1}{T-h-W}\sum _{t=W+1}^{T-h}v_{T,t+h}z_{T,t}^{\prime }\bar{B} _{2}\left(\frac{t}{T}\right) \frac{1}{W_{h}}\sum _{s=t-W}^{t-h}z_{s}z_{s}^{ \prime }\left(\beta \left(\frac{s}{T}\right)-\beta \left(\frac{t}{T} \right)\right) \nonumber \\&\qquad +\frac{1}{T-h-W}\sum _{t=W+1}^{T-h}v_{T,t+h}z_{T,t}^{\prime }\left(B_{2}(t)-\bar{B}_{2}\left(\frac{t}{T}\right)\right) \frac{1}{W_{h}} \nonumber \\&\qquad \times \sum _{s=t-W}^{t-h}z_{s}z_{s}^{\prime } \left(\beta \left(\frac{s}{T} \right)-\beta \left(\frac{t}{T}\right)\right). \end{aligned}$$
(A.18)

By Assumption 4(c), the summands have zero mean. By Hölder’s inequality and Assumptions 4(b)(c)(e)(f), the second moments of the right-hand side terms are \(O(W/T^{2})\). By Chebyshev’s inequality, (A.15) is \( O_{p}(W^{1/2}/T)\) which is \(o_{p}(1/W)\) by Assumption 3. It can be shown that (A.13) is also \(o_{p}(1/W)\) in a similar fashion.

The difference (A.16) is the sum of the following three terms:

$$\begin{aligned} \frac{1}{T-h-W}\sum _{t=W+1}^{T-h} v_{T,t+h}z_{T,t}z_{T,t}^{\prime }B_{2}(t)\frac{1}{W_{h}}\sum _{s=t-W}^{t-h}z_{s}z_{s}^{\prime }\left(\beta \left(\frac{s}{T}\right)-\beta \left(\frac{t}{T}\right)\right), \end{aligned}$$
(A.19)
$$\begin{aligned} \frac{1}{T-h-W}\sum _{t=W+1}^{T-h} \frac{1}{W_{h}}\sum _{s=t-W}^{t-h}\left(\beta \left(\frac{s}{T}\right)-\beta \left(\frac{t}{T}\right)\right)^{\prime }z_{s}z_{s}^{\prime }B_{2}(t)z_{T,t}z_{T,t}^{\prime }v_{T,t+h}, \end{aligned}$$
(A.20)
$$\begin{aligned}&\frac{1}{T-h-W}\sum _{t=W+1}^{T-h} \left[\frac{1}{W_{h}}\sum _{s=t-W}^{t-h}\left(\beta \left(\frac{s}{T}\right)-\beta \left(\frac{t}{T}\right)\right)^{\prime }z_{s}z_{s}^{\prime }\right]B_{2}(t)z_{T,t} \nonumber \\&\quad \times z_{T,t}^{\prime }B_{2}(t)\left[\frac{1}{W_{h}}\sum _{s=t-W}^{t-h}z_{s}z_{s}^{\prime }\left(\beta \left(\frac{s}{T}\right)-\beta \left(\frac{t}{T}\right)\right)\right]. \end{aligned}$$
(A.21)

Using Chebyshev’s inequality, Hölder’s inequality, and Assumptions 3 and 4(b)(c)(e)(f), it can be shown that (A.19), (A.20), and (A.21) are \(O_{p}(W^{1/2}T^{-2})\), \( O_{p}(W^{1/2}T^{-2})\), and \(O_{p}(W^{2}T^{-2})\), respectively, all of which are \( o_{p}(W^{-1})\). It can be shown in an analogous fashion that (A.14) is also \( o_{p}(1/W) \) when \(\gamma (\cdot )=0\).

The rest of the proof of Theorem 2 takes exactly the same steps as the proof of Theorem 1, except that \(\alpha ^{*}\), \(\beta \), \( B_{i}\), \(u_{t}\), \(v_{t}\), \(x_{t}\), \(y_{t}\), \(z_{t}\), and Lemma 1 are replaced by \(\alpha \left(\frac{t}{T}\right)\), \(\beta \left(\frac{t}{T}\right)\), \(\bar{B }_{i}\left(\frac{t}{T}\right)\), \(u_{T,t}\), \(v_{T,t}\), \(x_{T,t}\), \(y_{T,t}\), \( z_{T,t}\), and Lemma 2, respectively.
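
The time-varying case of Theorem 2 can be illustrated in the same spirit. In the self-contained sketch below, the smooth coefficient path \(\alpha (t/T)\), the window choice, and the design are illustrative assumptions; the point is that a rolling window of size \(W=\lfloor T^{2/3}\rfloor \) adapts to the drifting coefficient, so the rolling PMSE criterion still favors the true smaller model.

```python
# Sketch of Theorem 2's time-varying-parameter case with gamma(.) = 0.
import numpy as np

def rolling_pmse(y, X, W, h=1):
    T = len(y)
    errs = [y[t + h] - X[t] @ np.linalg.lstsq(X[t - W:t], y[t - W + h:t + h], rcond=None)[0]
            for t in range(W, T - h)]
    return np.mean(np.square(errs))

rng = np.random.default_rng(2)
T = 2000
alpha = 0.5 + 0.5 * np.sin(2 * np.pi * np.arange(T) / T)  # alpha(t/T): smooth drift
x, w = rng.standard_normal(T), rng.standard_normal(T)
y = np.zeros(T)
y[1:] = alpha[:-1] * x[:-1] + rng.standard_normal(T - 1)  # model 1 true, time varying
X1 = np.column_stack([np.ones(T), x])
X2 = np.column_stack([np.ones(T), x, w])
W = int(T ** (2 / 3))  # short enough window to track the drift (cf. Note 3)
print(rolling_pmse(y, X1, W), rolling_pmse(y, X2, W))  # model 1 should have smaller PMSE
```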


Copyright information

© 2013 Springer Science+Business Media New York


Cite this chapter

Inoue, A., Rossi, B., Jin, L. (2013). Consistent Model Selection: Over Rolling Windows. In: Chen, X., Swanson, N. (eds) Recent Advances and Future Directions in Causality, Prediction, and Specification Analysis. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1653-1_12
