1 Introduction

In recent decades, dynamic factor models (DFMs) have been widely used to represent comovements within large systems of macroeconomic and financial variables, in which the cross-sectional dimension is often relatively large compared with the time dimension; see Stock and Watson (2017) for the importance of DFMs in time series econometrics. DFMs generally assume the existence of a small number of unobserved factors capturing the comovements in the system.Footnote 1 Two main types of procedures for factor extraction are popular in the related literature. First, in many applications, factors are extracted using nonparametric procedures based on principal components (PC), which are attractive because they are computationally simple and have well-known theoretical properties. In particular, PC is consistent under mild conditions and, as long as the factors are pervasive and the idiosyncratic dependence is weak, it is robust to the underlying dependence of the common factors and idiosyncratic components. As a consequence, PC procedures are very popular for factor estimation and several excellent surveys are available in the literature; see, among others, Bai and Ng (2008a) for a technical survey of the econometric theory for PC. However, when the common factors and/or the idiosyncratic components are serially dependent, PC procedures do not use this information and, consequently, they are not efficient.

Alternatively, after casting the DFM as a state-space model (SSM), factors can be extracted using Kalman filter and smoothing (KFS) procedures. One important feature of these procedures is that they open the door to Maximum Likelihood (ML) estimation of the model parameters. Furthermore, KFS is also very flexible, allowing one to handle in a straightforward way data characteristics often observed in practice such as missing data, mixed frequencies, seasonal dependencies, nonstationarity or regime-switching nonlinearity. Moreover, KFS procedures are also of interest in empirical applications because they allow incorporating restrictions on the factor loadings, as in multi-level DFMs, or on the idiosyncratic components, and performing counterfactual exercises; see, for example, Banbura et al. (2011) for multi-level models and Luciani (2015) for counterfactual analysis. However, the main drawback of KFS is that it requires full specification of the dependence of the common and idiosyncratic components, opening the door to potential misspecification; see Poncela et al. (2021) for a very recent survey on KFS for factor extraction in DFMs.

There are few papers looking at the effects of factor misspecification on factor extraction and forecasting, and all of them focus on factors extracted using PC. Boivin and Ng (2006) conclude that overestimating the number of factors affects the precision with which they are estimated and the forecasting results, while Barigozzi and Cho (2020a) also conclude that overestimating the number of factors could yield non-negligible estimation errors. In an empirical application forecasting GDP growth for Germany and France, Barhoumi et al. (2013) show that more factors do not necessarily imply better forecasting. Gonçalves et al. (2017), analyzing the same data set considered in this paper, conclude that the forecasting ability depends on the specific combination of eight PC factors used in the factor-augmented predictive regression. Finally, Breitung and Eickmeier (2011a) conclude that, if the cross-sectional dimension is large, the dynamic properties of the factors are not important for factor extraction and forecasting. This paper contributes to the literature by analyzing the empirical consequences for factor estimation, in-sample predictability and out-of-sample forecasting of extracting factors using not only PC but also KFS under various sources of potential misspecification. In particular, we consider factor extraction and forecasting when assuming different numbers of factors and different factor dynamics. The analysis is carried out by extracting factors from the ubiquitous database of US macroeconomic variables described by McCracken and Ng (2016) and forecasting some key US macroeconomic magnitudes. Factor extraction procedures have previously been compared using this data set; see, for example, Poncela and Ruiz (2015) and the references therein. However, as far as we know, the empirical properties of KFS extraction under potential sources of misspecification have not been analyzed before when extracting factors from the same data set; see Aruoba et al. (2009) for the importance of comparing factor extraction procedures in the context of the same data set. In the particular US macroeconomic data set analyzed in this paper, we show that specifications with more factors and more lags are favored in-sample, when looking both at log-likelihood ratio tests and at measures of fit of factor-augmented predictive regressions. However, increasing the number of factors and/or their lag structure does not always lead to an increase in out-of-sample forecast precision, with the out-of-sample mean square forecast errors (MSFEs) being generally minimized when forecasts are based on simple models with one factor extracted using KFS and modelled as an AR(1) process. It is important to note that these results might not be applicable beyond the macroeconomic system considered in this paper. Whether they are generally applicable is an interesting issue that is beyond our objectives. Careful Monte Carlo experiments could be designed to analyze their general applicability.

The rest of the paper is organized as follows. Section 2 briefly describes the representation of DFMs as SSMs and how factor extraction can be performed by PC and KFS. In Sect. 3, the factors are extracted from a system of US macroeconomic variables under the assumption of serially uncorrelated idiosyncratic components. We analyze the differences, in terms of point and interval estimation of the factors, in-sample prediction and out-of-sample forecasting, when the factors are extracted using PC and KFS under different assumptions on the number of factors and their dynamic dependence. Section 4 concludes the paper.

2 Dynamic factor models and factor extraction

For completeness, in this section, we briefly describe the DFM as well as factor extraction based on PC and on KFS.

2.1 The dynamic factor model

The following stationary approximate DFM has been extensively analyzed in the related literatureFootnote 2

$$\begin{aligned} y_{i t}=\lambda _i^{\prime }F_{t}+\varepsilon _{i t}, \end{aligned}$$
(1)

where \(y_{i t}\) is the observation of the ith variable, for \(i=1,\ldots ,N\), at time t, for \(t=1,\ldots ,T\), \(\lambda _i=(\lambda _{i1},\ldots ,\lambda _{ir})^{\prime }\) is the \(r\times 1\) vector of unknown factor loadings corresponding to \(y_{i t}\), \(F_{t}=(f_{1t},\ldots ,f_{rt})^{\prime }\) is the \(r\times 1\) vector of unobservable stochastic factors at time t and \(\varepsilon _{i t}\) is the idiosyncratic component of \(y_{i t}\). The number of factors, r, is assumed to be known and fixed as the cross-sectional and temporal dimensions, N and T, respectively, grow. If the idiosyncratic components are assumed to be cross-sectionally uncorrelated, i.e., their covariance matrix, \(\Sigma _{\varepsilon }\), is diagonal, the DFM is known as “exact” while, if the idiosyncratic noises are weakly cross-correlated, the DFM is called “approximate”.

The DFM in equation (1) can also be written in matrix form as follows

$$\begin{aligned} Y=F\Lambda ^{\prime }+\varepsilon , \end{aligned}$$
(2)

where Y is the \(T \times N\) matrix of observations, F is the \(T\times r\) matrix of factors, \(\Lambda \) is the \(N \times r\) matrix of factor loadings and \(\varepsilon \) is the \(T\times N\) matrix of idiosyncratic components. Finally, it is also useful to express the DFM in the following vector form

$$\begin{aligned} Y_{t}=\Lambda F_{t}+\varepsilon _{t}, \end{aligned}$$
(3)

where \(Y_t=(y_{1t},\ldots ,y_{Nt})^{\prime }\) and \(\varepsilon _t=(\varepsilon _{1t},\ldots ,\varepsilon _{Nt})^{\prime }\) are \(N\times 1\) vectors.

2.2 Principal components factor extraction

PC, which is among the most popular factor extraction procedures due to its simplicity and low computational burden together with its good theoretical properties, estimates \(\Lambda \) and F by minimizing the following sum of squares:

$$\begin{aligned} V(r)=(NT)^{-1}\sum _{i=1}^{N}\sum _{t=1}^{T}\left( y_{it}-\lambda _{i}^{'} F_{t}\right) ^{2}. \end{aligned}$$
(4)

The factors cannot be individually identified; we can only estimate the space spanned by them. In the context of PC factor extraction, it is popular to assume that \(\frac{F^{'}F}{T}=I_r\) and that \(\Lambda ^{'}\Lambda \) is diagonal with distinct elements arranged in decreasing order along the main diagonal; see Bai and Ng (2013) for a discussion of identification restrictions in DFMs. With respect to the identification of the sign, it can be useful to use external information; see, for example, Geweke and Zhou (1996), who propose determining the sign of a single factor by assuming that its weight on a given variable is positive, and Stock and Watson (2016) for an application.
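To fix ideas, the following minimal sketch (in Python, with illustrative names of our own) computes the PC factors and loadings under these identification restrictions; it assumes that Y is the centered and standardized \(T \times N\) data matrix.

```python
import numpy as np

def pc_factors(Y, r):
    """PC estimates of F and Lambda minimizing V(r) in (4), under the
    identification F'F/T = I_r and Lambda'Lambda diagonal with elements
    in decreasing order."""
    T, N = Y.shape
    # The minimizers are obtained from the eigendecomposition of YY'/(TN)
    eigval, eigvec = np.linalg.eigh(Y @ Y.T / (T * N))
    idx = np.argsort(eigval)[::-1][:r]   # indices of the r largest eigenvalues
    F = np.sqrt(T) * eigvec[:, idx]      # T x r factors, F'F/T = I_r by construction
    Lam = Y.T @ F / T                    # N x r loadings, LS given F
    return F, Lam
```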

Stock and Watson (2002b) show that, if r is known, the serial and cross-sectional correlations of the idiosyncratic components are weak and the factors are pervasive, then the space spanned by the estimated PC factors is consistently estimated when both N and T tend simultaneously to infinity. Later, Bai (2003) shows that, if also \(\frac{\sqrt{N}}{T}\rightarrow 0\), the factors are asymptotically normal. If the idiosyncratic noises are further assumed to be serially uncorrelated, then the limiting distributions are independent across t. The asymptotic approximation of the mean square error (MSE) of the PC factors at time t, \({\tilde{f}}^{\mathrm{PC}}_t\), can be estimated as follows

$$\begin{aligned} \mathrm{MSE}_t=\left( \frac{{\tilde{\Lambda }}^{\mathrm{PC}^{\prime }}{\tilde{\Lambda }}^{\mathrm{PC}}}{N} \right) ^{-1}\frac{{\tilde{\Gamma }}_t}{N} \left( \frac{{\tilde{\Lambda }}^{\mathrm{PC}^{\prime }}{\tilde{\Lambda }}^{\mathrm{PC}}}{N} \right) ^{-1}, \end{aligned}$$
(5)

where \({\tilde{\Lambda }}^{\mathrm{PC}}\) is the PC estimate of the matrix of loadings and, according to Bai and Ng (2006), \({\tilde{\Gamma }}_t\) can be estimated using the following estimator, which is robust to cross-sectional dependence and heteroscedasticity of the idiosyncratic components

$$\begin{aligned} {\widetilde{\Gamma }}_{t}=\frac{1}{n}\sum _{i=1}^{n}\sum _{j=1}^{n}{\tilde{\lambda }}^{\mathrm{PC}}_{i}{\tilde{\lambda }}^{\mathrm{PC}^{{\prime }}} _{j} \frac{1}{T} \sum _{t=1}^{T}{\tilde{\varepsilon }}_{it}{\tilde{\varepsilon }}_{jt}, \end{aligned}$$
(6)

where \(n=\mathrm{min}[\sqrt{N}, \sqrt{T}]\). Note that the robust estimator in (6) is consistent but requires covariance stationarity with \(E(\varepsilon _{it} \varepsilon _{jt})=\sigma _{ij}, \forall t\).Footnote 3
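As an illustration, the estimator in (5)–(6) can be computed along the following lines; this is a sketch under our own naming conventions, which assumes that the PC loadings and residuals (for instance, from pc_factors() above) are available and takes the first n cross-sectional units in the double sum in (6).

```python
def pc_factor_mse(Lam, eps):
    """Asymptotic MSE of the PC factors, equation (5), with Gamma_t estimated
    by the robust estimator (6); Lam is the N x r matrix of PC loadings and
    eps the T x N matrix of PC residuals."""
    T, N = eps.shape
    n = int(min(np.sqrt(N), np.sqrt(T)))
    S = eps[:, :n].T @ eps[:, :n] / T    # (1/T) sum_t eps_it eps_jt, i,j = 1,...,n
    L = Lam[:n, :]
    Gamma = L.T @ S @ L / n              # equation (6)
    Q = np.linalg.inv(Lam.T @ Lam / N)
    return Q @ (Gamma / N) @ Q           # equation (5)
```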

Results on the performance of the asymptotic distribution in approximating the finite sample distribution of the estimated PC factors are scarce; see Ouysse (2006), Poncela and Ruiz (2015) and Maldonado and Ruiz (forthcoming), who show that the uncertainty of the PC factors is underestimated when computed using asymptotic results.

2.3 Kalman filter and smoothing factor extraction

Alternatively, factor extraction is often based on KFS. Assume that the vector of common factors, \(F_t\), evolves over time according to the following stationary VAR(p) model

$$\begin{aligned} F_t=\Phi _1 F_{t-1} + \Phi _2 F_{t-2} +\cdots + \Phi _p F_{t-p} + u_t, \end{aligned}$$
(7)

where \(u_t\) is an \(r \times 1\) white noise vector with covariance matrix \(\Sigma _u\). Although the vector of idiosyncratic components may display temporal dependence, we assume here that it is white noise.

When a particular specification is assumed for the factors, as in (7), DFMs are particular cases of the much larger class of SSMs, in which observable variables are expressed in terms of unobserved or latent variables, which in turn evolve according to some lagged dynamics. It is straightforward to write the DFM as an SSM and, assuming that r and p as well as all DFM parameters are known, KFS can be implemented to extract the factors, regardless of the cross-sectional dimension, N; see Poncela et al. (2021) for a survey on factor extraction based on KFS and a description of how to express the DFM as an SSM when the idiosyncratic components are serially correlated. However, even if r and p were known, in practice, one needs to estimate the model parameters before KFS algorithms can be run. Assuming that \(\Sigma _{\varepsilon }\) is diagonal, i.e., the idiosyncratic components are cross-sectionally uncorrelated, and assuming normality, estimation of the parameters can be carried out by ML, with the Kalman filter (KF) used to compute the innovation decomposition form of the Gaussian likelihood, which is given by

$$\begin{aligned} \log L(Y;\Psi ) = - \frac{NT}{2} \log (2 \pi ) - \frac{1}{2} \sum _{t=1}^{T} \log |\Sigma _t| - \frac{1}{2} \sum _{t=1}^{T} \nu _t^{'} \Sigma _t^{-1} \nu _t, \end{aligned}$$
(8)

where the innovation vector, \(\nu _t=Y_t-E(Y_t|Y_1,\ldots ,Y_{t-1})\), and its covariance matrix, \(\Sigma _t\), can be obtained from the KF, and \(\Psi \) is the vector of parameters to be estimated, namely the loadings in \(\Lambda \), the variances in the main diagonal of the covariance matrix of the idiosyncratic noises, \(\sigma _{\varepsilon _1}^2,\ldots ,\sigma _{\varepsilon _N}^2\), the autoregressive parameters of the VAR model for the factors in \(\Phi _1\),...,\(\Phi _p\) and the parameters in the covariance matrix \(\Sigma _{u}\). After imposing the necessary identification restrictions, the log-likelihood can be maximized using numerical optimization with, for example, Newton–Raphson algorithms.Footnote 4 Following Harvey (1989), the identifying restrictions on the parameters considered in this paper are \(\lambda _{i,j}=0\) for \(j>i\) and \(i=1,\ldots ,r\) and \(\Sigma _{u}=I_r\).Footnote 5 The resulting estimator is denoted as ML-NO. In this very simple DFM, the main hurdle for the ML-NO estimator appears when N is extremely large, because the number of parameters to be estimated, \(r^2 p + N(r+1)-\frac{r(r-1)}{2}\), increases with N.
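To illustrate how (8) is evaluated in practice, the following sketch runs the KF prediction equations for an exact DFM with VAR(1) factors and \(\Sigma _u=I_r\); it is a minimal implementation for exposition (all names are ours), of the kind that a numerical optimizer would call repeatedly during ML-NO estimation.

```python
import numpy as np

def dfm_loglik(Y, Lam, Phi, sig_eps):
    """Gaussian log-likelihood (8) of an exact DFM with VAR(1) factors,
    computed through the Kalman filter innovation decomposition; sig_eps
    holds the N idiosyncratic variances."""
    T, N = Y.shape
    r = Lam.shape[1]
    H = np.diag(sig_eps)
    # Initialize at the stationary factor distribution: vec(P) solves
    # (I - Phi kron Phi) vec(P) = vec(I_r)
    P = np.linalg.solve(np.eye(r * r) - np.kron(Phi, Phi),
                        np.eye(r).ravel()).reshape(r, r)
    a = np.zeros(r)
    loglik = -0.5 * N * T * np.log(2.0 * np.pi)
    for t in range(T):
        nu = Y[t] - Lam @ a                       # innovation nu_t
        S = Lam @ P @ Lam.T + H                   # its covariance Sigma_t
        loglik -= 0.5 * (np.linalg.slogdet(S)[1] + nu @ np.linalg.solve(S, nu))
        K = Phi @ P @ Lam.T @ np.linalg.inv(S)    # Kalman gain
        a = Phi @ a + K @ nu                      # predicted state E(F_{t+1}|Y_1,...,Y_t)
        P = Phi @ P @ Phi.T + np.eye(r) - K @ S @ K.T
    return loglik
```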

Alternatively, given that direct optimization of the log-likelihood in (8) can be infeasible when N is large, the ML estimator of the DFM parameters can be obtained by the iterative expectation maximization (EM) algorithm. To simplify the description of the EM algorithm, let us assume that \(p=1\), i.e., the factors are specified as a VAR(1) model.Footnote 6 First, starting values for the parameters, \({\hat{\Lambda }}^{(0)}\), \({\hat{\Sigma }}_{\varepsilon }^{(0)}\) and \({\hat{\Phi }}^{(0)}\), are obtained based on factors and loadings estimated by PC. The starting values for the loadings are \({\hat{\Lambda }}^{(0)}={\tilde{\Lambda }}^{\mathrm{PC}}\), while the autoregressive parameters are estimated by the following least squares (LS) estimator

$$\begin{aligned} {\hat{\Phi }}^{(0)}=\left( \sum _{t=2}^T{\tilde{f}}^{\mathrm{PC}}_t{\tilde{f}}^{\mathrm{PC}\prime }_{t-1}\right) \left( \sum _{t=2}^T {\tilde{f}}^{\mathrm{PC}}_{t-1} {\tilde{f}}_{t-1}^{\mathrm{PC}\prime }\right) ^{-1}, \end{aligned}$$
(9)

and the covariance matrix of the idiosyncratic components is estimated by

$$\begin{aligned} {\hat{\Sigma }}_{\varepsilon }^{(0)}=\mathrm{diag} \left\{ \frac{1}{T}\sum _{t=1}^T{\tilde{\varepsilon }}_t{\tilde{\varepsilon }}_t^{\prime } \right\} \end{aligned}$$
(10)

where \({\tilde{\varepsilon }}_t=Y_t-{\tilde{\Lambda }}^{\mathrm{PC}} {\tilde{f}}_{t}^{\mathrm{PC}}\).
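In code, this initialization might look as follows; the sketch re-uses pc_factors() from the sketch in Sect. 2.2 and implements (9) and (10) directly.

```python
def em_start(Y, r):
    """Starting values for the EM algorithm: PC loadings, the LS
    autoregressive estimate (9) and the diagonal covariance (10)."""
    F, Lam0 = pc_factors(Y, r)
    # (9): LS regression of f_t on f_{t-1}
    Phi0 = (F[1:].T @ F[:-1]) @ np.linalg.inv(F[:-1].T @ F[:-1])
    eps = Y - F @ Lam0.T                           # PC residuals
    Sig_eps0 = np.diag(np.mean(eps ** 2, axis=0))  # (10): diagonal elements only
    return Lam0, Phi0, Sig_eps0
```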

The expectation step consists of running the KFS algorithm with the parameters of the DFM substituted by the starting values above to obtain \(f^{(0)}_{t|T}\), \(P^{(0)}_{t|T}\) and \(C^{(0)}_t\), where \(f^{(0)}_{t|T}\) and \(P^{(0)}_{t|T}\) are the smoothed estimate of \(F_t\) and its corresponding estimated MSE, given by the Kalman smoother, and \(C^{(0)}_t=E\left[ \left( F_{t}-f^{(0)}_{t|T}\right) \left( F_{t-1}-f_{t-1|T}^{(0)}\right) ^{\prime } \mid Y_1,\ldots ,Y_T\right] \) can also be obtained by the Kalman smoother by augmenting the state vector to include \(F_{t-1}\).Footnote 7 In the maximization step, the parameters of the DFM are estimated as follows

$$\begin{aligned} {\hat{\Lambda }}^{(1)}=\sum _{t=1}^TY_tf_{t|T}^{(0)\prime } \left( \sum _{t=1}^T f_{t|T}^{(0)}f_{t|T}^{(0)\prime }+P_{t|T}^{(0)}\right) ^{-1}, \end{aligned}$$
(11)
$$\begin{aligned} {\hat{\Phi }}^{(1)}=\left( \sum _{t=1}^Tf_{t|T}^{(0)}f_{t-1|T}^{(0)\prime }+C_{t}^{(0)} \right) \left( \sum _{t=1}^T f^{(0)}_{t-1|T}f_{t-1|T}^{(0)\prime }+P_{t-1|T}^{(0)}\right) ^{-1}, \end{aligned}$$
(12)

while \(\Sigma _{\varepsilon }\) is estimated as in (10) with the PC residuals substituted by \({\hat{\varepsilon }}^{(1)}_t=Y_t-{\hat{\Lambda }}^{(0)}f_{t|T}^{(0)}\).Footnote 8 Recall that, for identification purposes, the parameters of the DFM need to be restricted and, therefore, using the restrictions described above, \(\Sigma _u=I_r\) does not need to be estimated. Furthermore, denoting by \(F^{(S)}\) and \(P^{(S)}\) the \(T \times r\) matrix of smoothed factors and the corresponding MSE matrix in the steady state, respectively, the restrictions on the loadings can be imposed as follows

$$\begin{aligned} vec\left( {\hat{\Lambda }}^{(1)*}\right) =vec\left( {\hat{\Lambda }}^{(1)}\right) -\left( \left( F^{(S)\prime }F^{(S)} + P^{(S)}\right) ^{-1} \otimes I_N \right) R^{\prime } \left[ R \left( \left( F^{(S)\prime }F^{(S)} + P^{(S)}\right) ^{-1} \otimes I_N \right) R^{\prime }\right] ^{-1} \left( R\, vec\left( {\hat{\Lambda }}^{(1)}\right) -c\right) , \end{aligned}$$
(13)

where R is the \(\frac{r(r-1)}{2} \times Nr\) matrix of zeros and ones that selects the restricted loadings in \(vec\left( {\hat{\Lambda }}^{(1)}\right) \) and c is the \(\frac{r(r-1)}{2}\times 1\) vector of zeros corresponding to the restrictions considered in this case. Consider, for example, \(r=3\); then, the matrix of coefficients of the restrictions is given by the following \(3 \times 3N\) matrix

$$\begin{aligned} R=\begin{pmatrix} e_{N+1}^{\prime } \\ e_{2N+1}^{\prime } \\ e_{2N+2}^{\prime } \end{pmatrix}, \end{aligned}$$
(14)

where \(e_k\) denotes the kth unit vector of \({\mathbb {R}}^{3N}\), so that the rows of R select the restricted loadings \(\lambda _{12}\), \(\lambda _{13}\) and \(\lambda _{23}\) in the (column-stacked) vector \(vec(\Lambda )\).

The expectation and maximization steps are iterated until convergence and the corresponding estimator is denoted as ML-EM. The parameters of the DFM with serially and cross-sectionally uncorrelated idiosyncratic components can be estimated by ML-EM regardless of N; see, among many others, Stock and Watson (1989, 1991) with \(N=4\), Quah and Sargent (1993) with \(N=60\) and Proietti (2011) with \(N=148\).
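The maximization step in (11)–(12) amounts to a pair of multivariate LS updates given the smoothed moments. A minimal sketch, assuming the E-step has delivered the smoothed factors, their MSEs and the lag-one covariances as arrays, is the following (we update the idiosyncratic variances with the new loadings, one common variant):

```python
def m_step(Y, f, P, C):
    """Maximization step, equations (11)-(12): f (T x r) holds the smoothed
    factors f_{t|T}, P (T x r x r) their MSEs P_{t|T} and C (T x r x r) the
    lag-one covariances C_t delivered by the Kalman smoother."""
    # (11): update of the loadings
    Lam1 = (Y.T @ f) @ np.linalg.inv(f.T @ f + P.sum(axis=0))
    # (12): update of the autoregressive matrix
    num = f[1:].T @ f[:-1] + C[1:].sum(axis=0)
    den = f[:-1].T @ f[:-1] + P[:-1].sum(axis=0)
    Phi1 = num @ np.linalg.inv(den)
    # Idiosyncratic variances as in (10), with smoothed residuals
    eps = Y - f @ Lam1.T
    Sig_eps1 = np.diag(np.mean(eps ** 2, axis=0))
    return Lam1, Phi1, Sig_eps1
```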

If the number of factors, r, and their autoregressive order, p, are known and the idiosyncratic components are only weakly cross-correlated, Doz et al. (2011) show that the smoothed factors extracted using the two-step least squares (TS-LS) estimates of the parameters, i.e., the PC-based estimates in (9) and (10), are consistent even if \(\Sigma _{\varepsilon }\) is wrongly treated as diagonal when it is not, because the misspecification error vanishes as N and T diverge to infinity. Later, Doz et al. (2012) extend the result to the ML-EM.Footnote 9 The \(\mathrm{min}\left( \sqrt{N}, T \right) \)-consistency and asymptotic normality of the latter factors have been proved by Barigozzi and Luciani (2020b), who derive the conditions under which the asymptotic distribution can still be used for inference in case of misspecification. Note that normality of the DFM is not required for the asymptotic normality of the factors. Barigozzi and Luciani (2020b) compare the loadings, factors and common components estimated using PC and QML estimators and conclude that, in static DFMs, both procedures yield rather similar results.

2.4 Forecasting with DFM

When the number of predictors is large, it is very popular to obtain out-of-sample forecasts of the variables of interest using factor-augmented predictive regressions (also known as diffusion index regressions, as proposed by Stock and Watson 2002a). The one-step-ahead forecast of the ith variable in the system is given by

$$\begin{aligned} {\hat{y}}_{iT+1|T}=\mu + \sum _{j=1}^{q} \delta _{j} y_{iT-j+1} + \sum _{j=1}^{s} B_j^{\prime } F_{T-j+1} \end{aligned}$$
(15)

where \(B_j=\left( \beta _{1j},\ldots ,\beta _{rj} \right) ^{\prime }\) are parameters. In practice, the parameters of the diffusion index regression in (15) are estimated by LS after substituting the factors by the corresponding estimates. When the factors are extracted by PC, Stock and Watson (2002a) show that \({\hat{y}}_{iT+1|T}\) is consistent for \(y_{iT+1}\). Bai and Ng (2006) show that, if \(\frac{\sqrt{T}}{N}\rightarrow 0\), the LS estimator of the parameters is \(\sqrt{T}\) consistent and asymptotically normal. Furthermore, they show that the conditional mean predicted by the estimated factors is \(\mathrm{min}[\sqrt{T}, \sqrt{N}]\) consistent and asymptotically normal.Footnote 10 Finally, Bai and Ng (2006) also derive the asymptotic distribution of the forecasts of \(y_{iT+1}\), which can be used to construct forecast intervals.Footnote 11
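For concreteness, the following sketch estimates the predictive regression (15) by LS and produces the one-step-ahead forecast; it takes the (estimated) factors as given and uses illustrative names of our own.

```python
import numpy as np

def diffusion_index_forecast(y, F, q=4, s=4):
    """LS estimation of the factor-augmented predictive regression (15) and
    the implied one-step-ahead forecast; y is a length-T array and F the
    T x r matrix of (estimated) factors."""
    T = len(y)
    m = max(q, s)
    X, z = [], []
    for t in range(m, T):  # regress y_t on its own lags and lagged factors
        X.append(np.concatenate([[1.0],
                                 y[t - q:t][::-1],            # y_{t-1},...,y_{t-q}
                                 F[t - s:t][::-1].ravel()]))  # F_{t-1},...,F_{t-s}
        z.append(y[t])
    beta, *_ = np.linalg.lstsq(np.array(X), np.array(z), rcond=None)
    x_next = np.concatenate([[1.0], y[T - q:][::-1], F[T - s:][::-1].ravel()])
    return x_next @ beta   # the forecast of y_{T+1} given data up to T
```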

As far as we know, there are no results available on the asymptotic properties of the parameter estimator and forecasts when factors extracted using KFS are used in (15). Our conjecture is that, if the convergence rates of PC factors and ML-EM factors are the same, so should be the rate of convergence of the conditional mean predicted by (15). All in all, the known theoretical results point to the same convergence rates for the previous estimators and, therefore, the relative behavior of the different procedures in actual data remains an empirical question.

The usefulness of the factors can be evaluated out-of-sample by comparing the MSFE of the forecasts obtained from the factor-augmented regression in (15) with that of the following univariate autoregression for \(y_{it}\), which does not include the factors

$$\begin{aligned} {\hat{y}}^{*}_{iT+1|T}=\mu ^{*} + \sum _{j=1}^{q} \delta ^{*}_{j} y_{iT-j+1}. \end{aligned}$$
(16)

In order to test the out-of-sample predictive ability of the factors, one can use the ENC-F and MSE-F tests, as in Gonçalves et al. (2017), who show that the presence of estimated PC factors leads to only minor size distortions of predictive ability tests, although it reduces power relative to the case where the factors are observed. The ENC-F and MSE-F tests are given by

$$\begin{aligned} \mathrm{ENC}-F=\frac{\sum _{t=T+1}^{T+H}{\hat{u}}_{1t}\left( {\hat{u}}_{1t}-{\hat{u}}_{2t} \right) }{{\hat{\sigma }}_2^2} \end{aligned}$$
(17)

and

$$\begin{aligned} \mathrm{MSE}-F=\frac{\sum _{t=T+1}^{T+H}\left( {\hat{u}}_{1t}^2-{\hat{u}}_{2t}^2 \right) }{{\hat{\sigma }}_2^2}, \end{aligned}$$
(18)

where H is the number of one-step-ahead forecasts, \({\hat{u}}_{1t}=y_{it}-{\hat{y}}^{*}_{it|t-1}\), with \({\hat{y}}^{*}_{it|t-1}\) being the one-step-ahead forecasts obtained from the autoregression in (16), and \({\hat{u}}_{2t}=y_{it}-{\hat{y}}_{it|t-1}\), with \({\hat{y}}_{it|t-1}\) given by the factor-augmented regression in (15). Finally, \({\hat{\sigma }}_2^2=\frac{1}{H}\sum _{t=T+1}^{T+H}{\hat{u}}_{2t}^2\). The asymptotic critical values in Clark and McCracken (2001) and McCracken (2007) can be used to test whether the predictive ability of both models is the same using ENC-F and MSE-F, respectively.Footnote 12
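Given the two sets of out-of-sample forecast errors, both statistics are immediate to compute; a minimal sketch is

```python
import numpy as np

def enc_f_mse_f(u1, u2):
    """ENC-F and MSE-F statistics, equations (17)-(18): u1 holds the H
    one-step-ahead forecast errors of the autoregression (16) and u2 those
    of the factor-augmented regression (15)."""
    sig2 = np.mean(u2 ** 2)                        # hat(sigma)_2^2
    enc_f = np.sum(u1 * (u1 - u2)) / sig2          # (17)
    mse_f = np.sum(u1 ** 2 - u2 ** 2) / sig2       # (18)
    return enc_f, mse_f
```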

3 Empirical extraction of factors

The forecasting performance of KFS procedures for factor extraction is analyzed, both in-sample and out-of-sample, in the context of the ubiquitous database described by McCracken and Ng (2016), which consists of \(N=128\) variables observed monthly from January 1983 up to and including December 2020, with a total of 444 observations per series.Footnote 13 Prior to the analysis, the data are transformed to stationarity, and outliers and missing observations are dealt with as described by McCracken and Ng (2016). Then, all variables in the system are centered and standardized. The sample period is split into an estimation period from January 1983 to December 2016 (\(T=396\)) and an out-of-sample forecast period from January 2017 to December 2020 (\(H=48\)). The prediction targets are the stationary transformations of industrial production (IP), inflation, employment and real income; see, among others, Quah and Sargent (1993), Stock and Watson (2002b), Bai and Ng (2008b), Alvarez et al. (2016), McCracken and Ng (2016), Granziera and Sekhposyan (2019) and Stauskas and Westerlund (in press) for the interest in forecasting these variables.

3.1 Determining the number of factors and their dependence

Fig. 1 Scree plot of the eigenvalues of the covariance matrix of the US macroeconomic data set

Table 1 Determination of the number of underlying factors according to different criteria

To determine the number of static factors, we visually inspect the scree plot proposed by Cattell (1966), which appears in Fig. 1; see, for example, Hindrayanto et al. (2016), who also look at the scree plot to determine r. The message from the scree plot is not clear, with only the presence of one factor being obvious. Alternatively, we also use statistical criteria to determine the number of factors; see Table 1 for a summary of the results. Using the criteria proposed by Alessi et al. (2010), the number of factors is determined to be either \(r=5\) or \(r=7\). These numbers are in concordance with the related literature analyzing the same data set (observed over different time spans), in which a large number of works determine \(r=7\) (Stock and Watson 2005; Bai and Ng 2007); see also Poncela and Ruiz (2015) and Bennedsen et al. (2021), who choose \(r=4\). We also determine r using the popular criteria proposed by Bai and Ng (2002), according to which \(r=8\); see also Gonçalves et al. (2017), who, in a related application, select 8 factors without using any particular statistical criterion, and McCracken and Ng (2016), Demetrescu and Hacioglu Hoke (2019) and Despois and Doz (2020), who also select \(r=8\). Moreover, the criterion proposed by Onatski (2010) determines \(r=1\); see, for example, Alvarez et al. (2016), who consider the case of \(r=1\) factor in this data set. Therefore, there is no agreement about the number of factors that should be used to represent the common movements in the system considered. Stauskas and Westerlund (in press) discuss the uncertainty about the number of factors in this data set. Choi and Jeong (2019) also discuss the number of factors in this data set and provide Monte Carlo evidence on the difficulty of determining which criterion performs best. Consequently, in order to analyze the effect of the number of factors on the forecasts, we carry out the analysis under three scenarios, namely, \(r=1\), \(r=3\) and \(r=7\).
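As an illustration of the kind of criteria involved, the following sketch computes the IC\(_{p2}\) criterion of Bai and Ng (2002), one of several variants of their criteria, re-using the pc_factors() function sketched in Sect. 2.2.

```python
import numpy as np

def bai_ng_icp2(Y, rmax=10):
    """Number of factors minimizing the Bai and Ng (2002) IC_p2 criterion."""
    T, N = Y.shape
    penalty = (N + T) / (N * T) * np.log(min(N, T))
    ic = []
    for k in range(1, rmax + 1):
        F, Lam = pc_factors(Y, k)
        V = np.mean((Y - F @ Lam.T) ** 2)   # V(k) as in (4)
        ic.append(np.log(V) + k * penalty)
    return int(np.argmin(ic)) + 1
```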

Once the number of factors, r, is determined, factor extraction based on KFS requires assuming a particular lag p of the VAR model in (7). It is popular in the related literature to assume that the dependence of the factors can be represented by VAR(1) models, i.e., by (7) with \(p=1\); see, for example, Stock and Watson (2005), Poncela and Ruiz (2015) and Alvarez et al. (2016). However, in practice, the temporal dependence of the factors may be better represented by a VAR(p) model with \(p>1\); see, for instance, Banbura and Modugno (2014), who consider \(p=2\) for a quarterly data set, and Solberger and Spanberg (2020), who specify \(p=2\) for monthly data. In order to choose p, we extract a single factor by PC and analyze its correlogram and partial correlogram, which suggest that the factor could be represented by an AR(3) model. Consequently, in order to analyze the effect of p on KFS factor extraction and on the corresponding forecasts, we estimate the DFM assuming either \(p=1\) or \(p=3\).
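The lag-selection step can be reproduced along the following lines, assuming the statsmodels package and the data matrix Y are available; a PACF that cuts off after lag 3 is what supports the AR(3) specification.

```python
from statsmodels.tsa.stattools import acf, pacf

# Correlogram and partial correlogram of a single PC factor; pc_factors()
# is the sketch from Sect. 2.2.
f1, _ = pc_factors(Y, 1)
print(acf(f1[:, 0], nlags=12))    # autocorrelations up to lag 12
print(pacf(f1[:, 0], nlags=12))   # partial autocorrelations up to lag 12
```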

3.2 In-sample factor extraction

In this subsection, we first analyze the effect of the choice of r and p on the properties of factors extracted both by PC and KFS and on the in-sample performance of the factor-augmented predictive regressions in (15).

We first extract one single factor by PC and estimate the corresponding exact DFM with either \(p=1\) or \(p=3\).Footnote 14 In the latter case, the parameters of the DFM are estimated by TS-LS (PC for the loadings), ML-NO and ML-EM. Table 2 reports a summary of the results. In particular, it reports \(\sum _{i=1}^N {\hat{\lambda }}_{i1}^2\) and \(\sum _{i=1}^N {\hat{\sigma }}_{\varepsilon i}^2\), together with the estimated autoregressive parameters and the MSE of the smoothed factor.Footnote 15 First of all, we can observe that the sums of squared loadings and of idiosyncratic variances and MSE(\({\hat{f}}_{t|T}\)) are the same regardless of whether the factor is assumed to be AR(1) or AR(3) and of whether the model parameters are estimated by ML using EM or by numerically maximizing the log-likelihood. When the Kalman filter is run with the parameters estimated by TS-LS (PC), the sum of squared loadings is slightly larger and the sum of idiosyncratic variances is slightly smaller. As a consequence, the (steady-state) MSE of the smoothed factor, \({\hat{f}}_{t|T}\), is smaller, with an apparent increase in precision as compared with the steady-state MSE obtained when the parameters are estimated by ML. In any case, it is remarkable that the MSE of the PC extracted factor estimated as proposed in Bai (2003) is 0.01, approximately 5 times smaller than that obtained when the factors are extracted using the KF with ML estimates of the parameters. Furthermore, the implications of the estimation method and of the specification assumed for the factor are also clear when estimating its dynamic dependence. Consider first the estimated parameters in the model with \(p=1\). The ML estimate of the autoregressive parameter (regardless of whether it is obtained by maximizing the likelihood numerically, 0.87, or by the EM algorithm, 0.85) is larger than that based on PC, 0.78. Furthermore, note that the slight differences between the ML results obtained when the likelihood is maximized numerically and when the EM algorithm is used disappear when \(p=3\) is assumed. It seems that, when the “true” log-likelihood is maximized, its value at the maximum is the same regardless of the procedure used for its maximization. Finally, let us look at the roots implied by the estimated parameters of the AR(3) model. When the parameters are estimated based on the PC factors, the roots are 0.94 and \(-0.30\pm 0.35i\) while, if they are estimated by ML, the roots are 0.95 and \(-0.27\pm 0.28i\). In both cases, the factor displays cyclical behavior, with a larger real root when the parameters are estimated by ML. In any case, the persistence implied by this real root is clearly larger than that obtained when an AR(1) model is assumed for the factor. These differences in the estimated persistence and number of lags of the factor may have implications for forecasting, mainly around turning points, because the forecasts adapt more quickly when the number of lags is smaller. Finally, Table 2, which also reports the value of the log-likelihood at the maximum for the ML estimates, shows that, although there are no significant differences between the log-likelihood values obtained when the maximization is based on EM or on numerical optimization, the difference between the log-likelihoods when \(p=1\) and \(p=3\) is significant according to the log-likelihood ratio test.
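The roots quoted above are obtained from the estimated AR(3) coefficients as the roots of the characteristic polynomial \(z^3-\phi _1 z^2-\phi _2 z -\phi _3\); a small sketch, with illustrative coefficient values rather than the Table 2 estimates, is

```python
import numpy as np

def ar_roots(phi):
    """Roots of the AR(p) characteristic polynomial z^p - phi_1 z^{p-1} - ... - phi_p."""
    return np.roots([1.0] + [-c for c in phi])

# Illustrative AR(3) coefficients (not the Table 2 estimates): one large
# real root and a pair of complex conjugate roots, the pattern found in the text.
print(ar_roots([0.4, 0.2, 0.3]))
```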

Table 2 Parameter estimates (associated with the first factor) of DFMs obtained using TS-LS, ML-NO and ML-EM when r = 1, 3 and 7 and p = 1 and 3
Fig. 2 Factor loadings estimated for the set of macroeconomic variables using: (i) PC (blue bars) and (ii) ML with numerical optimization, ML-NO (green bars), and with EM, ML-EM (orange bars) (colour figure online)

Fig. 3 A single factor (blue) extracted from the set of macroeconomic variables using PC (first row) and KFS with EM estimates of the parameters (second row). The first column plots the smoothed factor extracted assuming an AR(1) dependence while in the second column the factor is assumed to be an AR(3) process. The red lines represent the corresponding 95% confidence intervals (colour figure online)

Figure 2, which plots the loadings estimated by PC and by ML, in the latter case using both numerical optimization and EM, shows that the loadings are similar regardless of the procedure used to estimate them. Furthermore, Fig. 3 plots the factors obtained by the KFS based on the PC and EM parameter estimates reported in Table 2, together with their corresponding 95% confidence intervals.Footnote 16 The EM estimated factors have been rotated to lie in the same space as those estimated by PC. Figure 3 illustrates that the extracted factors are similar regardless of the particular method implemented to extract them. However, the intervals constructed using ML parameter estimates are clearly wider than those obtained using PC parameter estimates; see also Poncela and Ruiz (2015), who conclude that the asymptotic RMSEs obtained from the asymptotic distribution of the PC factors are unrealistically small.Footnote 17

The conclusions are similar when the factors are extracted assuming either \(r=3\) or \(r=7\).Footnote 18 Table 2 shows that the only difference with respect to the case \(r=1\) is that, obviously, the sum of idiosyncratic variances is now smaller and, consequently, the MSE of the extracted factors is reduced by half.Footnote 19 It is also remarkable that the maximum of the log-likelihood reported in Table 2 is significantly larger when \(r=3\) than when \(r=1\). Similarly, when we assume that \(r=3\) and \(p=3\) and the parameters are estimated by TS-LS, the estimation results are very similar to those obtained when we assumed that \(r=1\) and \(p=3\). Looking at the estimated dynamics of the first factor, we can observe that they are very similar to those estimated when assuming that \(r=1\).Footnote 20 In particular, when the parameters are estimated by TS-LS, the roots of the characteristic equation are 0.94 and \(0.35\pm 0.20i\), very close to those estimated above. However, the results are somewhat different when the parameters are estimated by ML. In this case, the roots are 0.81, \(-0.56\) and 0.6, rather different from those obtained when the parameters are estimated by TS-LS and when assuming that \(r=1\). Note that the estimation results reported in Table 2 when \(r=7\) are similar to those reported when \(r=3\).

The sample pairwise correlations between the first factor estimated under the different specifications and estimators considered range from 0.96 to 1.00 when \(r=1\) or 3. If \(r=7\), some of these correlations fall to a minimum of 0.8. The minimum correlation, 0.96, is obtained between the factor extracted assuming \(r=1\) and \(p=1\) with the parameters estimated by ML and the factor extracted assuming \(r=3\) and \(p=3\) with the parameters of the DFM estimated by TS-LS. On the other hand, the maximum correlation, 1.00, is obtained when it is assumed that \(r=1\) and \(p=3\) and the parameters are estimated by ML, either maximizing the likelihood numerically or using the EM algorithm; see also Lewis et al. (2020), who conclude that the factors are robust to whether PC or KFS is implemented for factor extraction when constructing a weekly index of real activity (EWI) based on \(N=10\) variables for the USA, and Breitung and Tenhofen (2011b), who conclude that, in the context of PC factor extraction, the specification of the underlying factors is not important when N is large.

Finally, note that the sum of squared loadings (idiosyncratic variances) is larger (smaller) when the parameters are estimated by TS-LS than when they are estimated by ML and, consequently, the confidence intervals for the factors are wider (and arguably more realistic) when the parameters are estimated by ML. Assuming a larger number of factors implies reducing the sum of idiosyncratic variances and, consequently, decreasing the MSE of the factors extracted using the KF. This result is in concordance with Boivin and Ng (2006) who, in the context of PC factor extraction, conclude that overestimating the number of factors increases the precision of factor estimates (and the forecasts), while underestimating it has the opposite effect (on top of losing consistency).

3.3 In-sample predictions

Table 3 Parameter estimates of factor-augmented predictive regressions for IP based on factors estimated using PC, TS-LS and ML-EM and data observed up to December 2016, together with their corresponding p-values in parentheses, the sample standard deviation of the residuals and the adjusted coefficient of determination

To analyze whether the small differences in the estimated factors have implications for in-sample prediction, we estimate the factor-augmented predictive regressions in equation (15) with \(q=s=4\) for each of the four variables to be forecast, namely, IP, inflation, employment and real income, using the factors extracted by the alternative methods considered, assuming that \(r=1\), 3 and 7 and \(p=1\) and 3. Note that, in this application, both T and N are rather large, with \(\frac{\sqrt{T}}{N}=\frac{\sqrt{396}}{128}=0.155\) being close to zero; consequently, using the results in Bai and Ng (2006), we can conclude that the factor estimation uncertainty should be negligible when conducting inference in the factor-augmented regression. Table 3 reports the estimates of the parameters of these regressions for IP growth, \(y_t\), together with their corresponding p-values obtained under the assumption of homoscedastic forecast errors, \(u_{2t}=y_{t}-{\hat{y}}_{t|t-1}\), the sample standard deviation of the corresponding residuals, \(\sigma _u\), and the adjusted coefficient of determination, \({\bar{R}}^2\). In the case of more than one factor, Table 3 only reports the parameter estimates for the first factor.Footnote 21 First of all, note that, testing for the joint in-sample significance of the factors, we reject the null regardless of r and p. Therefore, the factors have predictive power for IP. Comparing the \({\bar{R}}^2\)’s obtained using the factors extracted by PC for \(r=3\) and 7 with those obtained for \(r=1\), we can observe that adding more factors does not significantly increase the in-sample predictive performance of the regressions for IP.Footnote 22 This result is in concordance with the conclusions in McCracken and Ng (2016), who interpret the first common factor (extracted using PC) as a real activity/employment factor. They find that the predictive information of the factors for IP changes over time, with only the first common factor retaining its predictive information at the end of the sample period they consider. When \(r=1\), the estimated parameters of the factor-augmented predictive regressions are very similar regardless of whether \(p=1\) or 3 and of the particular procedure used to extract the factors. Increasing the autoregressive lag of the factors and/or the number of factors only implies small improvements in the adjusted coefficient of determination, with the best results obtained when the factors are extracted using ML-EM with \(p=3\) if \(r=1\) or 3. However, when \(r=7\), the results are slightly better when the factors are extracted using KFS with the parameters of the DFM estimated using TS-LS.

Table 4 Adjusted coefficients of determination of factor-augmented predictive regressions based on factors estimated using PC, TS-LS and ML-EM

Table 4, which reports the \({\bar{R}}^2\) of the augmented predictive regressions corresponding to inflation, employment and income, shows that the conclusions for these variables are the same as those for IP. Note that McCracken and Ng (2016) interpret the third common factor as an inflation factor, while the second common factor is dominated by forward-looking variables such as term interest rate spreads and inventories. They show that, in the sample period they consider, the first common factor does not have any predictive content for forecasting inflation in later times. Therefore, it seems that including the relevant number of factors could matter for in-sample prediction, while the specification of the autoregressive lag could be of less importance. Furthermore, factor extraction based on KFS is slightly better than that based on PC and, unless the number of parameters is too large, it is better to estimate the parameters using EM.

3.4 Out-of-sample forecasts

Table 5 One-step-ahead out-of-sample forecasts of first differences of industrial production, inflation, employment and income from January 2017 to December 2020, based on factor-augmented predictive regressions with factors estimated by PC, TS-LS and ML-EM

Finally, using the estimated factor-augmented regressions reported in Table 3 and the filtered factors obtained in the out-of-sample period, we obtain one-step-ahead forecasts of IP, inflation, employment and income from January 2017 to December 2020, together with their corresponding 70% and 95% forecast intervals. We consider a fixed scheme in which the parameters used for forecasting are not updated. Table 5 reports the empirical mean square forecast errors (MSFEs) and the empirical coverages of the 70% forecast intervals, computed both with the forecasts obtained until December 2019 and with those obtained until December 2020.Footnote 23 Note that, in the latter case, we incorporate in the analysis the forecasts obtained during the turbulent times of the recession induced by the COVID-19 pandemic while, in the former case, the forecasts are obtained in a “normal” period in the evolution of the variables. The ratio between the out-of-sample and in-sample number of observations is \(\frac{48}{396}=0.12\).
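The reported quantities are straightforward to compute from the forecasts; a minimal sketch, assuming that the forecast standard errors se are available (e.g., from the KFS output), is

```python
import numpy as np
from scipy.stats import norm

def msfe_and_coverage(y, yhat, se, level=0.70):
    """Empirical MSFE and empirical coverage of nominal-level forecast
    intervals over the out-of-sample period."""
    z = norm.ppf(0.5 + level / 2)              # about 1.04 for 70% intervals
    inside = np.abs(y - yhat) <= z * se
    return np.mean((y - yhat) ** 2), np.mean(inside)
```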

Fig. 4 Out-of-sample forecasts of IP (first row) and inflation (second row) together with the corresponding 70% confidence intervals

Table 6 Tests of predictive ability of the factors when \(r=1\)

First of all, Table 5 shows that, even if the differences between the in-sample estimated factors and the corresponding predictive regressions are minor, the performance of the out-of-sample one-step-ahead forecasts can be quite different. The procedure used to extract the factors and, when the factors are extracted using the Kalman filter and smoother, the estimator of the DFM parameters are both relevant for the out-of-sample forecast performance. Table 6 reports the values of the ENC-F and MSE-F statistics used to test the out-of-sample predictive ability of each of the factor-augmented predictive regressions considered when \(r=1\) against the AR(4) model without factors.Footnote 24 Note that the asymptotic null distributions of ENC-F and MSE-F are not strictly positive and, consequently, the fact that the test values are sometimes negative does not necessarily constitute evidence in favor of the restricted model; see the explanations by Stauskas and Westerlund (in press). Table 6 shows that the ENC-F tests support that the factors are significant for forecasting IP, employment and income. However, out-of-sample forecasts of inflation are only significantly different from the forecasts obtained with the AR(4) model without factors at the 90% level. In general, the ENC-F statistic is largest when the factor is extracted using KFS with the parameters estimated by ML-EM. However, the MSE-F test is more conservative and, except for IP, it generally does not support that the out-of-sample MSFEs are reduced by introducing a factor. Nevertheless, the differences between the procedures used to extract the factors are more obvious when there are extraordinary movements in the series, such as those observed during the COVID-19 crisis; see Fig. 4. When we take 2020 into account, the differences in the MSFEs with respect to PC are striking (for instance, for IP, the out-of-sample MSFE based on the PC extracted factors is more than twice that of the forecasts based on ML extracted factors). However, removing the year 2020 gives very different numerical results. First, the magnitude of the MSFEs is considerably reduced. In particular, the MSFE based on PC extracted factors is around 10 times smaller when 2020 is excluded. Nevertheless, the PC extracted factors still render out-of-sample MSFEs around 20% larger than those of the KFS methods. Regarding the length of the AR polynomial of the common factor for IP, notice that we always obtain smaller MSFEs for \(p=1\); that is, the shorter the memory of the common factor, the smaller the MSFE. According to our results, forecasts of inflation based on models with \(r=1\) or \(r=3\) are different; see Fig. 4. On top of the noticeable differences between the results that include 2020 and those that do not, which we also observe with three-factor models, notice that, both for IP and inflation, including more factors does not necessarily translate into smaller out-of-sample MSFEs; see also Barhoumi et al. (2013), who also conclude that increasing the number of factors does not decrease the MSFEs when forecasting French and German GDP. Indeed, on occasion, these are larger than the corresponding MSFEs from one-factor predictive regressions. In general, regardless of the variable to be forecast, the out-of-sample performance is better for those forecasts based on models with a smaller number of factors and shorter autoregressive lags, extracted using KFS with the parameters estimated using TS-LS. Our results seem to support the KISS (keep it sophisticatedly simple) principle.

Table 7 Parameter estimates of factor-augmented predictive regressions for IP based on factors estimated using PC, TS-LS and ML-EM

Table 5, which also reports the empirical coverages of the 70% forecast intervals, shows that these intervals are usually too wide, with coverages well above the nominal level. The reason for this empirical observation deserves further investigation, which is beyond our objectives in this paper.

3.5 Robustness check: forecasting over different periods of time

It is well known that, when forecasting in practice, the use of different window sizes for the out-of-sample forecasts may lead to different empirical results. It is possible that, for a given forecast window, significant predictive ability is not detected, while it could be detected in another window. On the other hand, it is also possible to obtain satisfactory results just by chance. Moreover, the results on the ability of predictive models rely on the ratio between the out-of-sample and in-sample observations, with the predictive tests, ENC-F and MSE-F, being more accurate when H is large. Consequently, in this subsection, we study the robustness of the empirical results above to the choice of the estimation and out-of-sample window sizes. In particular, the parameters are estimated using data up to December 2007, so that the in-sample period does not include data from the last global financial crisis. In this case, the estimation size is \(T=288\), while the out-of-sample forecast size is \(H=156\). Therefore, \(\frac{\sqrt{T}}{N}=0.13\), with a ratio of out-of-sample to in-sample observations of 0.54.

Table 7 reports the parameter estimates of the factor-augmented predictive regressions for IP obtained using the in-sample data from January 1983 to December 2007, while Table 4 reports the \({\bar{R}}^2\) of the regressions not only for IP but also for inflation, employment and income.Footnote 25 Looking at the results for IP in Table 7, we can observe that the conclusions are mostly the same as those obtained when the regressions were estimated with data up to December 2016. The factors are significant and, although the fit, measured by the adjusted coefficient of determination, is smaller than that reported in Table 3, it is maximized when the predictive regressions are estimated including 7 factors extracted using KFS and specified as a VAR(3).

With respect to the in-sample fit of the predictive regressions for inflation estimated with data up to December 2007, Table 4 shows that it is very similar to that reported for the regressions estimated with data up to December 2016. The only difference observed is that, in this latter case, the factors are not even significant for forecasting inflation. The results for employment are very similar to those described for IP. Finally, when looking at the \({\bar{R}}^2\) coefficients of the predictive regressions for real income, we observe that they are slightly larger than those of the models estimated using data up to December 2016, but they still support the main conclusions: the factors are significant when included for forecasting, and the fit is maximized if 7 factors extracted using KFS and modelled as a VAR(3) are considered. All in all, the main conclusions from the in-sample analysis are supported using this alternative estimation window.

Finally, Table 8 reports the MSFEs and coverages of the out-of-sample forecasts obtained from January 2008 until either December 2019 or December 2020. We can observe that the factors have predictive power if they are extracted using KFS; see also the results of the predictive ability tests in Table 6. As with the forecasts computed from January 2017 onwards, the results are stronger when the COVID-19 pandemic year, 2020, is included in the out-of-sample period.

Table 8 One-step-ahead out-of-sample forecasts of first differences of industrial production, inflation, employment and income based on factor-augmented predictive regressions with factors estimated by PC, TS-LS and ML-EM

4 Conclusions

The factors are highly correlated among them regardless of the procedure or estimator used for their extraction and of the number of lags specified for their autoregressions. However, the main differences between factor estimates obtained using PC or KFS, based on either TS-LS or ML-EM, are observed in their dynamics, and these differences may have implications for forecasting. In the particular US macroeconomic data set analyzed in this paper, the largest autoregressive root is closer to one when the factor is extracted using the KFS algorithm with the DFM’s parameters estimated by ML-EM. The likelihood-ratio tests of the DFMs favor specifications with more factors and more lags. Furthermore, the same conclusion is obtained when looking at the results of the in-sample factor-augmented predictive regressions, which have larger fit measures when the factors are extracted using KFS from DFMs with a large number of factors modelled as VAR(p) processes with \(p>1\). With respect to the estimator of the parameters of the DFM, the results are better if the ML-EM estimator is used when the number of parameters to be estimated is not very large. However, if the number of parameters is large, the ML-EM estimator seems to have numerical problems and, consequently, the fit of the factor-augmented predictive regressions is better when the parameters are estimated using the simpler TS-LS estimator. Finally, according to our empirical results, increasing the number of factors and/or their lag structure does not always lead to an increase in out-of-sample forecast precision. The out-of-sample MSFEs are generally minimized when the forecasts are based on simple models with one factor extracted using KFS and modelled as an AR(1) process. This conclusion is rather general across the four variables considered for forecasting in this paper. In any case, answering the question in the title of this paper, a careful specification of the DFM before factor extraction could be important in terms of in-sample fit. However, when forecasting out-of-sample, simple specifications seem to be favored.