1 Introduction

In recent years, due to the availability of data on a vast number of macroeconomic and financial variables, there has been increasing interest in modeling large systems of economic time series. To reduce the dimensionality and extract the underlying factors, one can use dynamic factor models (DFMs), originally introduced in economics by Geweke (1977) and Sargent and Sims (1977). The aim of DFMs is to represent the dynamics of the system through a small number of hidden common factors, which are mainly used for forecasting and macroeconomic policy-making; see Stock and Watson (2011) and Breitung and Choi (2013) for recent reviews of the existing literature. Kajal Lahiri has contributed to the DFM literature with several empirical works. For example, Lahiri and Yao (2004) implement a DFM to analyze the business cycle features of the transportation sector, and Lahiri and Sheng (2010) use one to measure forecast uncertainty by disagreement. Lahiri et al (2015) also apply a DFM to a real-time jagged-edge data set of over 160 explanatory variables to re-examine the role of consumer confidence surveys in forecasting personal consumption expenditure. The properties of many popular factor extraction procedures rely on the number of factors in the system being known. However, in practice, the number of factors is unknown and needs to be determined. Among the most popular procedures for this purpose are the criteria of Bai and Ng (2002), which are now standard in the literature. These criteria are based on modifications of the Akaike information criterion (AIC) and Bayesian information criterion (BIC), taking into account the cross-sectional and temporal dimensions of the dataset as arguments of the function penalizing overparametrization.
Alternatively, Onatski (2010) proposes an estimator of the number of factors based on using differences between adjacent eigenvalues of the sample covariance matrix of the variables contained in the system, arranged in descending order, while Ahn and Horenstein (2013) propose two alternative estimators based on ratios of adjacent eigenvalues.

It is well known that macroeconomic time series are frequently non-stationary and possibly cointegrated. Within the context of principal components (PC) factor extraction, and following Stock and Watson (2002), the most popular way of dealing with large systems of non-stationary macroeconomic variables is by differencing the variables in a univariate fashion; see, for example, Breitung and Eickmeier (2011); Stock and Watson (2012a, 2012b); Barhoumi et al (2013); Buch et al (2014); Moench et al (2013); Bräuning and Koopman (2014); Poncela et al (2014) and Jungbacker and Koopman (2015) for recent references. The theoretical justification of this extended practice is analyzed by Bai and Ng (2004), who show that applying PC to first-differenced data and recovering the original factors by “recumulating” is consistent regardless of whether the factors and/or idiosyncratic errors are I(0) or I(1)Footnote 1. However, their theory proceeds assuming that the number of common factors in the system is known. On the other hand, as mentioned above, macroeconomic variables are not only non-stationary but can also be cointegrated. Differencing a cointegrated system may distort the determination of the number of factors due to the introduction of non-invertible moving average (MA) components and/or the trade-off introduced between the variances of the common and idiosyncratic components. Surprisingly, there has been little discussion in the literature on whether differencing in a univariate fashion affects the correct determination of the number of factors. As far as we know, only Bai (2004) analyzes the performance of the information criteria proposed by Bai and Ng (2002) when implemented on differenced data. In his Monte Carlo experiments, carried out for a single DFM with contemporaneously uncorrelated idiosyncratic noises following an ARMA model and two random walk factors, he shows that the number of factors is correctly determined.

The main objective of this paper is to fill this gap by analyzing the effects of univariate stationary transformations of cointegrated systems when determining the number of factors using the approaches proposed by Bai and Ng (2002); Onatski (2010) and Ahn and Horenstein (2013). In the context of a DFM with mutually uncorrelated and homoscedastic idiosyncratic noises, we first derive analytically the eigenvalues of the covariance matrix and show how they are affected by univariate differencing. We also carry out Monte Carlo experiments considering several designs selected to represent different situations that can potentially be encountered in the empirical analysis of real macroeconomic variables. Finally, we illustrate the results by determining the number of factors in a system of prices of the euro area. It is important to note that the procedures for determining the number of factors considered in this paper are designed for what is known in the literature as static factors. Alternatively, several factor determination procedures have been proposed in the context of dynamic factors; see, for example, Amengual and Watson (2007); Hallin and Liska (2007); Bai and Ng (2007); Jacobs and Otter (2008) and Breitung and Pigorsch (2013). The difference between static and dynamic factors is described by, for example, Bai and Ng (2008). They argue that, although dynamic factors can be useful for establishing the number of primitive shocks in the economy, the properties of estimated static factors are better understood from a theoretical point of view. Furthermore, we focus the analysis on procedures to detect the number of static factors, as they are more popular in empirical economics.

The rest of this paper is structured as follows. In Sect. 2, we briefly describe the stationary DFM and the factor determination approaches considered. In Sect. 3, we analyze the effects of transforming non-stationary systems by univariate stationary transformations on these procedures. In Sect. 4, we report the results of the Monte Carlo experiments carried out to illustrate their finite sample performance. In Sect. 5, we carry out an empirical application. Finally, we conclude in Sect. 6.

2 The stationary dynamic factor model

In this section, we introduce notation and the stationary DFM and describe the factor determination procedures considered.

2.1 The model

We consider a DFM with cross-sectional dimension N, where the unobserved \(r<N\) common factors, \(F_{t}=(F_{1t},\dots ,F_{rt}){^{\prime }}\), and the idiosyncratic noises, \(\varepsilon _{t}=(\varepsilon _{1t},\dots ,\varepsilon _{Nt}){^{\prime }}\), follow VAR(1) processes. The factors explain the common evolution of a vector of time series, \(Y_{t}=(y_{1t},\dots ,y_{Nt}){^{\prime }}\) observed from \(t=1,\dots ,T\). The basic DFM considered is given by

$$\begin{aligned} Y_{t}= & {} PF_{t}+\varepsilon _{t}, \end{aligned}$$
(1)
$$\begin{aligned} F_{t}= & {} \varPhi F_{t-1}+\eta _{t}, \end{aligned}$$
(2)
$$\begin{aligned} \varepsilon _{t}= & {} \varGamma \varepsilon _{t-1}+a_{t}, \end{aligned}$$
(3)

where the factor disturbances, \(\eta _{t}=(\eta _{1t},\dots ,\eta _{rt}){^{\prime }}\), are \(r\times 1\) vectors, distributed independently from the idiosyncratic noises for all leads and lags. Furthermore, \(\eta _{t}\) and \(a_{t}\) are Gaussian white noises with positive definite covariance matrices \(\varSigma _{\eta }\) and \(\varSigma _{a}\), respectively, and \(P=({p^{\prime }}_{1},\dots ,{p^{\prime }}_{N}){^{\prime }}\) is the \(N\times r\) matrix of factor loadings, where \(p_{i}=(p_{i1},\dots ,p_{ir})\). Finally, \(\varPhi =\text {diag}(\phi _{1},\dots ,\phi _{r})\) and \(\varGamma \) are \(r\times r\) and \(N\times N\) matrices containing the autoregressive parameters of the factors and the idiosyncratic components, respectively. These autoregressive matrices satisfy the usual stationarity assumptions. Furthermore, we assume that the structure of the idiosyncratic noises is such that they are weakly correlated. Following Bai and Ng (2002); Onatski (2012, 2015) and Ahn and Horenstein (2013), we consider the entries in P, \(\varPhi \), \(\varSigma _{\eta }\), \(\varGamma \) and \(\varSigma _{a}\) as fixed parameters. Jungbacker and Koopman (2015) and Alvarez et al (2016) implement the DFM in Eqs. (1) to (3) on the data set of Stock and Watson (2005).
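As an illustration, the DFM in Eqs. (1) to (3) can be simulated in a few lines. This is a minimal sketch assuming diagonal autoregressive matrices with a common coefficient and identity disturbance covariances; the function name `simulate_dfm` and all default parameter values are ours, chosen for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_dfm(N=50, T=200, r=2, phi=0.5, gamma=0.3, burn=100):
    """Simulate the DFM of Eqs. (1)-(3) with diagonal AR matrices and
    Gaussian white-noise disturbances with identity covariances."""
    P = rng.uniform(0, 1, size=(N, r))              # N x r loading matrix
    F = np.zeros((r, T + burn))                     # common factors
    eps = np.zeros((N, T + burn))                   # idiosyncratic noises
    for t in range(1, T + burn):
        F[:, t] = phi * F[:, t - 1] + rng.standard_normal(r)        # Eq. (2)
        eps[:, t] = gamma * eps[:, t - 1] + rng.standard_normal(N)  # Eq. (3)
    Y = P @ F[:, burn:] + eps[:, burn:]             # Eq. (1)
    return Y, P, F[:, burn:]

Y, P, F = simulate_dfm()
```

The burn-in period discards the influence of the zero initial conditions when the processes are stationary.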

The DFM in Eqs. (1) to (3) is not identified because, for any \(r\times r\) nonsingular matrix H, the system can be expressed in terms of a new loading matrix and a new set of common factors. A normalization is necessary to solve this identification problem and uniquely define the factors. In the context of PC factor extraction, it is common to impose the restrictions \(P{^{\prime }}P/N=I_{r}\) and \(FF{^{\prime }}\) being diagonal, where \(F=(F_{1},\dots ,F_{T})\) is an \(r\times T\) matrix of common factors; see Stock and Watson (2002); Bai and Ng (2002, 2008, 2013); Connor and Korajczyk (2010) and Bai and Wang (2014) for papers dealing with identification issues. Note that these are normalization restrictions, and they may not have an economic interpretation.

2.2 Determining the number of factors

The DFM described above assumes that the number of factors, r, is known. However, in practice, it needs to be estimated. Obtaining the correct value of r is crucial for an adequate estimation of the space spanned by the factors. There are several alternative procedures designed to determine r in DFMs. In this paper, we consider the information criteria proposed by Bai and Ng (2002) and the estimators proposed by Onatski (2010) and Ahn and Horenstein (2013).Footnote 2

2.2.1 The Bai and Ng (2002) information criteria

The most popular information criteria to select the number of factors in DFMs, proposed by Bai and Ng (2002), are based on a consistent PC estimator of P and \(F_{t}\) which is given by the solution to the following least squares problem

$$\begin{aligned} \min _{F_{1},\dots ,F_{T},P}V_{r}(P,F) \end{aligned}$$
(4)

subject to \(P{^{\prime }}P/N=I_{r}\ \text {and}\ \textit{FF}{^{\prime }}\ \text {being diagonal,}\) where

$$\begin{aligned} V_{r}(P,F)=\frac{1}{NT}\sum _{t=1}^{T}(Y_{t}-\textit{PF}_{t}){^{\prime }}(Y_{t}-\textit{PF}_{t})=\frac{1}{NT}\sum _{t=1}^{T}\sum _{i=1}^{N}\varepsilon _{it}^{2}=\frac{1}{NT}tr(\varepsilon \varepsilon {^{\prime }}),\quad \end{aligned}$$
(5)

where \(\varepsilon =(\varepsilon _{1},\dots ,\varepsilon _{T})\) has dimension \(N\times T\). The solution to (4) is obtained by setting \(\hat{P}\) equal to \(\sqrt{N}\) times the eigenvectors corresponding to the r largest eigenvalues of \(YY{^{\prime }}\) where \(Y=(Y_{1},\dots ,Y_{T})\). The corresponding PC estimator of F is given by \(\hat{F}=N^{-1}\hat{P}{^{\prime }}Y.\)
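The PC solution just described can be sketched directly from the eigendecomposition of \(YY{^{\prime }}\); the function name `pc_estimate` is ours, introduced for illustration:

```python
import numpy as np

def pc_estimate(Y, r):
    """PC estimator: P_hat is sqrt(N) times the eigenvectors of YY'
    associated with its r largest eigenvalues; F_hat = P_hat'Y / N."""
    N = Y.shape[0]
    eigval, eigvec = np.linalg.eigh(Y @ Y.T)     # eigh returns ascending order
    idx = np.argsort(eigval)[::-1][:r]           # indices of the r largest
    P_hat = np.sqrt(N) * eigvec[:, idx]
    F_hat = P_hat.T @ Y / N
    return P_hat, F_hat
```

Note that the normalization \(P{^{\prime }}P/N=I_{r}\) holds by construction, since the eigenvectors are orthonormal.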

PC factor extraction separates the common component, \(PF_{t}\), from the idiosyncratic noises by averaging the variables within \(Y_{t}\) cross-sectionally such that, when N and T tend simultaneously to infinity, the weighted averages of the idiosyncratic noises converge to zero, leaving only the linear combinations of the factors. Therefore, it requires that the cumulative effects of the common component increase proportionally with N, while the eigenvalues of \(\varSigma _{\varepsilon }=E(\varepsilon _{t}\varepsilon _{t}{^{\prime }})\) remain bounded; see the review of Breitung and Choi (2013) for a description of these conditionsFootnote 3. Bai (2003) proves that the PC estimators of factors, factor loadings and common components are asymptotically equivalent to the maximum likelihood estimators and, consequently, consistent. He also derives the rates of convergence and the corresponding limiting distributions when N and T tend simultaneously to infinity.

In order to determine r, Bai and Ng (2002) propose minimizing the following functions with respect to k, for \(k=0,\dots ,r_{\max },\)

$$\begin{aligned} \textit{IC}_{1}(k)&=\ln V_{k}(\hat{P},\hat{F})+k\frac{N+T}{NT}\ln \frac{NT}{N+T}, \end{aligned}$$
(6a)
$$\begin{aligned} \textit{IC}_{2}(k)&=\ln V_{k}(\hat{P},\hat{F})+k\frac{N+T}{NT}\ln m, \end{aligned}$$
(6b)
$$\begin{aligned} \textit{IC}_{3}(k)&=\ln V_{k}(\hat{P},\hat{F})+k\frac{\ln m}{m}, \end{aligned}$$
(6c)

where \(V_{k}(\hat{P},\hat{F})\) is defined as in expression (5) with P and \(F_{t}\) substituted by their respective PC estimates, \(m=\min \{N,T\}\) and \(r_{\max }\) is a bounded integer such that \(r\le r_{\max }\). The criteria in (6) are quite sensitive to the choice of \(r_{\max }\); see the Monte Carlo results in Ahn and Horenstein (2013). Bai and Ng (2002) use \(r_{\max }=8\) in their Monte Carlo experiments while, in the context of first-differenced data, Bai and Ng (2004) use \({\textit{IC}}_{1}(k)\) with \(r_{\max }=6\). Under appropriate assumptions, Bai and Ng (2002) prove the consistency of the information criteria above for determining the number of common factors.

If \(\hat{\varepsilon }_{t}=Y_{t}-{\hat{P}}{\hat{F}}_{t}\) are the residuals of the regression of the variables in Y on the first r principal components of \(\frac{1}{NT}YY{^{\prime }}\), then \(tr(\hat{\varepsilon }\hat{\varepsilon }{^{\prime }})=tr(YY{^{\prime }})-tr(\hat{P}\hat{F}\hat{F}{^{\prime }}\hat{P}{^{\prime }})=T\sum _{i=1}^{m}\hat{\lambda }_{i}-T\sum _{i=1}^{r}\hat{\lambda }_{i}=T\sum _{i=r+1}^{m}\hat{\lambda }_{i},\) where \(\hat{\lambda }_{i}\), \(i=1,\dots ,m\) are the eigenvalues of \(\hat{\varSigma }_{Y}=\frac{1}{T}YY{^{\prime }},\) arranged in descending order. Therefore,

$$\begin{aligned} V_{r}(\hat{P},\hat{F})=\frac{1}{N}\sum _{i=r+1}^{m}\hat{\lambda }_{i}. \end{aligned}$$
(7)

Using the expression of \(V_{k}(\hat{P},\hat{F})\) in (7), the functions in (6) can be written as

$$\begin{aligned} {\textit{IC}}_{j}(k)=\ln \left( \frac{1}{N}\sum _{i=k+1}^{m}\hat{\lambda }_{i}\right) +kg_{j}(N,T), \end{aligned}$$
(8)

where \(g_{j}(N,T)\) is defined according to the criteria in (6) for \(j=1,\) 2 and 3.
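The three criteria are easy to compute through the eigenvalue form in Eq. (8); a sketch (the function name `bai_ng_ic` is ours):

```python
import numpy as np

def bai_ng_ic(Y, r_max):
    """Bai and Ng (2002) criteria IC_1, IC_2, IC_3 computed through the
    eigenvalue form in Eq. (8); returns the three selected factor numbers."""
    N, T = Y.shape
    m = min(N, T)
    lam = np.sort(np.linalg.eigvalsh(Y @ Y.T / T))[::-1]  # eigenvalues, descending
    g = [(N + T) / (N * T) * np.log(N * T / (N + T)),     # penalty of IC_1
         (N + T) / (N * T) * np.log(m),                   # penalty of IC_2
         np.log(m) / m]                                   # penalty of IC_3
    r_hat = []
    for gj in g:
        ic = [np.log(lam[k:m].sum() / N) + k * gj for k in range(r_max + 1)]
        r_hat.append(int(np.argmin(ic)))
    return r_hat
```

For a simulated system with one strong white-noise factor, all three criteria should select one factor.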

2.2.2 Differenced eigenvalues

Onatski (2010) proposes an alternative procedure to select r, called edge distribution (ED), and shows that it outperforms the criteria proposed by Bai and Ng (2002) when the proportion of the variance attributed to the factors is small relative to the variance due to the idiosyncratic noises or when these are substantially correlated. Furthermore, computationally, the procedure proposed by Onatski (2010) allows the determination of the number of factors without previous estimation of the common component. Finally, it relaxes the standard assumption of PC factor extraction that the r largest eigenvalues of \(\hat{\varSigma }_{Y}\) grow proportionally to N. Instead of requiring that the cumulative effect of the factors grow as fast as N, Onatski (2010) imposes a structure on the idiosyncratic noises. Under the assumption of normality, both cross-sectional and temporal dependence are allowed. This procedure is based on determining a sharp threshold, \(\delta \), which consistently separates the bounded and diverging eigenvalues of \(\hat{\varSigma }_{Y}\). For any \(j>r\), the differences \(\hat{\lambda }_{j}-\hat{\lambda }_{j+1}\) converge to 0, while the difference \(\hat{\lambda }_{r}-\hat{\lambda }_{r+1}\) diverges to infinity when both N and T tend to infinity. Assuming that \(r_{\max }/N\rightarrow 0\), Onatski (2010) proposes the following algorithm in order to calibrate \(\delta \) and determine the number of factors:

  1. Obtain \(\hat{\lambda }_{i}\), \(i=1,\dots ,N\), and set \(j=r_{\max }+1\).

  2. Obtain \(\hat{\beta }\) as the ordinary least squares (OLS) estimator of the slope of a simple linear regression with constant, where the observations of the dependent variable are \(\left\{ \hat{\lambda }_{j},\dots ,\hat{\lambda }_{j+4}\right\} \) and the observations of the regressor variable are \(\lbrace (j-1)^{2/3},\dots ,(j+3)^{2/3}\rbrace \). Set the threshold \(\hat{\delta }=2|\hat{\beta }|\).

  3. Estimate \(\hat{r}=\max \{k\le r_{\max }\mid \hat{\lambda }_{k}-\hat{\lambda }_{k+1}\ge \hat{\delta }\}\), or \(\hat{r}=0\) if \(\hat{\lambda }_{k}-\hat{\lambda }_{k+1}<\hat{\delta }\) for all \(k\le r_{\max }\).

  4. Set \(j=\hat{r}+1\). Repeat steps 2 and 3 until \(\hat{r}\) converges.

Under suitable conditions, Onatski (2010) proves the consistency of \(\hat{r}\) for any fixed \(\delta >0\). He sets the number of iterations to four, although the convergence of the above algorithm is often achieved at the second iteration. Additionally, he sets \(r_{\max }=8\) when \(r=1,2,5\) and \(r_{\max }=20\) when \(r=15\).
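The steps above can be sketched as follows. We hedge one detail: we set the threshold as \(\hat{\delta }=2|\hat{\beta }|\), which is our reading of the calibration in Onatski (2010); the function name is ours:

```python
import numpy as np

def onatski_ed(eigvals, r_max, max_iter=4):
    """Edge-distribution (ED) estimator sketch following Onatski (2010).
    eigvals: sample eigenvalues in descending order (length >= r_max + 5).
    The threshold delta = 2*|beta_hat| is our reading of the calibration."""
    lam = np.asarray(eigvals, dtype=float)
    j = r_max + 1
    r_hat = None
    for _ in range(max_iter):
        y = lam[j - 1:j + 4]                         # lambda_j, ..., lambda_{j+4}
        x = np.arange(j - 1, j + 4) ** (2.0 / 3.0)   # (j-1)^{2/3}, ..., (j+3)^{2/3}
        beta = np.polyfit(x, y, 1)[0]                # OLS slope with constant
        delta = 2.0 * abs(beta)
        diffs = lam[:r_max] - lam[1:r_max + 1]       # lambda_k - lambda_{k+1}
        above = np.flatnonzero(diffs >= delta)
        r_new = int(above[-1] + 1) if above.size else 0
        if r_new == r_hat:                           # stop once r_hat converges
            break
        r_hat = r_new
        j = r_hat + 1
    return r_hat
```

With one clearly diverging eigenvalue and a slowly decaying bulk, the sketch settles after two iterations, as the paper reports is typical.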

2.2.3 Ratios of eigenvalues

Recently, Ahn and Horenstein (2013) propose two further estimators of the number of factors based on the fact that the r largest eigenvalues of \(\hat{\varSigma }_{Y}\) grow unbounded as N increases, while the other eigenvalues remain bounded. They show that these estimators are less sensitive to the choice of \(r_{\max }\) than those based on the Bai and Ng (2002) information criteria. The two new estimators are defined as the value of k, for \(k=0,\dots ,r_{\max },\) that maximizes the following ratios

$$\begin{aligned} \textit{ER}(k)= & {} \frac{\hat{\lambda }_{k}}{\hat{\lambda }_{k+1}}, \end{aligned}$$
(9)
$$\begin{aligned} \textit{GR}(k)= & {} \frac{\ln \left[ V_{k-1}(\hat{P},\hat{F})/V_{k}(\hat{P},\hat{F})\right] }{\ln \left[ V_{k}(\hat{P},\hat{F})/V_{k+1}(\hat{P},\hat{F})\right] }=\frac{\ln (1+\hat{\lambda }_{k}^{*})}{\ln (1+\hat{\lambda }_{k+1}^{*})}, \end{aligned}$$
(10)

where \(\hat{\lambda }_{0}=\frac{1}{m}\sum _{k=1}^{m}\hat{\lambda }_{k}/\ln (m)\) and \(\hat{\lambda }_{k}^{*}=\hat{\lambda }_{k}/\sum _{j=k+1}^{m}\hat{\lambda }_{j}\). The value of \(\hat{\lambda }_{0}\) has been chosen following the definition of Ahn and Horenstein (2013) according to which \(\hat{\lambda }_{0}\rightarrow 0\) and \(m\hat{\lambda }_{0}\rightarrow \infty \) as \(m\rightarrow \infty .\) Footnote 4

Note that both the numerator and denominator of GR(k) are growth rates of sums of residual variances computed with \(k-1\), k and \(k+1\) factors. Ahn and Horenstein (2013) show that, contrary to the estimators based on the Bai and Ng (2002) criteria, their estimators are far less dependent on \(r_{\max }\), and suggest choosing it as \(\min (r_{\max }^{*},0.1m)\) where \(r_{\max }^{*}=\#\left\{ k\mid N^{-1}\hat{\lambda }_{k}\ge V_{0}/m,k\ge 1\right\} \). Under the same assumptions as Bai and Ng (2006) and Onatski (2010), and allowing for some variables in Y to be perfectly multicollinear or to have zero idiosyncratic variances, they establish consistency of the \(\textit{ER}(k)\) and \(\textit{GR}(k)\) estimators. The results obtained in their Monte Carlo analysis show that the two estimators outperform the Bai and Ng (2002) information criteria and the Onatski (2010) estimator mainly when the idiosyncratic components are simultaneously cross-sectionally and serially correlated. However, the estimator proposed by Onatski (2010) outperforms the \(\textit{ER}(k)\) and \(\textit{GR}(k)\) ratios when the variance of the idiosyncratic component is larger than that of the common component (weak factors).
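Both ratios can be computed directly from the ordered eigenvalues; a sketch (the function name `er_gr` is ours, and k runs over \(0,\dots ,r_{\max }\) as in the text):

```python
import numpy as np

def er_gr(eigvals, r_max):
    """ER(k) and GR(k) estimators of Ahn and Horenstein (2013), Eqs. (9)-(10).
    eigvals: eigenvalues of the sample covariance matrix, descending order."""
    lam = np.asarray(eigvals, dtype=float)
    m = lam.size
    lam0 = lam.sum() / (m * np.log(m))                  # mock eigenvalue
    lam_ext = np.concatenate([[lam0], lam])             # lam_ext[k] = lambda_k
    # lambda*_k = lambda_k / sum_{j=k+1}^m lambda_j
    tail = np.array([lam[k:].sum() for k in range(r_max + 2)])
    lam_star = lam_ext[:r_max + 2] / tail
    er = lam_ext[:r_max + 1] / lam_ext[1:r_max + 2]     # ER(k)
    gr = np.log1p(lam_star[:-1]) / np.log1p(lam_star[1:])  # GR(k)
    return int(np.argmax(er)), int(np.argmax(gr))
```

With two eigenvalues well separated from a slowly decaying bulk, both ratios peak at \(k=2\).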

2.3 A note on the convergence of eigenvalues

The procedures to determine the number of common factors described above are based on the eigenvalues of the sample covariance matrix, \(\hat{\varSigma }_{Y}\). One of the main contributions of Bai and Ng (2002) is to show that the convergence of the eigenvalues of \(\frac{1}{TN}YY{^{\prime }}\) depends on m. Later, Kapetanios (2010) reviews the available literature on the topic, pointing out that the distribution of the largest eigenvalue depends in complicated ways on the parameters of the model. It seems that serial correlation affects both the parameters of the asymptotic limits and their functional form. Furthermore, he shows that the first r eigenvalues of \(\varSigma _{Y}\) increase at rate N, which follows from the fact that the r largest eigenvalues of \(F{^{\prime }}F\) will grow at rate N as long as the loading matrix P is not sparse, and suggests that it is reasonable to expect a similar behavior from the eigenvalues of the sample covariance matrix.

More recently, Onatski (2012, 2015) develops new asymptotics for the eigenvalues of the sample covariance matrix by considering that both the weights and the factors are fixed parameters.

3 Determining the number of factors after differencing

As mentioned in the Introduction, macroeconomic systems are often non-stationary. In this section, we analyze how transforming the data in a univariate fashion in order to achieve stationarity affects the performance of the factor number determination procedures described above. Note that differencing affects the ratio between the variances of the factors and idiosyncratic components, the temporal dependence structure and the cross-correlations among the idiosyncratic noises.

Consider the DFM given in Eqs. (1) to (3) in which \(\varPhi \) and \(\varGamma \) are diagonal matrices which may have 1’s in the main diagonal. Consequently, both the factors and the idiosyncratic noises can be either stationary or non-stationary random walks. Under this specification, the system of first-differenced data satisfies all conditions of Bai and Ng (2002); Onatski (2010) and Ahn and Horenstein (2013). After differencing the data in a univariate fashion, the DFM takes the following form

$$\begin{aligned} \varDelta Y_{t}= & {} P\varDelta F_{t}+\varDelta \varepsilon _{t}, \end{aligned}$$
(11)
$$\begin{aligned} \varDelta F_{t}= & {} (\varPhi -I)F_{t-1}+\eta _{t}, \end{aligned}$$
(12)
$$\begin{aligned} \varDelta \varepsilon _{t}= & {} (\varGamma -I)\varepsilon _{t-1}+a_{t}. \end{aligned}$$
(13)

Denote by \(\phi _{i}\) the i-th element in the main diagonal of \(\varPhi \). If \(|\phi _{i}|<1\), then the variance of the corresponding differenced factor is given by \(\sigma _{f_{i}}^{2}=2\sigma _{\eta _{i}}^{2}/(1+\phi _{i})\), where \(\sigma _{\eta _{i}}^{2}\) is the variance of \(\eta _{i}\). When \(\phi _{i}=0.5\), the variances of \(F_{t}\) and \(\varDelta F_{t}\) coincide, so in this case the variance of the factor is not changed by differencing. However, if \(\phi _{i}<0.5\), the variance of \(\varDelta F_{t}\) is larger than that of \(F_{t}\), while if \(\phi _{i}>0.5\), it is smaller. The same relation can be established between the variances of the elements in \(\varepsilon _{t}\) and \(\varDelta \varepsilon _{t}\) with respect to \(\gamma _{i}\), the i-th element in the main diagonal of \(\varGamma \). Note that if \(\varepsilon _{t}\) is stationary with autoregressive parameters smaller than 0.5 while \(F_{t}\) is non-stationary, then overdifferencing the idiosyncratic components may distort the determination of the number of factors: the relation between the variances of the common and idiosyncratic components is modified, with the variances of \(\varDelta F_{t}\) becoming smaller and the variances of \(\varDelta \varepsilon _{t}\) becoming larger. The dynamic dependence of the idiosyncratic noises of the differenced model is given by

$$\begin{aligned} Corr(\varDelta \varepsilon _{it},\varDelta \varepsilon _{it-h})= 0.5\gamma _i^{h-1}(\gamma _i - 1). \end{aligned}$$

Finally, note that differencing also affects the cross-correlations of the idiosyncratic noises. Consider, for example, that the correlation between \(\varepsilon _{it}\) and \(\varepsilon _{jt}\) is given by \(\rho \). If the idiosyncratic noises are stationary, then

$$\begin{aligned} Corr(\varDelta \varepsilon _{it},\varDelta \varepsilon _{jt})=\sigma _{\varDelta \varepsilon _{i}}^{-1}\sigma _{\varDelta \varepsilon _{j}}^{-1}\left( 2-\gamma _{i}-\gamma _{j}\right) \rho \sigma _{\varepsilon _{i}}\sigma _{\varepsilon _{j}}=\frac{0.5\left( 2-\gamma _{i}-\gamma _{j}\right) \rho }{\sqrt{(1-\gamma _{i})(1-\gamma _{j})}}. \end{aligned}$$
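The autocorrelation expression above is easy to check by simulation; the sketch below verifies the first two autocorrelations of the differenced noise for a single stationary AR(1) idiosyncratic component (the parameter values are illustrative):

```python
import numpy as np

# Simulate one stationary AR(1) idiosyncratic noise and check that the
# autocorrelations of its first difference match 0.5*gamma^(h-1)*(gamma-1).
rng = np.random.default_rng(3)
gamma, T = 0.6, 500_000
eps = np.zeros(T)
shocks = rng.standard_normal(T)
for t in range(1, T):
    eps[t] = gamma * eps[t - 1] + shocks[t]
d = np.diff(eps)
for h in (1, 2):
    emp = np.corrcoef(d[h:], d[:-h])[0, 1]
    theo = 0.5 * gamma ** (h - 1) * (gamma - 1)   # -0.2 for h=1, -0.12 for h=2
    print(h, round(emp, 3), theo)
```

With half a million observations the empirical autocorrelations agree with the theoretical values to about two decimal places.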

In order to simplify the analysis of the effects of univariate differencing on the determination of r, we consider \(\varGamma =\gamma I\) and \(\varSigma _{a}=\sigma _{a}^{2}I\), so that the idiosyncratic noises are homoscedastic and mutually uncorrelated and all of them are governed by the same autoregressive parameter. Given that there is no correlation between the factors and the idiosyncratic components, the covariance matrix of the first-differenced data is given by \(\varSigma _{\varDelta Y}=P\varSigma _{f}P{^{\prime }}+\sigma _{e}^{2}I\), where \(\varSigma _{f}\) is the covariance matrix of \(\varDelta F_{t}\) and \(\sigma _{e}^{2}=2\sigma _{a}^{2}/(1+\gamma )\) is the variance of each element in \(\varDelta \varepsilon _{t}.\) The ordered eigenvalues of \(\varSigma _{\varDelta Y}\) are equal to \(\sigma _{e}^{2}+\mu _{i}\) for \(i=1,\dots ,N\), where \(\mu _{i}\) is the i-th largest eigenvalue of \(P\varSigma _{f}P{^{\prime }}\). Furthermore, \(tr\left( P\varSigma _{f}P{^{\prime }}\right) =tr\left( P{^{\prime }}P\varSigma _{f}\right) =\sum _{j=1}^{r}\sigma _{f_{j}}^{2}\sum _{i=1}^{N}p_{ij}^{2}=\sum _{j=1}^{r}\mu _{j}\). Therefore, the sum of the r largest eigenvalues of \(\varSigma _{\varDelta Y}\) is given by \(\sum _{i=1}^{r}\lambda _{i}=r\sigma _{e}^{2}+\sum _{j=1}^{r}\sigma _{f_{j}}^{2}\sum _{i=1}^{N}p_{ij}^{2}\), while the remaining \(N-r\) eigenvalues are given by \(\lambda _{i}=\sigma _{e}^{2}.\)
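These population results can be verified numerically; the values of \(N\), \(r\), \(\gamma \), \(\sigma _{a}^{2}\) and \(\varSigma _{f}\) below are arbitrary illustrative choices:

```python
import numpy as np

# Numerical check of the eigenvalue structure of Sigma_dY = P Sigma_f P' + sig_e2*I.
rng = np.random.default_rng(4)
N, r, gamma, sig_a2 = 30, 2, 0.4, 1.0
P = rng.uniform(0, 1, size=(N, r))
Sigma_f = np.diag([2.0, 1.0])                 # covariance of Delta F_t (illustrative)
sig_e2 = 2 * sig_a2 / (1 + gamma)             # variance of each element of Delta eps_t
Sigma_dY = P @ Sigma_f @ P.T + sig_e2 * np.eye(N)
lam = np.sort(np.linalg.eigvalsh(Sigma_dY))[::-1]
mu = np.sort(np.linalg.eigvalsh(P @ Sigma_f @ P.T))[::-1]
# The r largest eigenvalues equal sig_e2 + mu_i; the remaining N - r equal sig_e2.
assert np.allclose(lam[:r], sig_e2 + mu[:r])
assert np.allclose(lam[r:], sig_e2)
# Trace identity: the sum of the mu_j equals sum_j sigma_fj^2 * sum_i p_ij^2.
assert np.isclose(mu[:r].sum(),
                  sum(Sigma_f[j, j] * (P[:, j] ** 2).sum() for j in range(r)))
```

The check exploits the fact that \(P\varSigma _{f}P{^{\prime }}\) and \(\varSigma _{\varDelta Y}\) share eigenvectors, so adding \(\sigma _{e}^{2}I\) simply shifts every eigenvalue.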

Consider the particular case of a unique random walk factor, i.e., \(r=1\) and \(\phi _{1}=1\). In this case, \(\lambda _{1}=\sigma _{\eta }^{2}\sum _{i=1}^{N}p_{i1}^{2}+\sigma _{e}^{2}\) and \(\lambda _{i}=\sigma _{e}^{2}\) for \(i=2,\dots ,N\). Consequently, the function to be minimized according to the Bai and Ng (2002) information criteria is given by

$$\begin{aligned} \textit{IC}(k)=\left\{ \begin{array}{l@{\quad }l} \ln \left( N^{-1}\sigma _{\eta }^{2}\sum _{i=1}^{N}p_{i1}^{2}+\sigma _{e}^{2}\right) , &{} k=0 \\ \ln (N-k)-\ln (N)+\ln (\sigma _{e}^{2})+kg(N,T), &{} k\ge 1.\end{array}\right. \end{aligned}$$

The procedure proposed by Onatski (2010) is based on the differences between adjacent eigenvalues. Note that, for \(j=2,\dots ,N\), \(\lambda _{j}-\lambda _{j+1}=0\). Therefore, the procedure should work as long as the difference between \(\lambda _{1}\) and \(\lambda _{2}\) is large. This difference is given by \(\lambda _{1}-\lambda _{2}=\sigma _{\eta }^{2}\sum _{i=1}^{N}p_{i1}^{2}\) and does not depend on the value of \(\sigma _{e}^{2}\). Therefore, for given weights and cross-sectional dimension, the procedure should work better when \(\sigma _{\eta }^{2}\) is large. Also, for a given value of \(\sigma _{\eta }^{2}\), the procedure should work better as N increases. Note that, in the first step of the algorithm proposed by Onatski (2010), \(\hat{\delta }=0\) because the population eigenvalues \(\lambda _{j}\) are all equal to \(\sigma _{e}^{2}\) for \(j\ge r_{\max }+1\), so the regression slope is zero.

Consider the ER(k) criterion of Ahn and Horenstein (2013) given in (9), which looks for the ratio between \(\lambda _{1}\) and \(\lambda _{2}\) to be large relative to the ratios between other adjacent eigenvalues. Note that, in the particular case we are considering, if \(N < T\), the mock eigenvalue is given by \(\lambda _{0}= \ln (N)^{-1}\left( \sigma _{e}^{2}+N^{-1}\sigma _{\eta }^{2}\sum _{i=1}^{N}p_{i1}^{2}\right) \), and, consequently,

$$\begin{aligned} ER(k)=\left\{ \begin{array}{l@{\quad }l} \frac{1+N^{-1}q\sum _{i=1}^{N}p_{i1}^{2}}{\ln (N)\left( 1+q\sum _{i=1}^{N}p_{i1}^{2}\right) }, &{} k=0 \\ 1+q\sum _{i=1}^{N}p_{i1}^{2}, &{} k=1 \\ 1, &{} k\ge 2, \end{array}\right. \end{aligned}$$

where \(q=\frac{\sigma _{\eta }^{2}(1+\gamma )}{2\sigma _{a}^{2}}.\) Note that if N is large enough, ER(0) should be close to 0. Therefore, for given weights, the criterion should work better when q is larger.

Finally, consider the GR(k) criterion of Ahn and Horenstein (2013). In this case, note that

$$\begin{aligned} \lambda _{i}^{*}=\left\{ \begin{array}{l@{\quad }l} (N\ln (N))^{-1}, &{} i=0 \\ (N-1)^{-1}(q\sum _{i=1}^{N}p_{i1}^{2}+1), &{} i=1 \\ (N-i)^{-1}, &{} i\ge 2. \end{array}\right. \end{aligned}$$

Therefore,

$$\begin{aligned} \frac{\ln (1+\lambda _{k}^{*})}{\ln (1+\lambda _{k+1}^{*})}=\left\{ \begin{array}{l@{\quad }l} \frac{\ln (N\ln N+1)-\ln (N\ln N)}{\ln (N + q\sum _{i=1}^Np_{i1}^2) - \ln (N-1)}, &{} k=0 \\ \frac{\ln (N + q\sum _{i=1}^Np_{i1}^2) - \ln (N-1)}{\ln (N-1) - \ln (N-2)},&{} k=1 \\ \frac{\ln (N+1-k)-\ln (N-k)}{\ln (N-k)-\ln (N-k-1)}, &{} k\ge 2.\end{array}\right. \end{aligned}$$
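The closed forms for the single random walk factor case can be checked against a direct computation from the population eigenvalues; the numerical values below are illustrative:

```python
import numpy as np

# Single random-walk factor after differencing: lambda_1 = sig_eta2*S + sig_e2
# and lambda_i = sig_e2 for i >= 2, with S = sum_i p_i1^2 and q = sig_eta2/sig_e2.
rng = np.random.default_rng(5)
N, gamma, sig_a2, sig_eta2 = 40, 0.5, 1.0, 1.0
p = rng.uniform(0, 1, N)
S = (p ** 2).sum()
sig_e2 = 2 * sig_a2 / (1 + gamma)
q = sig_eta2 * (1 + gamma) / (2 * sig_a2)     # equals sig_eta2 / sig_e2
lam = np.full(N, sig_e2)
lam[0] = sig_eta2 * S + sig_e2
# ER(1) computed from the eigenvalues equals the closed form 1 + q*S.
assert np.isclose(lam[0] / lam[1], 1 + q * S)
# GR(1) from the eigenvalues equals [ln(N+qS)-ln(N-1)] / [ln(N-1)-ln(N-2)].
lam_star1 = lam[0] / lam[1:].sum()
lam_star2 = lam[1] / lam[2:].sum()
gr1 = np.log1p(lam_star1) / np.log1p(lam_star2)
closed = (np.log(N + q * S) - np.log(N - 1)) / (np.log(N - 1) - np.log(N - 2))
assert np.isclose(gr1, closed)
```

Since \(qS\) is well above zero here, both ratios at \(k=1\) are far above one, which is what lets the criteria single out the factor.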

4 Finite sample performance

The results in the previous section are based on population covariance matrices and their corresponding eigenvalues. However, in practice, when determining the number of common factors in empirical applications, one should estimate the covariance matrix by its sample version and obtain the corresponding estimated eigenvalues. As mentioned above, the asymptotic distribution of estimated eigenvalues is complicated and not always known. The finite sample properties of the estimated eigenvalues depend on the temporal sample size used for their estimation, T, the cross-sectional dimension, N, the ratio between the variances of the common and idiosyncratic components, and the structure of the temporal and cross-sectional dependencies of the idiosyncratic noises. In this section, we carry out Monte Carlo experiments in order to analyze how the determination of the number of factors is affected in finite samples by univariate differencing of non-stationary data. We should note that the procedures considered have been developed for N and T going to infinity. However, when the procedures are implemented in practice, both N and T are finite. Our interest in this paper is to study the performance of the criteria under different combinations of N and T similar to those often encountered when dealing with systems of macroeconomic and financial variables. Furthermore, we want to investigate how small N and T can be for the procedures to be reliable under different structures of the factors and idiosyncratic noises. In this way, our results can be of interest for practitioners in empirical applications.

The experiments are based on \(R=500\) replications generated by the DFM in Eqs. (1) to (3) with \(N=(12, 50, 100, 200)\) and \(T=(100, 500)\)Footnote 5. Our simulations are divided into two parts. The first part is designed to investigate how the alternative estimators considered behave when detecting a unique random walk factor under different temporal and cross-sectional structures of the idiosyncratic noises. The second part is designed to analyze models with more than one factor.

Consider first a DFM defined as in Eqs. (1) to (3) with \(r=1\), \(\varPhi =1\) and \(\sigma _{\eta }^{2}=1.\) The factor loadings are generated by \(p_{i1}\thicksim U\left[ 0,1\right] \) with \(\sum _{i=1}^{N}p_{i1}^{2}=5.59\), 18.70, 34.63 and 65.56 for \(N=12\), 50, 100 and 200, respectively; Bai and Ng (2006) and Poncela and Ruiz (2016) also generate the factor loadings by the same distribution. We consider several structures for the idiosyncratic noises. First, the idiosyncratic noises are mutually uncorrelated and homoscedastic. In particular, the autoregressive coefficient matrix of the idiosyncratic components is diagonal, \(\varGamma =\gamma I,\) with \(\gamma =(-0.8,\) 1) and \(\varSigma _{a}=\sigma _{a}^{2}I\) with \(\sigma _{a}^{2}=1\) so that \(\sigma _{e}^{2}=10\) and 1 for the values of \(\gamma \) considered. Note that, differently from simulations carried out in related works, we consider both positive and negative values for the autoregressive parameter of the idiosyncratic noises; see, Pinheiro et al (2013) who estimate correlations for \(\varDelta \varepsilon _{t}\) between -0.6 and 0.9 when dealing with the U.S. monthly macroeconomic data set of Stock and Watson (2005). In order to separate the effects of the temporal dependence and the variance of the differenced idiosyncratic noises on the results, we also consider the combinations \(\gamma =-0.8\) and \(\sigma _{a}^{2}=0.1\) (\(\sigma _{e}^{2}=1)\) and \(\gamma =1\) and \(\sigma _{a}^{2}=10\) (\(\sigma _{e}^{2}=10).\) We introduce contemporaneous correlations among the idiosyncratic noises. \(\varSigma _a\) is generated with \(\sigma _{a}^{2}=0.1,\) 1 and 10 in the main diagonal and, following Onatski (2012), a Toeplitz structure with parameter \(b=0.5.\) Finally, we consider models with heteroscedastic idiosyncratic noises. 
The variances are generated by \(\sigma _{a_{i}}^{2}\thicksim U\left[ 0.5,1.5\right] \), \(\sigma _{a_{i}}^{2}\thicksim U\left[ 0.05,0.15\right] \) and \(\sigma _{a_{i}}^{2}\thicksim U\left[ 5,15\right] \); see Bai and Ng (2006) and Breitung and Eickmeier (2011), who use the same design to simulate heteroscedastic idiosyncratic noises. In these latter cases, we consider \(\gamma =-0.8\) and 1.
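As an illustration, the single-factor design above can be sketched as follows. This is a minimal simulation in Python; `simulate_dfm` and all variable names are ours, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_dfm(N=12, T=100, gamma=-0.8, sigma2_a=1.0, sigma2_eta=1.0):
    """Single-factor DGP: random-walk factor (Phi = 1), U[0, 1] loadings
    and AR(1) idiosyncratic noises with coefficient gamma."""
    p = rng.uniform(0.0, 1.0, size=N)            # loadings p_i1 ~ U[0, 1]
    f = np.cumsum(rng.normal(0.0, np.sqrt(sigma2_eta), size=T))  # random-walk factor
    a = rng.normal(0.0, np.sqrt(sigma2_a), size=(T, N))
    eps = np.empty((T, N))
    eps[0] = a[0]
    for t in range(1, T):                        # AR(1) idiosyncratic noises
        eps[t] = gamma * eps[t - 1] + a[t]
    return np.outer(f, p) + eps                  # Y_t = P f_t + eps_t

Y = simulate_dfm()                               # gamma = -0.8, homoscedastic, uncorrelated
print(Y.shape)                                   # (100, 12)
```

Setting `gamma=1` reproduces the unit-root idiosyncratic case; for the cross-correlated design, the diagonal \(\varSigma _a\) would be replaced by a Toeplitz covariance matrix.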

For each replication, we generate observations \(Y_{t}\) and difference the data in a univariate fashion. Then, the eigenvalues of the sample covariance matrix \(\frac{1}{T-1}(\varDelta Y)(\varDelta Y){^{\prime }}\) are computed and \(r\) is determined using each of the procedures described above with \(r_{\max }=4\), 7 and 13 when \(N=12\), 50 and 200, respectively. The numbers of factors determined using the three criteria proposed by Bai and Ng (2002) are denoted by \(\hat{r}_{{\textit{IC}}{1}}\), \(\hat{r}_{{\textit{IC}}{2}}\) and \(\hat{r}_{{\textit{IC}}{3}}\), while the number of factors determined implementing the procedure due to Onatski (2010) is denoted by \(\hat{r}_{\textit{ED}}\). Finally, the numbers of factors estimated using the two ratios proposed by Ahn and Horenstein (2013) are denoted by \(\hat{r}_{\textit{ER}}\) and \(\hat{r}_{GR}\).
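To fix ideas, the two eigenvalue-ratio criteria can be sketched as follows; this is our own minimal implementation of the ER and GR estimators of Ahn and Horenstein (2013), not the authors' code.

```python
import numpy as np

def er_gr(eigvals, r_max):
    """ER and GR estimators of r from eigenvalues sorted in descending order."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    tail = np.cumsum(lam[::-1])[::-1]            # tail[k] = sum_{j >= k} lam[j]
    er = lam[:r_max] / lam[1:r_max + 1]          # ER_k = lam_k / lam_{k+1}
    star = lam[:r_max + 1] / tail[1:r_max + 2]   # lam*_k = lam_k / sum_{j > k} lam_j
    gr = np.log1p(star[:r_max]) / np.log1p(star[1:r_max + 1])
    return int(np.argmax(er)) + 1, int(np.argmax(gr)) + 1

# One dominant eigenvalue: both criteria point to a single factor
print(er_gr([10.0] + [1.0] * 9, r_max=4))        # (1, 1)
```

Both estimators pick the value of \(k\) that maximizes the corresponding ratio of adjacent (transformed) eigenvalues, so they need no penalty function, unlike the information criteria.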

Figure 1 plots, for \(N=12\) and \(T=100\), the Monte Carlo averages and 95 % confidence intervals, for homoscedastic and contemporaneously uncorrelated idiosyncratic noises, of i) the sample ordered eigenvalues; ii) their differences; and iii) their ratios, together with the corresponding population quantities, when \(\gamma =-0.8\) and \(\sigma _{a}^{2}=0.1\), \(\gamma =1\) and \(\sigma _{a}^{2}=1\), \(\gamma =-0.8\) and \(\sigma _{a}^{2}=1\), and \(\gamma =1\) and \(\sigma _{a}^{2}=10\). When the idiosyncratic noises are homoscedastic and white noise, according to the results in the previous section, the largest eigenvalue of the population covariance matrix of \(\varDelta Y\) is given by \(\lambda _{1}=\sigma _{e}^{2}+\sum _{i=1}^{N}p_{i1}^{2}\), while all other eigenvalues are given by \(\sigma _{e}^{2}\). Note that in the first two cases, \(\sigma _{e}^{2}=1\) and the population eigenvalues coincide. In the two latter cases, \(\sigma _{e}^{2}=10\). Figure 1 shows that, regardless of the value of \(\sigma _{e}^{2}\), the eigenvalues are better estimated when \(\gamma =1\) than when \(\gamma =-0.8\), with smaller biases and standard deviations. Obviously, given \(\gamma \), the eigenvalues are better estimated when \(\sigma _{a}^{2}\) is smaller. Therefore, when estimating the eigenvalues of the covariance matrix of \(\varDelta Y\), not only the relative variance of the differenced idiosyncratic noises but also their temporal dependence is important.
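This population eigenvalue structure is easy to verify numerically: with \(\sigma _{\eta }^{2}=1\) and white, homoscedastic differenced idiosyncratic noises, the covariance matrix of \(\varDelta Y\) is a rank-one matrix plus \(\sigma _{e}^{2}I\). A quick check in Python (variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
N, sigma2_e = 12, 1.0
p = rng.uniform(0.0, 1.0, size=N)                # loadings p_i1 ~ U[0, 1]
# Population covariance of dY: sigma_eta^2 * p p' + sigma_e^2 * I
cov = np.outer(p, p) + sigma2_e * np.eye(N)
lam = np.sort(np.linalg.eigvalsh(cov))[::-1]
print(np.isclose(lam[0], sigma2_e + p @ p))      # True: lambda_1 = sigma_e^2 + sum p_i1^2
print(np.allclose(lam[1:], sigma2_e))            # True: remaining eigenvalues = sigma_e^2
```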

Fig. 1

Eigenvalues of DFM with \(N = 12\), \(T = 100\), \(r = 1\), \(\phi = 1\), with \(\gamma = -0.8\) and \(\sigma _a^2 = 0.1\) (first row), \(\gamma = 1\) and \(\sigma _a^2 = 1\) (second row), \(\gamma = -0.8\) and \(\sigma _a^2 = 1\) (third row) and \(\gamma = 1\) and \(\sigma _a^2 = 10\) (fourth row). The first column plots the eigenvalues while the second and third columns plot their differences and ratios, respectively. The population eigenvalues are plotted in red, the Monte Carlo averages in black and the corresponding 95 % intervals in blue. (Color figure online)

Fig. 2

Eigenvalues of DFM with \(r = 1\), \(\phi = 1\) and \(\sigma _{\eta }^2 = 1\) when the idiosyncratic noises are AR(1) processes with \(\gamma = -0.8\) and \(\sigma _a^2 = 1\). The first column plots the eigenvalues, while the second and third columns plot their differences and ratios, respectively. The population eigenvalues are plotted in red, the Monte Carlo averages in black and the corresponding 95 % intervals in blue. First row \(N = 12, T = 100\); second row \(N = 12, T = 500\); third row \(N = 50, T = 100\); fourth row \(N = 50, T = 500\); fifth row \(N = 200, T = 100\) and sixth row \(N = 200, T = 500\). (Color figure online)

Fig. 3

Percentage of \(\hat{r} = r_{\max }\) (blue), \(r_{\max }> \hat{r} > r\) (green), \(\hat{r} = 1\) (red) and \(\hat{r} = 0\) (black) in a DFM with \(r = 1\), \(\phi = 1\), \(\sigma _{\eta }^2 = 1\), \(\gamma = -0.8\) and \(\sigma _a^2 = 0.1\). First row \(N = 12, T = 100\); second row \(N = 12, T = 500\); third row \(N = 50, T = 100\); fourth row \(N = 50, T = 500\); fifth row \(N = 200, T = 100\) and sixth row \(N = 200, T = 500\). In the first column, the idiosyncratic noises are homoscedastic and uncorrelated; in the second column, they are heteroscedastic, while in the third column they are cross-sectionally correlated. (Color figure online)

Fig. 4

Percentage of \(\hat{r} = r_{\max }\) (blue), \(r_{\max }> \hat{r} > r\) (green), \(\hat{r} = 1\) (red) and \(\hat{r} = 0\) (black) in a DFM with \(r = 1\), \(\phi = 1\), \(\sigma _{\eta }^2 = 1\), \(\gamma = 1\) and \(\sigma _a^2 = 1\). First row \(N = 12, T = 100\); second row \(N = 12, T = 500\); third row \(N = 50, T = 100\); fourth row \(N = 50, T = 500\); fifth row \(N = 200, T = 100\) and sixth row \(N = 200, T = 500\). In the first column, the idiosyncratic noises are homoscedastic and uncorrelated; in the second column, they are heteroscedastic, while in the third column they are cross-sectionally correlated. (Color figure online)

Fig. 5

Percentage of \(\hat{r} = r_{\max }\) (blue), \(r_{\max }> \hat{r} > r\) (green), \(\hat{r} = 1\) (red) and \(\hat{r} = 0\) (black) in a DFM with \(r = 1\), \(\phi = 1\), \(\sigma _{\eta }^2 = 1\), \(\gamma = -0.8\) and \(\sigma _a^2 = 1\). First row \(N = 12, T = 100\); second row \(N = 12, T = 500\); third row \(N = 50, T = 100\); fourth row \(N = 50, T = 500\); fifth row \(N = 200, T = 100\) and sixth row \(N = 200, T = 500\). In the first column, the idiosyncratic noises are homoscedastic and uncorrelated; in the second column, they are heteroscedastic, while in the third column they are cross-sectionally correlated. (Color figure online)

In order to analyze the separate effects of the cross-sectional and temporal dimensions of the system on the estimation of the eigenvalues, Fig. 2 plots the same quantities as in Fig. 1 for \(\gamma =-0.8\) and \(\sigma _{a}^{2}=1\), when \(N=12, 50\) and 200, and \(T=100\) and 500. Note that when \(N\) increases, the first eigenvalue of the population covariance matrix increases, since \(\sum _{i=1}^{N}p_{i1}^{2}\) grows with \(N\), and it is estimated with larger biases and standard deviations. All other eigenvalues are also estimated with larger biases and standard deviations. Therefore, given \(T\), increasing \(N\) could lead to an even worse estimation of the eigenvalues. However, as expected, given \(N\), an increase in \(T\) leads to smaller biases and standard deviations of the estimated eigenvalues.

Fig. 6

Percentage of \(\hat{r} = r_{\max }\) (blue), \(r_{\max }> \hat{r} > r\) (green), \(\hat{r} = 2\) (red), \(\hat{r} = 1\) (gold) and \(\hat{r} = 0\) (black) in a DFM with \( r = 2\), \(\gamma = -0.8\) and \(\sigma _a^2 = 0.1\). System dimensions \(N = 12\), \(T = 100\) (first column); \(N = 200\), \(T = 500\) (second column). The factors are two random walks with variance \(\sigma _{\eta }^2 = 1\) (first row); two random walks with variances \(\sigma _{\eta _1}^2 = 1\) and \(\sigma _{\eta _2}^2 = 5\) (second row); and a random walk with variance \(\sigma _{\eta _1}^2 = 1\) and a stationary factor with \(\sigma _{\eta _2}^2 = 1\) (third row). (Color figure online)

Fig. 7

PC estimated factors (first row) and corresponding factor weights (second row) obtained assuming \(r=3\) and using original inflation rates (first column) and differenced rates (second column)

The finite sample properties of the estimated eigenvalues have effects on the properties of the procedures to detect the number of factors. Figure 3 plots, for each of the procedures considered, the percentage of replicates in which the estimated number of common factors is: (i) \(\hat{r}=0\); (ii) \(\hat{r}=r\); (iii) \(\hat{r}=r_{\max }\); and (iv) \(r_{\max }>\hat{r}>r\), when \(\gamma =-0.8\) and \(\sigma _{a}^{2}=0.1\) (\(\sigma _{e}^{2}=1\)), when \(N=12, 50\) and 200 and \(T=100\) and 500. We consider idiosyncratic noises that are homoscedastic and uncorrelated; heteroscedastic and uncorrelated; and homoscedastic and cross-sectionally correlated. We can observe that, regardless of the structure of the idiosyncratic noises and the cross-sectional dimension, when \(T=100\), the three information criteria tend to overestimate \(r\), and in most of the replicates \(\hat{r}_{\textit{IC}}=r_{\max }\). When \(T=500\), the percentage of \(\hat{r}_{\textit{IC}}=r\) is close to 100 % if the idiosyncratic errors are homoscedastic and cross-sectionally uncorrelated, even if \(N=12\); however, if there is cross-sectional correlation, \(\hat{r}_{\textit{IC}}=r_{\max }\). On the other hand, increasing \(N\) leads to a larger percentage of \(\hat{r}_{\textit{IC}}>r\). The performance of the two estimators based on ratios of eigenvalues, \(\hat{r}_{\textit{ER}}\) and \(\hat{r}_{GR}\), is very similar and always better than that of the estimator based on eigenvalue differences, \(\hat{r}_{\textit{ED}}\). The percentages of correct estimation of \(r\) when implementing the \(\hat{r}_{\textit{ER}}\) and \(\hat{r}_{GR}\) estimators are close to 90 % when \(N=12\) and \(T=100\) and increase to 100 % when increasing either \(N\) or \(T\). The results for heteroscedastic and cross-correlated idiosyncratic noises are very similar.

Figure 4 plots the same quantities as in Fig. 3 when \(\gamma =1\) and \(\sigma _{a}^{2}=1\). Note that this case is comparable to that in Fig. 3 in the sense that the variance of the differenced idiosyncratic noises is the same, \(\sigma _{e}^{2}=1\), but the differenced idiosyncratic noises are cross-sectionally uncorrelated white noises. We can observe that the performance of the alternative procedures to estimate \(r\) is rather different from that in Fig. 3. All procedures have percentages of correct estimation close to 100 %, except the information criteria when \(N=12\) and the idiosyncratic errors are cross-correlated; in this latter case, \(\hat{r}_{\textit{IC}}=r_{\max }\). Consequently, not only the variance of the differenced idiosyncratic noises but also their dependence structure affects the procedures to detect the number of factors. Only the \(\hat{r}_{\textit{ER}}\) and \(\hat{r}_{GR}\) estimators seem to be robust to both.

Finally, Fig. 5 considers the case with \(\gamma =-0.8\) and \(\sigma _{a}^{2}=1\), so that \(\sigma _{e}^{2}=10\). In this case, when \(T=100\), the information criteria behave very similarly to the case with \(\sigma _{a}^{2}=0.1\), with \(\hat{r}_{\textit{IC}}=r_{\max }\). However, when \(N=(12, 50)\) and \(T=500\), the information criteria estimate \(\hat{r}_{\textit{IC}}=0\). Therefore, it seems that they are more affected by the temporal dependence of the differenced idiosyncratic noises than by their variance. On the other hand, when looking at the performance of \(\hat{r}_{\textit{ER}}\) and \(\hat{r}_{GR}\), we can observe that it clearly deteriorates when \(\sigma _{e}^{2}=10\); therefore, their performance clearly depends on \(\sigma _{e}^{2}\). The behavior of \(\hat{r}_{\textit{ED}}\) depends both on \(\gamma \) and \(\sigma _{e}^{2}\), with a rather large percentage of cases in which \(\hat{r}_{\textit{ED}}=0\).

In the second part of the Monte Carlo experiments, we consider models in which \(r=2.\) First, we consider a second non-stationary common factor, i.e., \(\varPhi =I\) and \(\varSigma _{\eta }=I\). Second, the covariance matrix of the factor disturbances is given by \(\varSigma _{\eta } = \text {diag}(1, 5)\). Finally, the last model considered has a second stationary factor with \(\varSigma _{\eta }=I\) and \(\varPhi = \text {diag}(1, 0.5)\).
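The three two-factor designs differ only in \(\varPhi \) and \(\varSigma _{\eta }\); they can be sketched as follows (Gaussian disturbances assumed, variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_factors(T=100, phi=(1.0, 1.0), sigma2_eta=(1.0, 1.0)):
    """Two-factor block F_t = Phi F_{t-1} + eta_t with diagonal Phi.
    phi = (1, 1): two random walks; sigma2_eta = (1, 5): different
    disturbance variances; phi = (1, 0.5): a random walk plus a
    stationary factor."""
    phi = np.asarray(phi)
    s = np.sqrt(np.asarray(sigma2_eta))
    F = np.zeros((T, 2))
    for t in range(1, T):
        F[t] = phi * F[t - 1] + s * rng.normal(size=2)
    return F

F = simulate_factors(phi=(1.0, 0.5))             # third DGP
print(F.shape)                                   # (100, 2)
```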

For each of the three data generating processes (DGPs) above, Fig. 6 plots the percentages of (i) \(\hat{r}=0\); (ii) \(\hat{r}=1\); (iii) \(\hat{r}=r\); (iv) \(\hat{r}=r_{\max }\); and (v) \(r_{\max }>\hat{r}>r\), when \(\gamma =-0.8\) and \(\sigma _{a}^{2}=0.1\) (\(\sigma _{e}^{2}=1\)) and for \(N=12\) with \(T=100\) and \(N=200\) with \(T=500\). First of all, observe that when \(N=12\) and \(T=100\), the information criteria choose \(\hat{r}=r_{\max }\) in all cases. Increasing the dimensions of the system helps for \(\hat{r}_{IC1}\) and \(\hat{r}_{IC2}\) but not for \(\hat{r}_{IC3}\). When looking at the ED, ER and GR criteria, we can observe that, regardless of the structure of the two factors, when \(N=200\) and \(T=500\), all of them have percentages of correct determination of the number of factors close to 100 %. However, when \(N=12\) and \(T=100\), there is a large percentage of replicates in which \(\hat{r}=1\). In this case, the ED procedure is better than the two procedures based on ratios. When the two common random walks in the original data have different variances, the ED procedure has an acceptable proportion of cases in which \(\hat{r}=r\).

5 Empirical analysis

In this section, we implement the procedures considered in this paper to determine the number of common factors in a system of inflation rates in 15 euro area countries, namely, Austria (AUT), Belgium (BEL), Denmark (DEN), Finland (FIN), France (FRA), Germany (GER), Greece (GRE), Ireland (IRL), Italy (ITA), Luxembourg (LUX), Netherlands (NED), Portugal (POR), Spain (SPA), Sweden (SWE) and United Kingdom (UK). Prices, observed monthly from January 1996 to November 2015, \(P_{it}\), have been obtained from the OECD database and transformed into annual inflation rates as \(y_{it}=100\times \varDelta _{12}\log (P_{it})\). When needed, the inflation rates have been corrected for outliers using the software developed by the United States Census Bureau. Following Stock and Watson (2005), outliers are replaced by the median of the 5 previous observations. Furthermore, the inflation series have been deseasonalized when appropriate.
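The transformation from monthly prices to year-on-year inflation can be written as a small sketch; `annual_inflation` is our own name for this helper.

```python
import numpy as np

def annual_inflation(P):
    """Year-on-year inflation from monthly prices:
    y_t = 100 * (log P_t - log P_{t-12})."""
    logp = np.log(np.asarray(P, dtype=float))
    return 100.0 * (logp[12:] - logp[:-12])

# Prices growing at a constant 1 % (log) per month give a flat 12 % annual rate
P = np.exp(0.01 * np.arange(24))
y = annual_inflation(P)
print(np.allclose(y, 12.0))                      # True
```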

Then, as in Reis and Watson (2010) and Altissimo et al (2009), we determine the number of factors using both the inflation data in levels and after differencing. All procedures are implemented with \(r_{\max }=5\). Regardless of whether the procedures are implemented on the original or differenced inflation rates, the information criteria estimate \(\hat{r}=5\) and \(\hat{r}_{\textit{ER}}=\hat{r}_{GR}=1\). However, after differencing, the ED procedure detects just one factor, while \(\hat{r}_{\textit{ED}} = 3\) in the original inflation series. According to our Monte Carlo experiments, if the true number of factors is \(r\ge 2\), then the ED, ER and GR procedures tend to detect \(\hat{r}<r\) when applied to differenced data. Therefore, we could expect the true number of factors to be larger than one. Consequently, we extract the factors assuming that \(r=3\), both from the original and differenced inflation series. In the latter case, the extracted factors are reaccumulated as proposed by Bai and Ng (2004). The extracted factors and their corresponding weights are plotted in Fig. 7; compare with the factor extracted by Delle Monache et al (2016) using quarterly inflation for a panel of 12 inflation rates from a sample of EMU countries. In Fig. 7, there are no significant differences between the factors estimated using the original and differenced inflation rates, except for the centering of the latter. This result could be expected since the variances of all the idiosyncratic noises are rather small, with values between 0.03 and 0.1. Consequently, the differenced idiosyncratic noises are white noises with small variances.

Finally, we should point out that the main difference between extracting factors assuming that \(r = 1\) or \(r = 3\) lies in the interpretability. Recall that PC consistently estimates the space spanned by the factors. Therefore, assuming that \(r = 3\), we can obtain rotations of the factor space that are not available when assuming that \(r = 1\).

6 Conclusions

Differencing non-stationary cointegrated systems has effects on the properties of factor determination procedures. We show that both the variance and the dependence structure of the differenced idiosyncratic noises are important when measuring these effects. If \(r=1\), the ER and GR procedures work well, even in systems of relatively small dimensions, under all the structures of the idiosyncratic noises considered in this paper. Only when the variance of the differenced idiosyncratic noises is very large with respect to the variance of the differenced factor does their performance worsen, although it remains better than that of the alternatives. However, the performance of all procedures deteriorates when \(r=2\). In this case, the ED procedure seems to work better.