1 Introduction

A widely used method to analyse large quantities of data in the social sciences is factor analysis, in which the variation in a large number of observed variables is described by a few unobserved variables, or ‘common factors’. One of the main issues in factor analysis is determining the number of unobserved variables to retain, i.e. the number of factors. Various methods are in use: (i) heuristic methods like the Kaiser-Guttman criterion, e.g. Kaiser (1960) and Guttman (1954), in which only factors with eigenvalues greater than one are retained, the scree test of Cattell (1966), or parallel analysis (PA), e.g. Horn (1965); (ii) stopping rules, e.g. Peres-Neto et al. (2005); (iii) factor analysis (FA), e.g. Connor and Korajczyk (1993), or principal component analysis, e.g. Jolliffe (2002, Chapter 6) or Coste et al. (2005).Footnote 1

The scree test of Cattell (1966) is often used to determine the number of factors. It is a graphical technique that consists of plotting the eigenvalues \(\lambda _k\) against their component number k as in Fig. 1,Footnote 2 and deciding at which value of k the slopes of the plotted points are ‘steep’ to the left of k and ‘not steep’ to the right of k. This value of k, which defines an ‘elbow’ in the graph, is then taken to be the number of factors to be retained.
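For illustration, a scree plot like the one in Fig. 1 can be produced in a few lines; a minimal sketch in Python, using a random placeholder data matrix:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
N, T = 50, 200
X = rng.standard_normal((N, T))  # placeholder data matrix (N series, T observations)

# Eigenvalues of the scaled covariance matrix X X'/(N T), in decreasing order
lam = np.linalg.eigvalsh(X @ X.T / (N * T))[::-1]

plt.plot(np.arange(1, N + 1), lam, marker="o")
plt.xlabel("component number k")
plt.ylabel("eigenvalue")
plt.title("Scree plot")
plt.show()
```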

Onatski (2009) formalizes the scree test and proposes to examine the ratio of successive differences between eigenvalues, \((\gamma _{k}-\gamma _{k+1}) / (\gamma _{k+1}-\gamma _{k+2})\), where \(\gamma _i\) is the i-th largest eigenvalue of the smoothed periodogram estimate. He derives the asymptotic distribution of the statistic as the number of variables N and the number of observations T increase and tabulates the critical values of the test.
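The ratio statistic itself is straightforward to compute once ordered eigenvalues are available. A sketch (function name ours) that takes a plain array of eigenvalues in decreasing order; note that Onatski computes the \(\gamma _i\) from a smoothed periodogram estimate, so his tabulated critical values do not apply to arbitrary eigenvalue inputs:

```python
import numpy as np

def eigenvalue_difference_ratios(gamma: np.ndarray) -> np.ndarray:
    """Ratios (gamma_k - gamma_{k+1}) / (gamma_{k+1} - gamma_{k+2}),
    k = 1, ..., len(gamma) - 2, for gamma sorted in decreasing order.
    Assumes strictly decreasing eigenvalues (no zero denominators)."""
    d = -np.diff(gamma)      # successive differences gamma_k - gamma_{k+1} >= 0
    return d[:-1] / d[1:]
```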

This paper proposes an alternative heuristic that is derived from the scree plot and based on diverging eigenvalues. The proposed heuristic is related to the criterion of den Reijer et al. (2022), which is also associated with the scree plot. Whereas den Reijer et al. (2022) apply a threshold-based stopping rule to an approximate static factor model, our heuristic is based purely on diverging eigenvalues and, hence, applies to the more general dynamic factor model specification. Replicating Onatski’s (2009) dynamic Monte Carlo simulations reveals good finite-sample performance of our proposed heuristic compared to Bai and Ng (2007), Hallin and Liška (2007) and Onatski (2009).

2 Method

Consider the scaled covariance matrix \(\varvec{X}\varvec{X}'/NT\), where \(\varvec{X}\) is the \(N \times T\) matrix with T observations on N time series variables. Then the N-dimensional vector \(\varvec{x}_{t}\) of observations at time \(t=1,...,T\) with zero mean and covariance matrix \( \varvec{\varGamma }\) can be expressed as

$$\begin{aligned} \varvec{x}_t=\varvec{B}\tilde{\varvec{F}}_t = \varvec{C}\varvec{\varSigma }\varvec{W}'\tilde{\varvec{F}}_t \equiv \varvec{C}\varvec{\varSigma }\varvec{F}_t, \end{aligned}$$
(1)

with the N-dimensional orthogonal factors \(\tilde{\varvec{F}}_t\) and \(\varvec{B}\) the \(N\times N\) matrix of factor loadings, \(\varvec{B}\varvec{B}'=\varvec{\varGamma }\), scaled such that the trace \({\text {tr}}(\varvec{\varGamma })=1\). Based on the singular value decomposition \(\varvec{B}= \varvec{C}\varvec{\varSigma }\varvec{W}'\) with the matrix \(\varvec{\varSigma }= {\text {diag}}(\sigma _{1,N}, \sigma _{2,N}, \ldots , \sigma _{N,N})\) containing the ordered singular values,Footnote 3 \(\sigma _{1,N} \ge \sigma _{2,N} \ge \ldots \ge \sigma _{N,N}\), the following orthogonal least squares decomposition holds for any \(1 \le k<N\):

$$\begin{aligned} \varvec{x}_t = \varvec{C}\varvec{\varSigma }\varvec{W}'\tilde{\varvec{F}}_t = \varvec{C}_k \varvec{\varSigma }_k \varvec{F}_{k,t} + \varvec{C}_{N-k} \varvec{\varSigma }_{N-k} \varvec{F}_{N-k,t} \equiv \tilde{\varvec{x}}_{k,t} + \tilde{\varvec{x}}_{N-k,t}, \end{aligned}$$
(2)

where the matrices are partitioned as:

$$\begin{aligned} \varvec{C}&= \begin{bmatrix} \varvec{C}_k&\varvec{C}_{N-k} \end{bmatrix}, \quad \varvec{C}_k \in \mathbb {R}^{N \times k}, \varvec{C}_{N-k} \in \mathbb {R}^{N \times (N-k)}, \\ \varvec{\varSigma }&= \begin{bmatrix} \varvec{\varSigma }_{k} & \varvec{0} \\ \varvec{0} & \varvec{\varSigma }_{N-k} \end{bmatrix}, \quad \varvec{\varSigma }_{k} = {\text {diag}}(\sigma _{1,N}, \ldots , \sigma _{k,N}), \ \varvec{\varSigma }_{N-k} = {\text {diag}}(\sigma _{k+1,N}, \ldots , \sigma _{N,N}), \\ \varvec{F}_t&= \begin{bmatrix} \varvec{F}'_{k,t}&\varvec{F}'_{N-k,t} \end{bmatrix}', \quad \dim (\varvec{F}'_{k,t})=k, \ \dim (\varvec{F}'_{N-k,t})=N-k. \end{aligned}$$
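A minimal numerical check of decomposition (2), assuming a randomly generated loading matrix \(\varvec{B}\) and a single draw of the orthogonal factors (all variable names ours):

```python
import numpy as np

rng = np.random.default_rng(1)
N, k = 20, 3
B = rng.standard_normal((N, N))
B /= np.sqrt(np.trace(B @ B.T))   # scale so that tr(Gamma) = tr(B B') = 1

C, sing, Wt = np.linalg.svd(B)    # singular value decomposition B = C Sigma W'
F_tilde = rng.standard_normal(N)  # one draw of the orthogonal factors F~_t
x = B @ F_tilde                   # x_t = B F~_t

F = Wt @ F_tilde                            # rotated factors F_t = W' F~_t
x_common = C[:, :k] @ (sing[:k] * F[:k])    # x~_{k,t}   = C_k Sigma_k F_{k,t}
x_idio = C[:, k:] @ (sing[k:] * F[k:])      # x~_{N-k,t} = C_{N-k} Sigma_{N-k} F_{N-k,t}

assert np.allclose(x, x_common + x_idio)    # x_t = x~_{k,t} + x~_{N-k,t}
```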

The Euclidean (Schur) norm of \(\varvec{x}_t\), \(||\varvec{x}_t||=\sqrt{\varvec{x}_t'\varvec{x}_t}\), can be written as

$$\begin{aligned} ||\varvec{x}_t||^2&= \underbrace{||\tilde{\varvec{x}}_{k,t}||^2}_{\text {`common' variance}} + \underbrace{ ||\tilde{\varvec{x}}_{N-k,t}||^2}_{\text {`idiosyncratic' variance}} = \sum _{j=1}^k \lambda _{j,N} + \sum _{j=k+1}^N \lambda _{j,N} \\ {}&\equiv k \lambda _{k,N} + \sum _{j=1}^k \delta _{j,N}(k) + \sum _{j=k+1}^N \lambda _{j,N} = J_N(k) + J_N^c(k) = 1, \end{aligned}$$

with ordered eigenvalues \(\lambda _{i,N}\) being the squared singular values, \(\lambda _{i,N} = \sigma _{i,N}^2\), \(i=1, \ldots , N\), and \(\delta _{j,N}(k) \equiv (\lambda _{j,N} - \lambda _{k,N}) \ge 0, \ j=1, \ldots , k\). So, the ‘common’ variance \(||\tilde{\varvec{x}}_{k,t}||^2\) has the lower bound \(J_N(k)\equiv k \lambda _{k,N}\). Moreover, \(J^c_N(k) \equiv \sum _{j=1}^k \delta _{j,N}(k) + \sum _{j=k+1}^N \lambda _{j,N}\) consists of the sum of the remaining ‘common’ variance and the ‘idiosyncratic’ variance.

As the eigenvalues are ordered, \(J_N(k)\) embodies a trade-off between k and \(\lambda _{k,N}\): if k increases, \(\lambda _{k,N}\) becomes smaller. Define points on the hyperbola \(\{k, \bar{\lambda }_{k,N} \}\) as the case for which this trade-off exactly cancels out for every k and, hence, results in equisized surfaces, i.e. \({\bar{J}}_N(k) \equiv k \bar{\lambda }_{k,N}=c\), \(\forall k\), with constant c. For the eigenvalues corresponding to points on the hyperbola \(\{k, \bar{\lambda }_{k,N}\}\), it then holds that \(c=\bar{\lambda }_{1,N}=k\bar{\lambda }_{k,N}\), \(\forall k\). Moreover, using the unity sum of scaled eigenvalues, it holds that \(1 = \sum _{j=1}^N \bar{\lambda }_{j,N} = \bar{\lambda }_{1,N} \sum _{j=1}^N \frac{1}{j} = \bar{\lambda }_{1,N} H_N\), with harmonic number \(H_N \equiv \sum _{j=1}^N \frac{1}{j}\). This enables us to quantify \(\bar{\lambda }_{k,N}=\frac{1}{kH_N}\) and, moreover, \({\bar{J}}_N(k) = k\bar{\lambda }_{k,N} = \frac{1}{H_N}\), \(\forall k\). Note that as \(H_N\) diverges, \(\bar{\lambda }_{k,N}\) converges to zero: \(\lim _{N\rightarrow \infty } \bar{\lambda }_{k,N}=0\).
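The hyperbola values are easy to verify numerically; a short sketch:

```python
import numpy as np

N = 100
k = np.arange(1, N + 1)
H_N = np.sum(1.0 / k)        # harmonic number H_N
lam_bar = 1.0 / (k * H_N)    # lam_bar_{k,N} = 1/(k H_N)

assert np.isclose(lam_bar.sum(), 1.0)     # unity sum of scaled eigenvalues
assert np.allclose(k * lam_bar, 1 / H_N)  # J_bar_N(k) = 1/H_N for all k
```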

The points on the hyperbola \(\{k, \bar{\lambda }_{k,N} \}\) are graphically illustrated in Fig. 1 together with a stylized scree plot \(\{k, \lambda _{k,N} \}\). The figure also shows the surface \(J_N(k)\) for two different values of k. In order to compare subsequent surfaces under the scree plot, define:

$$\begin{aligned} {DJ}_N(k) \equiv \Delta J_N(k) = {J}_N(k+1)-{J}_N(k) = (k+1)\lambda _{k+1,N} - k\lambda _{k,N}. \end{aligned}$$
(3)

Note that \(\overline{DJ}_N(k) \equiv \Delta {\bar{J}}_N(k) = {\bar{J}}_N(k+1)-{\bar{J}}_N(k)=0\), \(\forall k\), as the surfaces corresponding to the points on a hyperbola are by construction equisized.

Fig. 1
Graphical illustration of a scree plot. Find the k for which the difference between adjacent surfaces, i.e. \((k+1) \lambda _{k+1,N}\) (blue) minus \( k \lambda _{k,N}\) (yellow), is minimal. Note that yellow and blue partly overlap in this illustration

Now we can formalize the notion of the elbow point \(\kappa \) in the graph, for which the scree plot is ‘steep’ for \(k < \kappa \) and ‘not steep’ for \(k > \kappa \). Compared to the steepness of the points on the hyperbola, \(\Delta \bar{\lambda }_{k,N} = \bar{\lambda }_{k+1,N} - \bar{\lambda }_{k,N} = -\frac{1}{k(k+1)H_N}\), the relative steepness of the scree plot can be formalized as:

$$\begin{aligned} \Delta {\lambda _{k,N}}&< \Delta \bar{\lambda }_{k,N} = -\frac{1}{k(k+1)H_N}, \ k=1,2, \ldots , \kappa \nonumber \\ \Delta {\lambda _{k,N}}&\ge \Delta \bar{\lambda }_{k,N} = -\frac{1}{k(k+1)H_N}, \ k=\kappa +1, \ldots , N. \end{aligned}$$
(4)

From the strict inequality assumption in (4) and the unity sum of scaled eigenvalues, \(1 = \sum _{j=1}^N \bar{\lambda }_{j,N} = \sum _{j=1}^N {\lambda }_{j,N}\), it follows that

$$\begin{aligned} \lambda _{k,N}&> {\bar{\lambda }}_{k,N}, \ k=1,2, \ldots , \kappa \nonumber \\ \lambda _{k,N}&\le {\bar{\lambda }}_{k,N}, \ k=\kappa +1, \ldots , N. \end{aligned}$$
(5)

So a factor structure exists: as \(\lim _{N\rightarrow \infty } \bar{\lambda }_{k,N} = 0\), (5) implies that \(\lambda _{k,N}\) converges to zero for \(k > \kappa \). Now, the heuristic scree plot criterion can be derived as:

$$\begin{aligned} \lim _{N\rightarrow \infty } {DJ}_N(k)&> 0, \ k=1,2, \ldots , \kappa -1 \nonumber \\ \lim _{N\rightarrow \infty } {DJ}_N(k)&= -\kappa {\lambda }_{\kappa ,N}, \ k=\kappa \nonumber \\ \lim _{N\rightarrow \infty } {DJ}_N(k)&= 0, \ k=\kappa +1, \ldots , N. \end{aligned}$$
(6)

For \(k<\kappa \), the scree plot formalization (4) implies that \({DJ}_N(k)=k\Delta {\lambda }_{k,N}+{\lambda }_{k+1,N} < -\frac{1}{(k+1)H_N}+{\lambda }_{k+1,N} = {\lambda }_{k+1,N} - \bar{\lambda }_{k+1,N}\). So, \({DJ}_N(k) \le 0\) if and only if \({\lambda }_{k+1,N} \le \bar{\lambda }_{k+1,N}\), which contradicts the factor structure (5). Moreover, as \(\lambda _{k,N} \le \bar{\lambda }_{k,N}\) and \(\lim _{N\rightarrow \infty } \bar{\lambda }_{k,N}=0\) for \(k>\kappa \), it holds that \(\lim _{N\rightarrow \infty }{DJ}_N(\kappa ) = -\kappa {\lambda }_{\kappa ,N}\) and \(\lim _{N\rightarrow \infty }{DJ}_N(k) = 0\) for \(k>\kappa \).

So, \({DJ}_N(k)\) is positive for \(k<\kappa \), negative at \(k=\kappa \) with \(\lim _{N\rightarrow \infty }{DJ}_N(\kappa ) = - \kappa {\lambda }_{\kappa ,N} <0\), and zero for \(k>\kappa \), which suggests \(\kappa = \arg \min _k {DJ}_N(k)\). The simulations in the next section employ the estimator \({\hat{k}} = \arg \min _k {DJ}_N(k)\).
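In code, the estimator reduces to a few lines; a minimal sketch (function name ours):

```python
import numpy as np

def estimate_num_factors(X: np.ndarray) -> int:
    """Scree-surface heuristic: k_hat = argmin_k DJ_N(k), with
    DJ_N(k) = (k+1) lam_{k+1,N} - k lam_{k,N}, using eigenvalues of
    X X'/(N T) scaled to sum to one (tr(Gamma) = 1)."""
    N, T = X.shape
    lam = np.linalg.eigvalsh(X @ X.T / (N * T))[::-1]  # decreasing order
    lam = lam / lam.sum()                              # impose unit trace
    k = np.arange(1, N)                                # k = 1, ..., N-1
    DJ = (k + 1) * lam[1:] - k * lam[:-1]              # DJ_N(k), eq. (3)
    return int(np.argmin(DJ)) + 1                      # argmin over k (1-indexed)
```

No auxiliary parameters are involved: \({\hat{k}}\) follows directly from the ordered eigenvalues.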

3 Monte Carlo experiments

To assess the finite-sample properties of our heuristic, we compare it to the methods of Bai and Ng (2007) (BN henceforth), Hallin and Liška (2007) (HL henceforth) and Onatski (2009), three methods frequently used to calculate the number of dynamic factors.Footnote 4 We consider the generalized dynamic factor structure

$$\begin{aligned} x_{it}=\Lambda _{i1}\left( L\right) F_{1t}+...+\Lambda _{ik}\left( L\right) F_{kt}+e_{it}, \end{aligned}$$
(7)

where \(\Lambda _{ij}\left( L\right) =\sum \nolimits _{u=0}^{\infty }\Lambda _{ij}^{\left( u\right) }L^{u}\) with lag operator L, factor loadings \(\Lambda _{ij}^{\left( u\right) }\), factors \(F_{jt}\) and idiosyncratic terms \(e_{it}\).

We replicate Onatski’s modification of Hallin and Liška’s (2007) Monte Carlo experiment and generate data from model (7) as follows:

  1. The k-dimensional factor vectors \(F_{jt}\) are i.i.d. \(N(0,I_{k})\).

  2. The filters \(\Lambda _{ik}\left( L\right) \), \((i=1,...,n;\) \(k=1,...,q)\) are randomly generated independently from the \(F_{jt}\)’s by one of the following two devices:

MA loadings:

\(\Lambda _{ik}\left( L\right) =b_{ij}^{\left( 0\right) }\left( 1+b_{ij}^{\left( 1\right) }L\right) \left( 1+b_{ij}^{\left( 2\right) }L\right) \) with i.i.d. and mutually independent coefficients \(b_{ij}^{\left( 0\right) }\sim N\left( 0,1\right) ,\) \(b_{ij}^{\left( 1\right) }\sim U\left[ 0,1\right] \) and \(b_{ij}^{\left( 2\right) }\sim U\left[ 0,1\right] \);

AR loadings:

\(\Lambda _{ik}\left( L\right) =b_{ij}^{\left( 0\right) }\left( 1-b_{ij}^{\left( 1\right) }L\right) ^{-1}\left( 1-b_{ij}^{\left( 2\right) }L\right) ^{-1}\) with i.i.d. and mutually independent coefficients \( b_{ij}^{\left( 0\right) }\sim N\left( 0,1\right) ,\) \(b_{ij}^{\left( 1\right) }\sim U\left[ .8,.9\right] \) and \(b_{ij}^{\left( 2\right) }\sim U\left[ .5,.6\right] \).

  3. The idiosyncratic components \(e_{it}\) follow \(AR\left( 1\right) \)-processes both cross-sectionally and over time: \(e_{it}=\rho _{i}e_{it-1}+v_{it}\) and \(v_{it}=\rho v_{i-1t}+u_{it}\), with i.i.d. coefficients \(\rho _{i} \sim U\left[ -.8,.8\right] \), \(\rho =0.2\) and \(u_{it} \sim N\left( 0,1\right) \) i.i.d. and generated independently from \(\Lambda _{ik}\left( L\right) \) and \(F_{jt}\), cf. Onatski (2009). The support \(\left[ -.8,.8\right] \) of the uniform distribution has been chosen to match the range of the first-order autocorrelations of the estimated idiosyncratic components of the Stock and Watson (2005) dataset.

  4. For each i, the variance of the common components \(\sum \nolimits _{j=1}^{k}\Lambda _{ij}\left( L\right) F_{jt}\) and that of \(e_{it}\) are normalized to \(0.4+0.05k\) and \(1-(0.4+0.05k)\), respectively. Hence, a 2-factor model explains 50% of the data variation and a 7-factor model 75% for \(\sigma =1\). As a final step, the idiosyncratic part is magnified by \(\sigma \ge 1\) (see the sketch after this list).
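A compact sketch of this data-generating process for the MA-loadings case (variable names ours; the cross-sectional AR(1) step of item 3 is omitted for brevity, and the AR-loadings case is analogous):

```python
import numpy as np

def simulate_panel(n: int, T: int, k: int, sigma: float = 1.0, seed: int = 0):
    """Simulate model (7) with MA(2) loadings; two pre-sample factor draws
    serve as burn-in for the MA filter."""
    rng = np.random.default_rng(seed)
    F = rng.standard_normal((k, T + 2))              # i.i.d. N(0, I_k) factors
    b0 = rng.standard_normal((n, k))
    b1, b2 = rng.uniform(0, 1, (n, k)), rng.uniform(0, 1, (n, k))

    # Common component: Lambda_ij(L) = b0 (1 + b1 L)(1 + b2 L) applied to F_jt
    chi = np.zeros((n, T))
    for u, c in enumerate([np.ones((n, k)), b1 + b2, b1 * b2]):  # lag-u weights
        chi += (b0 * c) @ F[:, 2 - u : T + 2 - u]

    # Idiosyncratic AR(1) over time (cross-sectional AR(1) step omitted here)
    rho_i = rng.uniform(-0.8, 0.8, n)
    e = np.zeros((n, T))
    e[:, 0] = rng.standard_normal(n)
    for t in range(1, T):
        e[:, t] = rho_i * e[:, t - 1] + rng.standard_normal(n)

    # Normalize variances: common share 0.4 + 0.05 k, idiosyncratic the rest,
    # then magnify the idiosyncratic part by sigma >= 1
    s_common = np.sqrt(0.4 + 0.05 * k) / chi.std(axis=1, keepdims=True)
    s_idio = np.sqrt(1 - (0.4 + 0.05 * k)) / e.std(axis=1, keepdims=True)
    return s_common * chi + sigma * s_idio * e
```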

Then the different test procedures are employed to determine the number of factors in the simulated data sets. For the Onatski procedure, the parameter \(\alpha \) equals the maximum of 0.01 and the p-value of the test of \( H_{0}:k=0\) vs. \(H_{1}:0<k\le k_{max}\) with \(k_{max}=4\). So, \(\alpha \) is calibrated such that the test has enough power to reject the false null hypothesis of no factors. The algorithm then proceeds to test \(H_{0}:k=k_{1}\) vs. \(H_{1}:k_{1}<k\le k_{max}\). If \(H_{0}\) is not rejected, stop. Otherwise, test \(H_{0}:k=k_{1}+1\) vs. \(H_{1}:k_{1}+1<k\le k_{max}\). Repeat the procedure until \(H_{0}\) is not rejected. The Onatski test requires the parameter m for the grid size of approximating frequencies, which is set at \(m=30, 40, 65\) for \(T=70, 120, 500\), respectively. In the original notation of the corresponding paper, for the Bai-Ng estimator we use the \(\widehat{D}_{1,k}\) statistic for the residuals of a VAR\(\left( 4\right) \), set the maximum number of static factorsFootnote 5 at 10 and consider \(\delta =0.1\) and \(m=2\). For the Hallin-Liška estimator, we use the information criterion \(IC_{2;n}^{T}\) with penalty \(p_{1}\left( n,T\right) \), set the truncation parameter \(M_{T}\) at \(\left[ 0.7\sqrt{T} \right] \) and consider subsample sizes \(\left( n_{j},T_{j}\right) =\left( n-10j,T-10j\right) \) with \(j=0,1,...,3\). We chose the penalty multiplier c on the grid \(0.01:0.01:3\) using Hallin-Liška’s second “stability interval” procedure.Footnote 6 Finally, we note that our proposed procedure does not require auxiliary parameters and is therefore straightforward to implement.
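For reference, a sketch of the replication loop that produces such frequency tables for our heuristic, assuming the `simulate_panel` and `estimate_num_factors` sketches given earlier:

```python
from collections import Counter

def frequency_table(n: int, T: int, k_true: int, sigma: float, reps: int = 500):
    """Share of replications (in %) delivering each estimated number of factors."""
    counts = Counter(
        estimate_num_factors(simulate_panel(n, T, k_true, sigma, seed=r))
        for r in range(reps)
    )
    return {k_hat: 100 * c / reps for k_hat, c in sorted(counts.items())}

# Example cell of a results table: n = 70, T = 70, two factors, sigma = 1
print(frequency_table(n=70, T=70, k_true=2, sigma=1.0))
```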

Table 1 reports the percentages of 500 simulations that deliver 1, 2, 3 and 4 as the estimated number of factors \({\hat{k}}\) for Onatski’s (2009, Table IV) choices of n, T and \(\sigma ^{2}\). Compared to Onatski’s (2009) reported results, some minor differences occur. The Bai-Ng estimator in the case \(\sigma ^{2}=1\) with AR loadings shows better results in our replication, while Onatski obtains better results for the Hallin-Liška estimator. The table shows that our criterion clearly outperforms the other procedures.Footnote 7

Table 1 Monte Carlo replications of the dynamic factor model
Table 2 Monte Carlo replications of the dynamic factor model

Table 2 reports the results of the extended simulation analysis with the true number of factors being \(k=7\), an extended \(\left( n,T\right) \) grid and estimators constrained to lie in the range from 1 to 14.

Three general observations emerge from the tables: (i) all procedures tend to underestimate rather than overestimate the true number of factors, which becomes more evident as the number of factors increases; (ii) the Bai-Ng and Hallin-Liška estimators fail to capture the true number of factors not only in small samples, but even in the large-dimension case \(\left( n,T\right) =\left( 150,500\right) \) if the data is noisy, i.e. for \(\sigma ^{2}=16\); (iii) although the Onatski procedure is also based on the scree test, our proposed procedure generally shows better performance. Our procedure does not involve auxiliary calculations related to the spectral decomposition, which might explain its relative efficiency.

4 Conclusion

This paper presents a heuristic to determine the number of factors based on the comparison of surfaces under the scree plot. Our heuristic is simple to implement and requires neither the specification of several auxiliary parameters, as in Bai and Ng (2007), nor an automated search procedure, as in Hallin and Liška (2007). Our procedure is closely related to Onatski (2009), but is more straightforward as it does not involve cumbersome numerical transformations. Replicating Onatski’s (2009) dynamic factor Monte Carlo simulations shows that our proposed heuristic scree plot criterion outperforms these benchmarks from the literature.