Abstract
Cattell’s (Multivar Behav Res 1:245–276, 1966) heuristic determines the number of factors as the elbow point between ‘steep’ and ‘not steep’ in the scree plot. In contrast, an elbow is by definition absent in points on a hyperbola with corresponding equisized surfaces. We formalize this heuristic and propose a criterion to determine the number of factors by comparing surfaces under the scree plot. Monte Carlo simulations show that, in finite samples, our proposed criterion outperforms benchmarks in the dynamic factor model literature.
1 Introduction
A widely used method to analyse large quantities of data in the social sciences is factor analysis, in which the variation in a large number of observed variables is described in a few unobserved variables, or ‘common factors’. One of the main issues in factor analysis is the determination of the number of unobserved variables to retain, i.e. the number of factors. Various methods are in use: (i) heuristic methods like the Kaiser-Guttman criterion, e.g. Kaiser (1960); Guttman (1954), in which only factors with eigenvalues greater than one are retained, the scree test of Cattell (1966), or parallel analysis (PA), e.g. Horn (1965); (ii) stopping rules, e.g. Peres-Neto et al. (2005); (iii) factor analysis (FA), e.g. Connor and Korajczyk (1993), or principal component analysis, e.g. Jolliffe (2002, Chapter 6) or Coste et al. (2005). [Footnote 1]
The scree test of Cattell (1966) is often used to determine the number of factors. It is a graphical technique that consists of plotting the eigenvalues \(\lambda _k\) against their component number k as in Fig. 1 [Footnote 2], and deciding at which value of k the slopes of the plotted points are ‘steep’ to the left of k and ‘not steep’ to the right of k. This value of k, which defines an ‘elbow’ in the graph, is then taken to be the number of factors to be retained.
Onatski (2009) formalizes the scree test and proposes to look at the ratio of successive differences between eigenvalues, \((\gamma _{k}-\gamma _{k+1}) / (\gamma _{k+1}-\gamma _{k+2})\), where \(\gamma _i\) is the i-th largest eigenvalue of the smoothed periodogram estimate. He derives the asymptotic distribution of the statistic as the number of variables N and the number of observations T increase, and tabulates the critical values of the test.
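To make the ratio concrete, the following minimal sketch computes \((\gamma _{k}-\gamma _{k+1}) / (\gamma _{k+1}-\gamma _{k+2})\) for every admissible k from a strictly decreasing eigenvalue sequence. It is only an illustration of the ratio itself: the function name `difference_ratios` is an assumption of this sketch, the input is an arbitrary eigenvalue sequence rather than eigenvalues of a smoothed periodogram, and neither the test statistic's maximisation step nor the tabulated critical values are reproduced.

```python
import numpy as np

def difference_ratios(gamma):
    """Ratios (g_k - g_{k+1}) / (g_{k+1} - g_{k+2}) for k = 1, ..., len(gamma) - 2.

    `gamma` is assumed to be strictly decreasing, so no denominator is zero.
    """
    g = np.asarray(gamma, dtype=float)
    numerator = g[:-2] - g[1:-1]    # g_k - g_{k+1}
    denominator = g[1:-1] - g[2:]   # g_{k+1} - g_{k+2}
    return numerator / denominator

# Stylized eigenvalues with a sharp drop after the second one.
ratios = difference_ratios([16.0, 8.0, 2.0, 1.5, 1.0])
```

A large ratio at position k signals that the k-th gap is much wider than the next one, which is the pattern the scree test looks for by eye.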
This paper proposes an alternative heuristic that is derived from the scree plot and based on diverging eigenvalues. The proposed heuristic is related to the criterion of den Reijer et al. (2022). Both are associated with the scree plot. Whereas den Reijer et al. (2022) apply a threshold-based stopping rule to an approximate static factor model, our heuristic is purely based on diverging eigenvalues and, hence, applied to the more general dynamic factor model specification. Onatski’s (2009) dynamic Monte Carlo simulations reveal good finite-sample performance of our proposed heuristic compared to Bai and Ng (2007), Hallin and Liška (2007) and Onatski (2009).
2 Method
Consider the scaled covariance matrix \(\varvec{X}\varvec{X}'/NT\), where \(\varvec{X}\) is the \((N \times T)\)-matrix with T observations for N time series variables. Then the N-dimensional vector \(\varvec{x}_{t}\) of observations at time \(t=1,\ldots ,T\) with zero mean and covariance matrix \( \varvec{\varGamma }\) can be expressed as

\[\varvec{x}_{t} = \varvec{B}\tilde{\varvec{F}}_t\]
with the N-dimensional orthogonal factors \(\tilde{\varvec{F}}_t\) and \(\varvec{B}\) the \(N\times N\) matrix of factor loadings, \(\varvec{B}\varvec{B}'=\varvec{\varGamma }\) with a scaling such that the trace \({\text {tr}}(\varvec{\varGamma })=1\). Based on the singular value decomposition \(\varvec{B}= \varvec{C}\varvec{\varSigma }\varvec{W}'\) with the matrix \(\varvec{\varSigma }= {\text {diag}}(\sigma _{1,N}, \sigma _{2,N}, \ldots , \sigma _{N,N})\) containing the ordered singular values \(\sigma _{1,N} \ge \sigma _{2,N} \ge \ldots \ge \sigma _{N,N}\) [Footnote 3], the following orthogonal least squares decomposition holds for any \(1 \le k<N\):
and matrix dimensions:
The Euclidean (Schur) norm of \(\varvec{x}_t\), \(||\varvec{x}_t||=\sqrt{\varvec{x}_t'\varvec{x}_t}\) can be written as
with ordered eigenvalues \(\lambda _{i,N}\) being the squared singular values \(\lambda _{i,N} = \sigma _{i,N}^2\), \(i=1, \ldots , N\) and \(\delta _{j,N}(k) \equiv (\lambda _{j,N} - \lambda _{k,N}) \ge 0, \ j=1, \ldots , k\). So, the ‘common’ variance \(||{\varvec{x}}_{k,t}||^2\) has the lower bound \(J_N(k)\equiv k \lambda _{k,N}\). Moreover, \(J^c_N(k) \equiv \sum _{j=1}^k \delta _{j,N}(k) + \sum _{j=k+1}^N \lambda _{j,N}\) consists of the sum of the remaining ‘common’ variance and the ‘idiosyncratic’ variance.
As the eigenvalues are ordered, \(J_N(k)\) is a trade-off between k and \(\lambda _{k,N} \): if k increases, \(\lambda _{k,N} \) becomes smaller. Define points on the hyperbola \(\{k, \bar{\lambda }_{k,N} \}\) as the case for which this trade-off exactly cancels out for every k and, hence, results in equisized surfaces, i.e. \({\bar{J}}_N(k) \equiv k \bar{\lambda }_{k,N}=c\), \(\forall k\) with constant c. For the eigenvalues corresponding to points on the hyperbola \(\{k, \bar{\lambda }_{k,N}\}\), it then holds that \(c=\bar{\lambda }_{1,N}=k\bar{\lambda }_{k,N}\), \(\forall k\). Moreover, using the unity sum of scaled eigenvalues, it holds that \(1 = \sum _{j=1}^N \bar{\lambda }_{j,N} = \bar{\lambda }_{1,N} \sum _{j=1}^N \frac{1}{j} = \bar{\lambda }_{1,N} H_N\), with harmonic number \(H_N \equiv \sum _{j=1}^N \frac{1}{j}\). This allows us to quantify \(\bar{\lambda }_{k,N}=\frac{1}{kH_N}\) and, moreover, \({\bar{J}}_N(k) = k\bar{\lambda }_{k,N} = \frac{1}{H_N}\), \(\forall k\). Note that as \(H_N\) diverges, \(\bar{\lambda }_{k,N}\) converges to zero, so \(\lim _{N\rightarrow \infty } \bar{\lambda }_{k,N}=0\).
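The hyperbola values can be checked numerically. The sketch below uses exact rational arithmetic to confirm, for a small N, that \(\bar{\lambda }_{k,N}=1/(kH_N)\) sums to one and that every surface \(k\bar{\lambda }_{k,N}\) equals \(1/H_N\). The function names are illustrative choices for this sketch and do not come from the paper.

```python
from fractions import Fraction

def harmonic_number(n):
    """H_N = sum_{j=1}^{N} 1/j, returned as an exact fraction."""
    return sum(Fraction(1, j) for j in range(1, n + 1))

def hyperbola_eigenvalues(n):
    """Reference eigenvalues lambda_bar_{k,N} = 1 / (k * H_N) for k = 1, ..., N."""
    h_n = harmonic_number(n)
    return [1 / (k * h_n) for k in range(1, n + 1)]

N = 5
lam_bar = hyperbola_eigenvalues(N)

# Surfaces J_bar_N(k) = k * lambda_bar_{k,N}; all equal 1 / H_N by construction.
surfaces = [k * lam for k, lam in enumerate(lam_bar, start=1)]

total = sum(lam_bar)  # exactly 1: the scaled eigenvalues sum to one
```

For N = 5 the harmonic number is 137/60, so each surface equals 60/137.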
The points on the hyperbola \(\{k, \bar{\lambda }_{k,N} \}\) are graphically illustrated in Fig. 1 together with a stylized scree plot \(\{k, \lambda _{k,N} \}\). The figure also shows the surface \(J_N(k)\) for two different values of k. In order to compare subsequent surfaces under the scree plot, we define:

\[{DJ}_N(k) \equiv \Delta J_N(k) = J_N(k+1)-J_N(k)\]
Note that \(\overline{DJ}_N(k) \equiv \Delta {\bar{J}}_N(k) = {\bar{J}}_N(k+1)-{\bar{J}}_N(k)=0\), \(\forall k\), as the surfaces corresponding to the points on a hyperbola are by construction equisized.
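As a worked step, the difference of subsequent surfaces \(J_N(k+1)-J_N(k)\), with \(J_N(k) = k\lambda _{k,N}\), can be expanded in terms of the eigenvalue increment, written here as \(\Delta \lambda _{k,N} \equiv \lambda _{k+1,N}-\lambda _{k,N}\) (a notation assumed to mirror the hyperbola case). Substituting \(\bar{\lambda }_{k,N}=1/(kH_N)\) then confirms the zero difference on the hyperbola:

```latex
\begin{aligned}
J_N(k+1)-J_N(k)
  &= (k+1)\lambda_{k+1,N} - k\lambda_{k,N}
   = k\left(\lambda_{k+1,N}-\lambda_{k,N}\right) + \lambda_{k+1,N}
   = k\,\Delta\lambda_{k,N} + \lambda_{k+1,N}, \\
\bar{J}_N(k+1)-\bar{J}_N(k)
  &= \frac{k+1}{(k+1)H_N} - \frac{k}{kH_N}
   = \frac{1}{H_N} - \frac{1}{H_N}
   = 0 .
\end{aligned}
```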
Now we can formalize the notion of the elbow point \(\kappa \) in the graph that distinguishes the scree plot being ‘steep’ for \(k < \kappa \), while ‘not steep’ for \(k > \kappa \). Compared to the steepness of the points on the hyperbola, \(\Delta \bar{\lambda }_{k,N} = \bar{\lambda }_{k+1,N} - \bar{\lambda }_{k,N} = -\frac{1}{k(k+1)H_N}\), the relative steepness of the scree plot can be formalized as:
From the strict inequality assumption in (4) and the unity sum of scaled eigenvalues \(1 = \sum _{j=1}^N \bar{\lambda }_{j,N} = \sum _{j=1}^N {\lambda }_{j,N}\), it follows
So a factor structure exists as \(\lim _{N\rightarrow \infty } \bar{\lambda }_{k,N} = 0\) and thereby \(\lambda _{k,N}\) converges for \(k > \kappa \). Now, the heuristic scree plot criterion can be derived as:
For \(k<\kappa \), the scree plot formalization (4) implies that \({DJ}_N(k)=k\Delta {\lambda }_{k,N}+{\lambda }_{k+1,N} < -\frac{1}{(k+1)H_N}+{\lambda }_{k+1,N}\). So, \({DJ}_N(k) \le 0\) if and only if \({\lambda }_{k+1,N} \le \bar{\lambda }_{k+1,N}\), which contradicts the factor structure (5). Moreover, as \(\lim _{N\rightarrow \infty } \bar{\lambda }_{k,N}=0\) for \(k>\kappa \), it holds that \(\lim _{N\rightarrow \infty }{DJ}_N(\kappa ) = -\kappa {\lambda }_{\kappa ,N}\) and \(\lim _{N\rightarrow \infty }{DJ}_N(k) = 0\) for \(k>\kappa \).
So, \({DJ}_N(k)\) is positive for \(k<\kappa \), with \(\lim _{N\rightarrow \infty }{DJ}_N(\kappa ) = - \kappa {\lambda }_{\kappa ,N} <0\) and zero for \(k>\kappa \), which suggests \(\kappa = \arg \min {DJ}_N(k),\ k=1,2,\ldots \). The simulations in the next section employ the estimator \({\hat{k}} = \arg \min {DJ}_N(k)\).
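The estimator \({\hat{k}} = \arg \min {DJ}_N(k)\) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the eigenvalues are taken from \(\varvec{X}\varvec{X}'\) and rescaled to sum to one (equivalent to the unit-trace scaling up to a constant), and the function names and the optional `k_max` cap are hypothetical additions of this sketch.

```python
import numpy as np

def scaled_eigenvalues(x):
    """Eigenvalues of X X' in descending order, rescaled to sum to one.

    `x` is an (N x T) data matrix, assumed to be demeaned already.
    """
    x = np.asarray(x, dtype=float)
    eig = np.linalg.eigvalsh(x @ x.T)[::-1]   # eigvalsh returns ascending order
    eig = np.clip(eig, 0.0, None)             # remove tiny negative round-off
    return eig / eig.sum()

def dj(lam):
    """DJ_N(k) = (k+1) * lambda_{k+1} - k * lambda_k for k = 1, ..., N-1."""
    lam = np.asarray(lam, dtype=float)
    k = np.arange(1, len(lam))
    return (k + 1) * lam[1:] - k * lam[:-1]

def estimate_num_factors(lam, k_max=None):
    """Return k_hat = argmin_k DJ_N(k); `k_max` optionally caps the search range."""
    d = dj(lam)
    if k_max is not None:
        d = d[:k_max]
    return int(np.argmin(d)) + 1              # shift from 0-based index to k

# Stylized scree plot with an elbow after the second eigenvalue.
lam = np.array([0.45, 0.35, 0.06, 0.05, 0.04, 0.03, 0.02])
k_hat = estimate_num_factors(lam)
# DJ is approximately [0.25, -0.52, 0.02, 0.00, -0.02, -0.04], so k_hat == 2.
```

In the stylized example the surface difference is positive before the elbow, strongly negative at the elbow and close to zero afterwards, which is the sign pattern described above.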
3 Monte Carlo experiments
To assess the finite-sample properties of our heuristic, we compare it to the methods of Bai and Ng (2007) (BN henceforth), Hallin and Liška (2007) (HL henceforth) and Onatski (2009), three methods frequently used to determine the number of dynamic factors. [Footnote 4] We consider the generalized dynamic factor structure

\[x_{it} = \sum \nolimits _{j=1}^{k}\Lambda _{ij}\left( L\right) F_{jt} + e_{it} \tag{7}\]
where \(\Lambda _{ij}\left( L\right) =\sum \nolimits _{u=0}^{\infty }\Lambda _{ij}^{\left( u\right) }L^{u}\) with lag operator L, factor loadings \(\Lambda _{ij}^{\left( u\right) }\), factors \(F_{jt}\) and idiosyncratic term \(e_{it}\).
We replicate Onatski’s modification of Hallin and Liška’s (2007) Monte Carlo experiment and generate data from model (7) as follows:
1. The k-dimensional factor vectors \(F_{t}=(F_{1t},\ldots ,F_{kt})'\) are i.i.d. \(N(0,I_{k})\).
2. The filters \(\Lambda _{ij}\left( L\right) \), \((i=1,\ldots ,n;\ j=1,\ldots ,k)\), are randomly generated independently from the \(F_{jt}\)’s by one of the following two devices:
   - MA loadings: \(\Lambda _{ij}\left( L\right) =b_{ij}^{\left( 0\right) }\left( 1+b_{ij}^{\left( 1\right) }L\right) \left( 1+b_{ij}^{\left( 2\right) }L\right) \) with i.i.d. and mutually independent coefficients \(b_{ij}^{\left( 0\right) }\sim N\left( 0,1\right) \), \(b_{ij}^{\left( 1\right) }\sim U\left[ 0,1\right] \) and \(b_{ij}^{\left( 2\right) }\sim U\left[ 0,1\right] \);
   - AR loadings: \(\Lambda _{ij}\left( L\right) =b_{ij}^{\left( 0\right) }\left( 1-b_{ij}^{\left( 1\right) }L\right) ^{-1}\left( 1-b_{ij}^{\left( 2\right) }L\right) ^{-1}\) with i.i.d. and mutually independent coefficients \(b_{ij}^{\left( 0\right) }\sim N\left( 0,1\right) \), \(b_{ij}^{\left( 1\right) }\sim U\left[ .8,.9\right] \) and \(b_{ij}^{\left( 2\right) }\sim U\left[ .5,.6\right] \).
3. The idiosyncratic components \(e_{it}\) follow \(AR\left( 1\right) \)-processes both cross-sectionally and over time: \(e_{it}=\rho _{i}e_{i,t-1}+v_{it}\) and \(v_{it}=\rho v_{i-1,t}+u_{it}\), with i.i.d. coefficients \(\rho _{i} \sim U\left[ -.8,.8\right] \), \(\rho =0.2\) and \(u_{it} \sim N\left( 0,1\right) \) i.i.d. and independently generated from \(\Lambda _{ij}\left( L\right) \) and \(F_{jt}\), cf. Onatski (2009). The support \(\left[ -.8,.8\right] \) of the uniform distribution has been chosen to match the range of the first-order autocorrelations of the estimated idiosyncratic components of the Stock and Watson (2005) dataset.
4. For each i, the variance of \(e_{it}\) and that of the common components \(\sum \nolimits _{j=1}^{k}\Lambda _{ij}\left( L\right) F_{jt}\) are normalized such that their variances equal \(0.4+0.05k\) and \(1-(0.4+0.05k)\), respectively. Hence, a 2-factor model explains 50% of the data variation and a 7-factor model 75% for \(\sigma =1\). As a final step, the idiosyncratic part is magnified by \(\sigma \ge 1\).
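The four steps above can be sketched for the MA-loadings case as follows. This is a minimal illustration, not the Matlab code used for the reported results: the function name, the burn-in length, the seed handling and the use of sample variances for the normalization in step 4 are all assumptions of this sketch, and the variance targets are taken as written in step 4.

```python
import numpy as np

def simulate_ma_panel(n, t, k, sigma=1.0, rho=0.2, burn=50, seed=0):
    """Simulate an (n x t) panel with k dynamic factors and MA loadings.

    Returns (x, chi, e): data, common component and idiosyncratic component
    (the latter before magnification by sigma).
    """
    rng = np.random.default_rng(seed)
    total = t + burn                               # burn-in is an assumption

    # Step 1: i.i.d. N(0, I_k) factors, with two pre-sample periods for the lags.
    f = rng.standard_normal((k, total + 2))

    # Step 2: b0 (1 + b1 L)(1 + b2 L) = b0 + b0 (b1 + b2) L + b0 b1 b2 L^2.
    b0 = rng.standard_normal((n, k))
    b1 = rng.uniform(0.0, 1.0, (n, k))
    b2 = rng.uniform(0.0, 1.0, (n, k))
    chi = b0 @ f[:, 2:] + (b0 * (b1 + b2)) @ f[:, 1:-1] + (b0 * b1 * b2) @ f[:, :-2]

    # Step 3: AR(1) across series (v) and over time (e).
    u = rng.standard_normal((n, total))
    v = np.empty_like(u)
    v[0] = u[0]
    for i in range(1, n):
        v[i] = rho * v[i - 1] + u[i]
    rho_i = rng.uniform(-0.8, 0.8, n)
    e = np.empty_like(v)
    e[:, 0] = v[:, 0]
    for s in range(1, total):
        e[:, s] = rho_i * e[:, s - 1] + v[:, s]

    # Drop the burn-in, then apply step 4 series by series (sample variances).
    chi, e = chi[:, burn:], e[:, burn:]
    var_e = 0.4 + 0.05 * k
    e = e * np.sqrt(var_e / e.var(axis=1, keepdims=True))
    chi = chi * np.sqrt((1.0 - var_e) / chi.var(axis=1, keepdims=True))

    # Final step: magnify the idiosyncratic part by sigma.
    return chi + sigma * e, chi, e

x, chi, e = simulate_ma_panel(n=20, t=100, k=2, sigma=2.0, seed=1)
```

The AR-loadings case would replace the finite MA filter in step 2 by a recursive filter; it is omitted here to keep the sketch short.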
Then the different test procedures are employed to determine the number of factors in the simulated data sets. For the Onatski procedure, the parameter \(\alpha \) equals the maximum of 0.01 and the p-value of the test of \( H_{0}:k=0\) vs. \(H_{1}:0<k\le k_{max}\) with \(k_{max}=4\). So, \(\alpha \) is calibrated such that the test has enough power to reject the false null hypothesis of no factors. Then the algorithm proceeds to test \(H_{0}:k=k_{1}\) vs. \(H_{1}:k_{1}<k\le k_{max}\). If \(H_{0}\) is not rejected, stop. Otherwise, test \(H_{0}:k=k_{1}+1\) vs. \(H_{1}:k_{1}+1<k\le k_{max}\). Repeat the procedure until \(H_{0}\) is not rejected. The Onatski test requires the parameter m, the grid size of approximating frequencies, which is set at \(m=30, 40, 65\) for \(T=70, 120, 500\), respectively. For the Bai-Ng estimator, in the notation of the original paper, we use the \(\widehat{D}_{1,k}\) statistic for the residuals of a VAR\(\left( 4\right) \), set the maximum number of static factors [Footnote 5] at 10 and consider \(\delta =0.1\) and \(m=2\). For the Hallin-Liška estimator, we use the information criterion \(IC_{2;n}^{T}\) with penalty \(p_{1}\left( n,T\right) \), set the truncation parameter \(M_{T}\) at \(\left[ 0.7\sqrt{T} \right] \) and consider subsample sizes \(\left( n_{j},T_{j}\right) =\left( n-10j,T-10j\right) \) with \(j=0,1,\ldots ,3\). We choose the penalty multiplier c on a grid 0.01 : 0.01 : 3 using Hallin-Liška’s second “stability interval” procedure. [Footnote 6] Finally, we note that our proposed procedure does not require auxiliary parameters and is therefore straightforward to implement.
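To illustrate how many auxiliary quantities the Hallin-Liška settings involve, the sketch below evaluates them for a given \((n,T)\). It only computes the tuning values listed above and does not implement the information criterion or the stability-interval search. The function name is hypothetical, and reading \(\left[ \cdot \right] \) as the integer part (floor) is an assumption of this sketch.

```python
import math

def hl_tuning(n, t, step=10, n_sub=4):
    """Tuning quantities for the Hallin-Liska settings described in the text."""
    # Truncation parameter M_T = [0.7 * sqrt(T)], with [.] assumed to be the floor.
    m_t = math.floor(0.7 * math.sqrt(t))
    # Subsample sizes (n_j, T_j) = (n - 10 j, T - 10 j) for j = 0, ..., 3.
    subsamples = [(n - step * j, t - step * j) for j in range(n_sub)]
    # Penalty multiplier grid 0.01 : 0.01 : 3 (300 values).
    c_grid = [round(0.01 * i, 2) for i in range(1, 301)]
    return m_t, subsamples, c_grid

m_t, subsamples, c_grid = hl_tuning(n=70, t=70)
```

By contrast, the surface-difference criterion needs only the ordered eigenvalues as input.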
Table 1 reports the percentages of 500 simulations that deliver an estimated number of factors \({\hat{k}}\) of 1, 2, 3 and 4 for Onatski’s (2009, Table IV) choices of n, T and \(\sigma ^{2}\). Compared to Onatski’s (2009) reported results, some minor differences occur. The Bai-Ng application in case \(\sigma ^{2}=1\) for AR-loadings shows better results in our application, while Onatski obtains better results for the Hallin-Liška estimator. The table shows that our criterion clearly outperforms the other procedures. [Footnote 7]
Table 2 reports the results of the extended simulation analysis with the true number of factors being \(k=7\), an extended \(\left( n,T\right) \)-grid and estimators being constrained to lie in the range from 1 to 14.
Three general observations emerge from the tables: (i) all procedures have a tendency to underestimate rather than overestimate the true number of factors, which becomes more evident with an increasing number of factors; (ii) the Bai-Ng and the Hallin-Liška estimators fail to capture the true number of factors not only in small samples but even in the large-dimension case \(\left( n,T\right) =\left( 150,500\right) \) if the data are noisy, i.e. for \(\sigma ^{2}=16\); (iii) although the Onatski procedure is also based on the scree test, our proposed procedure generally shows a better performance. Our procedure does not involve auxiliary calculations related to the spectral decomposition, which might explain its relative efficiency.
4 Conclusion
This paper presents a heuristic to determine the number of factors based on the comparison of surfaces under the scree plot. Our heuristic is simple to implement and requires neither the specification of several auxiliary parameters as in Bai and Ng (2007), nor the specification of an automated search procedure as in Hallin and Liška (2007). Our procedure is closely related to Onatski (2009), but is more straightforward as it does not involve cumbersome numerical transformations. Replicating Onatski’s (2009) dynamic factor Monte Carlo simulations shows that our proposed heuristic scree plot criterion outperforms these benchmarks in the literature.
Notes
1. Auerswald and Moshagen (2019) show that PA on PCA eigenvalues works even better than PA on FA eigenvalues.
2. Note that the scree plot is only defined on the discrete eigenvalue ordinals k, but for illustrative purposes lines are drawn between the dots.
3. In case \(N>T\), then \(\sigma _{i,N}=0\) for \(i>T\). Without loss of generality, we assume \(N \le T\) for ease of notation.
4. We thank Alexei Onatski and Roman Liška for making available their Matlab programs. The Bai-Ng programs are downloaded from Serena Ng’s homepage.
6. In case the algorithm does not produce a second “stability interval”, we refine the increments of the grid to 0.001 instead. If the algorithm still fails to produce a second “stability interval”, then the algorithm determines the number that prevails just after the end of the original first “stability interval”.
7. The only two exceptions are \(n=70\), \(T=70\), \(\sigma ^{2}=1\) and \(n=100\), \(T=120\), \(\sigma ^{2}=1\) with AR-loadings.
References
Auerswald D, Moshagen M (2019) How to determine the number of factors to retain in exploratory factor analysis: A comparison of extraction methods under realistic conditions. Psychol Methods 24:468–491
Bai J, Ng S (2007) Determining the number of primitive shocks in factor models. J Bus Econ Stat 25:52–60
Cattell RB (1966) The scree test for the number of factors. Multivar Behav Res 1:245–276
Connor G, Korajczyk R (1993) A test for the number of factors in an approximate factor model. J Financ 48:1263–1291
Coste J, Bouée S, Ecosse E, Leplège A, Pouchot J (2005) Methodological issues in determining the dimensionality of composite health measures using principal component analysis: case illustration and suggestions for practice. Qual Life Res 14:641–654
den Reijer AHJ, Jacobs JPAM, Otter PW (2022) A criterion for the number of factors. Commun Stat-Theory Methods 50:4293–4299
Guttman L (1954) Some necessary conditions for common factor analysis. Psychometrika 19:149–161
Hallin M, Liška R (2007) Determining the number of factors in the general dynamic factor model. J Am Stat Assoc 102:603–617
Horn JL (1965) A rationale and test for the number of factors in factor analysis. Psychometrika 30:179–185
Jolliffe IT (2002) Principal component analysis. Springer Series in Statistics, 2nd edn. Springer, New York
Kaiser HF (1960) The application of electronic computers to factor analysis. Educ Psychol Meas 20:141–151
Onatski A (2009) Testing hypotheses about the number of factors in large factor models. Econometrica 77:1447–1479
Peres-Neto PR, Jackson DA, Somers KM (2005) How many principal components? Stopping rules for determining the number of non-trivial axes revisited. Comput Stat Data Anal 49:974–997
Stock JH, Watson MW (2005) Implications of dynamic factor models for VAR analysis. Working paper 11467, National Bureau of Economic Research
Additional information
The opinions expressed in this article are the sole responsibility of the authors and should not be interpreted as reflecting the views of Sveriges Riksbank. The research is not funded. None of the authors have competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Reijer, A.H.J.d., Otter, P.W. & Jacobs, J.P.A.M. An heuristic scree plot criterion for the number of factors. Stat Papers 65, 3991–4000 (2024). https://doi.org/10.1007/s00362-023-01517-x