Abstract
Principal component analysis is a multivariate technique widely used for dimensionality reduction. When dealing with high-dimensional data, one must decide how many principal components to retain. Several criteria for this choice have been proposed in the literature, but most have serious limitations, such as normality assumptions, reliance on subjective analysis, or validity that is only asymptotic. This study proposes two new parametric bootstrap tests for determining the optimal number of principal components (PCs) to retain for subsequent analysis, based on the amount of the total variation accounted for by the first k principal components. The performance of these tests was compared with each other and with the tests of Fujikoshi (1980) and Gebert and Ferreira (2010) through Monte Carlo simulations. Under multivariate normality, the two proposed parametric bootstrap tests are recommended. Under nonnormality, the test of Gebert and Ferreira (2010) is recommended. The three bootstrap tests outperform the Fujikoshi test in most circumstances.
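The abstract does not spell out the test procedures, so the following is only a rough sketch of the general idea, not the authors' tests. It assumes the "amount of total variation" is read as the proportion of the summed covariance eigenvalues captured by the first k components, and it resamples from a multivariate normal distribution fitted to the data. The function names `explained_proportion` and `bootstrap_proportions`, the number of resamples, and the percentile summary are all illustrative assumptions.

```python
import numpy as np


def explained_proportion(X, k):
    """Proportion of total variance captured by the first k principal components."""
    S = np.cov(X, rowvar=False)              # sample covariance matrix (p x p)
    eig = np.linalg.eigvalsh(S)[::-1]        # eigenvalues in descending order
    return eig[:k].sum() / eig.sum()


def bootstrap_proportions(X, k, n_boot=500, seed=0):
    """Parametric bootstrap replicates of the explained proportion.

    Assumption: data are resampled from a multivariate normal whose mean and
    covariance are the sample estimates. This is a generic parametric
    bootstrap, not necessarily the scheme used in the paper.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    mean = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        Xb = rng.multivariate_normal(mean, S, size=n)  # simulated sample of size n
        reps[b] = explained_proportion(Xb, k)
    return reps


if __name__ == "__main__":
    # Synthetic data with decreasing column scales (hypothetical example).
    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 4)) @ np.diag([3.0, 2.0, 1.0, 0.5])
    reps = bootstrap_proportions(X, k=2)
    lower, upper = np.quantile(reps, [0.025, 0.975])
    print(f"observed proportion for k=2: {explained_proportion(X, 2):.3f}")
    print(f"bootstrap 95% percentile range: [{lower:.3f}, {upper:.3f}]")
```

In a sketch like this, one could compare the bootstrap distribution for each candidate k against a target proportion chosen in advance. How the paper turns that comparison into a formal test is not stated in the abstract.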
References
Amorim, I. S., E. B. Ferreira, R. R. Lima, and R. G. F. A. Pereira. 2010. Monte Carlo based test for inferring about the unidimensionality of a Brazilian coffee sensory panel. Food Qual. Pref. Barking, 21(3), 319–323.
Chernick, M. R. 2008. Bootstrap methods: A guide for practitioners and researchers, 2nd ed. New York, NY: Wiley-Interscience.
Cirillo, M. A., and D. F. Ferreira. 2003. Extensão do teste para normalidade univariado baseado no coeficiente de correlação quantil-quantil para o caso multivariado. Rev. Matem. Estat. Marília, 21(3), 57–75.
Davison, A. C., and D. V. Hinkley. 2008. Bootstrap methods and their application. Cambridge, UK: Cambridge University.
Efron, B., and R. J. Tibshirani. 1993. An introduction to the bootstrap. Boca Raton, FL: Chapman & Hall.
Ferreira, D. F. 2008. Estatística multivariada. Lavras, Brazil: UFLA.
Fleck, M. P. A., and M. C. Bourdel. 1998. Método de simulação e escolha de fatores na análise dos principais componentes. Rev. Saúde Pública, 32(3), 267–272.
Fujikoshi, Y. 1980. Asymptotic expansions for the distributions of the sample roots under nonnormality. Biometrika, 67(1), 45–51.
Gebert, D. M. P., and D. F. Ferreira. 2010. Proposta de teste bootstrap não-paramétrico de retenção do número de componentes principais. Rev. Bras. Biometria, 28(2), 116–136.
Jolliffe, I. T. 2002. Principal component analysis, 2nd ed. New York, NY: Springer Verlag.
Klein, L., and W. Mak. 2005. Initial steps in high-frequency modeling of China. Business Econ., 40, 11–14.
Mood, A. M., F. A. Graybill, and D. C. Boes. 1974. Introduction to the theory of statistics, 3rd ed. Singapore: McGraw-Hill.
Peres-Neto, P., D. A. Jackson, and K. M. Somers. 2005. How many principal components? Stopping rules for determining the number of non-trivial axes revisited. Comput. Stat. Data Anal., 49(4), 974–997.
Press, W. H., B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. 1992. Numerical recipes in Fortran: The art of scientific computing. Cambridge, UK: Cambridge University.
R Development Core Team. 2009. R. http://www.R-project.org (accessed 20 December 2009).
Royston, J. P. 1983b. Some techniques for assessing multivariate normality based on the Shapiro-Wilk W. J. R. Stat. Soc. Ser. C Appl. Stat., 32(2), 121–133.
Timm, N. H. 2002. Applied multivariate analysis. New York, NY: Springer Verlag.
Zimmermann, C. M., O. M. Guimares, and P. G. Peralta-Zamora. 2008. Avaliação da qualidade do corpo hídrico do Rio Tibagi na região de Ponta Grossa. Quim. Nova, 31, 1727–1732.
Cite this article
Gebert, D.M.P., Ferreira, D.F. Parametric Bootstrap Tests for Determining the Number of Principal Components. J Stat Theory Pract 8, 674–691 (2014). https://doi.org/10.1080/15598608.2013.828337
DOI: https://doi.org/10.1080/15598608.2013.828337