Parametric Bootstrap Tests for Determining the Number of Principal Components

Journal of Statistical Theory and Practice

Abstract

Principal component analysis is a multivariate technique widely used for dimensionality reduction. When dealing with high-dimensional data, the number of principal components to retain must be chosen. Several criteria for this choice have been proposed in the literature, but most have serious limitations, such as reliance on normality assumptions, subjective judgment, or asymptotic properties. This study proposes two new parametric bootstrap tests for determining the optimal number of principal components (PCs) to retain for subsequent analysis, based on the proportion of the total variation accounted for by the first k principal components. The performance of these tests was compared with each other and with the tests of Fujikoshi (1980) and Gebert and Ferreira (2010) through Monte Carlo simulations. Under multivariate normality, the two proposed parametric bootstrap tests are recommended; under nonnormality, the test of Gebert and Ferreira (2010) is recommended. The three bootstrap tests outperform the Fujikoshi test in most circumstances.
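The abstract does not spell out the two proposed test statistics, so the following is only a minimal sketch of the general idea it describes: the proportion of total variation explained by the first k components, resampled by a parametric bootstrap under a fitted multivariate normal model. The function names, the threshold `rho0`, and the percentile-based decision rule are assumptions made for illustration and are not taken from the article.

```python
import numpy as np


def explained_proportion(X, k):
    """Share of total variance carried by the first k principal components."""
    S = np.cov(X, rowvar=False)                      # sample covariance matrix
    eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]   # eigenvalues, largest first
    return eigvals[:k].sum() / eigvals.sum()


def bootstrap_proportions(X, k, n_boot=500, seed=0):
    """Parametric bootstrap: resample from N(mean, S) fitted to X.

    Assumption: multivariate normality, as in the parametric setting
    mentioned in the abstract.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    mean = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    out = np.empty(n_boot)
    for b in range(n_boot):
        Xb = rng.multivariate_normal(mean, S, size=n)
        out[b] = explained_proportion(Xb, k)
    return out


def retains_enough(X, k, rho0=0.8, alpha=0.05, n_boot=500, seed=0):
    """Illustrative decision rule (hypothetical, not the authors' test).

    Returns False when even the upper (1 - alpha) bootstrap percentile of the
    explained proportion falls below the target rho0, i.e. when k components
    appear insufficient.
    """
    boot = bootstrap_proportions(X, k, n_boot=n_boot, seed=seed)
    upper = np.quantile(boot, 1 - alpha)
    return bool(upper >= rho0)
```

Under this sketch, one would increase k from 1 upward and stop at the first value for which `retains_enough` returns True; whether that matches the sequential procedure used in the article cannot be confirmed from the abstract alone.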

References

  • Amorim, I. S., E. B. Ferreira, R. R. Lima, and R. G. F. A. Pereira. 2010. Monte Carlo based test for inferring about the unidimensionality of a Brazilian coffee sensory panel. Food Qual. Pref. Barking, 21(3), 319–323.

  • Chernick, M. R. 2008. Bootstrap methods: A guide for practitioners and researchers, 2nd ed. New York, NY: Wiley-Interscience.

  • Cirillo, M. A., and D. F. Ferreira. 2003. Extensão do teste para normalidade univariado baseado no coeficiente de correlação quantil-quantil para o caso multivariado. Rev. Matem. Estat. Marília, 21(3), 57–75.

  • Davison, A. C., and D. V. Hinkley. 2008. Bootstrap methods and their application. Cambridge, UK: Cambridge University.

  • Efron, B., and R. J. Tibshirani. 1993. An introduction to the bootstrap. Boca Raton, FL: Chapman & Hall.

  • Ferreira, D. F. 2008. Estatística multivariada. Lavras, Brazil: UFLA.

  • Fleck, M. P. A., and M. C. Bourdel. 1998. Método de simulação e escolha de fatores na análise dos principais componentes. Rev. Saúde Pública, 32(3), 267–272.

  • Fujikoshi, Y. 1980. Asymptotic expansions for the distributions of the sample roots under nonnormality. Biometrika, 67(1), 45–51.

  • Gebert, D. M. P., and D. F. Ferreira. 2010. Proposta de teste bootstrap não-paramétrico de retenção do número de componentes principais. Rev. Bras. Biometria, 28(2), 116–136.

  • Jolliffe, I. T. 2002. Principal components analysis, 2nd ed. New York, NY: Springer Verlag.

  • Klein, L., and W. Mak. 2005. Initial steps in high-frequency modeling of China. Business Econ., 40, 11–14.

  • Mood, A. M., F. A. Graybill, and D. C. Boes. 1974. Introduction to the theory of statistics, 3rd ed. Singapore: McGraw-Hill.

  • Perez-Neto, P., D. A. Jackson, and K. M. Somers. 2005. How many principal components? Stopping rules for determining the number of non-trivial axes revisited. Comput. Stat. Data Anal., 49(4), 974–997.

  • Press, W. H., B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. 1992. Numerical recipes in Fortran: The art of scientific computing. Cambridge, UK: Cambridge University.

  • R Development Core Team. 2009. R. http://www.R-project.org (accessed 20 December 2009).

  • Royston, J. P. 1983b. Some techniques for assessing multivariate normality based on the Shapiro-Wilk W. J. R. Stat. Soc. Ser. C App. Stat. 32(2), 121–133.

  • Timm, N. H. 2002. Applied multivariate analysis. New York, NY: Springer Verlag.

  • Zimmermann, C. M., O. M. Guimares, and P. G. Peralta-Zamora. 2008. Avaliação da qualidade do corpo hídrico do rio tibagi na região de Ponta Grossa. Quim. Nova, 31, 1727–1732.

Author information

Corresponding author

Correspondence to Daniel Furtado Ferreira.

About this article

Cite this article

Gebert, D.M.P., Ferreira, D.F. Parametric Bootstrap Tests for Determining the Number of Principal Components. J Stat Theory Pract 8, 674–691 (2014). https://doi.org/10.1080/15598608.2013.828337

  • DOI: https://doi.org/10.1080/15598608.2013.828337
