Computational Statistics, Volume 17, Issue 2, pp 251–271

Discarding Variables in a Principal Component Analysis: Algorithms for All-Subsets Comparisons

  • António Pedro Duarte Silva


The traditional approach to the interpretation of the results from a Principal Component Analysis implicitly discards variables that are weakly correlated with the most important and/or most interesting Principal Components. Some authors argue that this practice is potentially misleading and that it is preferable to take a variable selection approach, comparing variable subsets according to appropriate approximation criteria. In this paper, we propose algorithms for the comparison of all possible subsets according to some of the most important comparison criteria proposed to date. The computational effort of the proposed algorithms is studied and it is shown that, given current computer technology, they are feasible for problems involving up to thirty variables. A free-domain software implementation can be downloaded from the Internet.
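To make the idea of an all-subsets comparison concrete, the following is a minimal brute-force sketch, not the algorithms proposed in the paper. It assumes one illustrative criterion in the spirit of principal variables: the fraction of total variance that a subset K explains by linear regression, tr(S[:,K] S_KK^{-1} S[K,:]) / tr(S), where S is a covariance or correlation matrix. The function names `solve`, `explained_fraction`, and `best_subsets` are hypothetical, and the paper's own criteria and search strategy may differ.

```python
from itertools import combinations


def solve(a, b):
    """Solve a @ x = b (b may have several columns) by Gauss-Jordan
    elimination with partial pivoting. Pure-Python helper for this sketch."""
    n = len(a)
    m = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular submatrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [[m[i][n + j] / m[i][i] for j in range(len(b[0]))] for i in range(n)]


def explained_fraction(s, subset):
    """Assumed criterion: tr(S[:,K] S_KK^{-1} S[K,:]) / tr(S).
    By cyclicity of the trace this equals tr(S_KK^{-1} (S S)[K,K]) / tr(S)."""
    p = len(s)
    s_kk = [[s[i][j] for j in subset] for i in subset]
    ss_kk = [[sum(s[i][t] * s[t][j] for t in range(p)) for j in subset]
             for i in subset]
    x = solve(s_kk, ss_kk)
    return sum(x[i][i] for i in range(len(subset))) / sum(s[i][i] for i in range(p))


def best_subsets(s, k):
    """Score every subset of size k and return (score, subset) pairs,
    best first. Exhaustive: the number of subsets grows as C(p, k)."""
    p = len(s)
    scored = [(explained_fraction(s, c), c) for c in combinations(range(p), k)]
    return sorted(scored, reverse=True)


# Illustrative 3-variable correlation matrix (made-up values).
S = [[1.0, 0.9, 0.1],
     [0.9, 1.0, 0.0],
     [0.1, 0.0, 1.0]]
ranking = best_subsets(S, 2)
```

Enumerating subsets of every size this way visits 2^p - 1 subsets for p variables (about 1.07 × 10^9 when p = 30), which is why a naive loop like this one is only practical for small p and why dedicated algorithms matter for the problem sizes the abstract mentions.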


Keywords: Principal Component Analysis; Principal Variables; Variable Selection; All-Subsets Algorithms



Copyright information

© Physica-Verlag 2002

Authors and Affiliations

  1. Faculdade de Economia e Gestão, Universidade Católica Portuguesa at Porto, Porto, Portugal
