Combining Linear Dimension Reduction Subspaces
Abstract
Dimensionality is a major concern in the analysis of large data sets. There are various well-known dimension reduction methods with different strengths and weaknesses. In practical situations it is difficult to decide which method to use, as different methods emphasize different structures in the data. As with ensemble methods in statistical learning, several dimension reduction methods can be combined using an extension of the Crone and Crosby distance: a weighted distance between subspaces that allows subspaces of different dimensions to be combined. Some natural choices of weights are considered in detail. Based on the weighted distance, we discuss the concept of averages of subspaces and how to combine various dimension reduction methods. The performance of the weighted distances and of the combining approach is illustrated via simulations and a real data example.
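The abstract does not state the formula for the weighted distance, so the following is only a minimal sketch under stated assumptions. It assumes that each subspace is represented by its orthogonal projection matrix P, that the unweighted Crone and Crosby distance is the Frobenius norm of P1 - P2 divided by the square root of 2, and that the weighted extension scales each projection by a weight w(k) depending on the subspace dimension k. The function names and the two example weight functions are hypothetical and chosen for illustration.

```python
import numpy as np


def orthogonal_projection(basis):
    """Orthogonal projection onto the column space of `basis` (p x k).

    Assumes the columns of `basis` are linearly independent.
    """
    q, _ = np.linalg.qr(np.asarray(basis, dtype=float))  # orthonormal basis
    return q @ q.T


def weighted_distance(p1, p2, weight=lambda k: 1.0):
    """Assumed form: sqrt(1/2) * || w(k1) P1 - w(k2) P2 ||_F.

    The dimension k of each subspace is recovered as the trace of its
    projection matrix. With equal dimensions and w(k) = 1 this reduces
    to the unweighted distance assumed in the lead-in.
    """
    k1 = int(round(np.trace(p1)))
    k2 = int(round(np.trace(p2)))
    diff = weight(k1) * p1 - weight(k2) * p2
    return np.sqrt(0.5) * np.linalg.norm(diff, "fro")


# Two illustrative weight choices (assumptions, not taken from the abstract).
constant_weight = lambda k: 1.0
inverse_sqrt_weight = lambda k: 1.0 / np.sqrt(k)

# A line and a plane in R^3; the plane contains the line.
line = orthogonal_projection([[1.0], [0.0], [0.0]])
plane = orthogonal_projection([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])

print(weighted_distance(line, plane, constant_weight))
print(weighted_distance(line, plane, inverse_sqrt_weight))
```

Working with projection matrices rather than basis matrices makes the distance independent of the particular basis chosen for each subspace, which is what allows estimates from different dimension reduction methods to be compared directly.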
Keywords
Weight Function; Orthogonal Projection; Independent Component Analysis; Dimension Reduction
Notes
Acknowledgments
The work of Klaus Nordhausen and Hannu Oja was supported by the Academy of Finland (grant 268703). The authors are grateful to the reviewers for their helpful comments.
References
- Cook RD, Weisberg S (1991) Sliced inverse regression for dimension reduction: comment. J Am Stat Assoc 86:328–332
- Crone LJ, Crosby DS (1995) Statistical applications of a metric on subspaces to satellite meteorology. Technometrics 37:324–328
- Croux C, Ruiz-Gazen A (2005) High breakdown estimators for principal components: the projection-pursuit approach revisited. J Multivar Anal 95:206–226
- Escoufier Y (1973) Le traitement des variables vectorielles. Biometrics 29:751–760
- Filzmoser P, Fritz H, Kalcher K (2012) pcaPP: Robust PCA by projection pursuit. R package version 1.9-47
- Friedman JH, Tukey JW (1974) A projection pursuit algorithm for exploratory data analysis. IEEE Trans Comput C 23:881–889
- Halbert K (2011) MMST: Datasets from MMST. R package version 0.6-1.1
- Hettmansperger TP, Randles RH (2002) A practical affine equivariant multivariate median. Biometrika 89:851–860
- Hotelling H (1936) Relations between two sets of variates. Biometrika 28:321–377
- Hyvärinen A (1999) Fast and robust fixed-point algorithms for independent component analysis. IEEE Trans Neural Netw 10:626–634
- Li KC (1991) Sliced inverse regression for dimension reduction. J Am Stat Assoc 86:316–327
- Li KC (1992) On principal Hessian directions for data visualization and dimension reduction: another application of Stein’s lemma. J Am Stat Assoc 87:1025–1039
- Liski E, Nordhausen K, Oja H (2014a) Supervised invariant coordinate selection. Stat: A J Theoret Appl Stat 48:711–731
- Liski E, Nordhausen K, Oja H, Ruiz-Gazen A (2014b) LDRTools: tools for linear dimension reduction. R package version 1
- Miettinen J, Nordhausen K, Oja H, Taskinen S (2014) Deflation-based FastICA with adaptive choices of nonlinearities. IEEE Trans Signal Process 62:5716–5724
- Nordhausen K, Oja H, Tyler DE (2008) Tools for exploring multivariate data: the package ICS. J Stat Softw 28(6):1–31
- Nordhausen K, Ilmonen P, Mandal A, Oja H, Ollila E (2011) Deflation-based FastICA reloaded. Proceedings of 19th European signal processing conference 2011 (EUSIPCO 2011) 1854–1858
- Nordhausen K, Oja H (2011) Multivariate L1 methods: the package MNM. J Stat Softw 43:1–28
- R Development Core Team (2012) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria
- Rodriguez-Martinez E, Goulermas JY, Mu T, Ralph JF (2010) Automatic induction of projection pursuit indices. IEEE Trans Neural Netw 21:1281–1295
- Rousseeuw P (1986) Multivariate estimation with high breakdown point. In: Grossman W, Pflug G, Vincze I, Wertz W (eds) Mathematical statistics and applications. Reidel, Dordrecht, pp 283–297
- Rousseeuw P, Croux C, Todorov V, Ruckstuhl A, Salibian-Barrera M, Verbeke T, Koller M, Maechler M (2012) Robustbase: basic robust statistics. R package version 0.9-2
- Ruiz-Gazen A, Berro A, Larabi Marie-Sainte S (2010) Detecting multivariate outliers using projection pursuit with particle swarm optimization. Compstat 2010:89–98
- Shaker AJ, Prendergast LA (2011) Iterative application of dimension reduction methods. Electron J Stat 5:1471–1494
- Tibshirani R (2013) Bootstrap: functions for the book “An introduction to the bootstrap”. R package version 2012.04-1
- Tyler DE (1987) A distribution-free M-estimator of multivariate scatter. Ann Stat 15:234–251
- Tyler DE, Critchley F, Dümbgen L, Oja H (2009) Invariant co-ordinate selection. J Roy Stat Soc 71:549–592
- Venables WN, Ripley BD (2002) Modern applied statistics with S, 4th edn. Springer, New York
- Weisberg S (2002) Dimension reduction regression in R. J Stat Softw 7:1–22
- Ye Z, Weiss RE (2003) Using the bootstrap to select one of a new class of dimension reduction methods. J Am Stat Assoc 98:968–979
- Zhou ZH (2012) Ensemble methods: foundations and algorithms. CRC Press, Boca Raton