Robust Principal Component Analysis by Reverse Iterative Linear Programming
Abstract
Principal Components Analysis (PCA) is a data analysis technique widely used for dimensionality reduction. It extracts a small number of orthonormal vectors, called the principal components, that explain most of the variation in a dataset. Conventional PCA is sensitive to outliers because it is based on the \(L_2\)-norm, so several algorithms based on the \(L_1\)-norm have been introduced in the literature to improve robustness. We present a new algorithm for robust \(L_1\)-norm PCA that computes the components iteratively in reverse, using a new heuristic based on Linear Programming. The method seeks the projection that minimizes the variance of the projected points. It has only one parameter to tune, which makes it simple to use. On common benchmarks it performs competitively with other methods. The data and software related to this paper are available at https://github.com/visentin-insight/L1-PCAhp.
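The sensitivity of conventional \(L_2\)-norm PCA to outliers can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper; it only shows the motivation. The synthetic data, the helper names `first_component` and `angle_deg`, and the outlier position are assumptions chosen for illustration.

```python
import numpy as np

def first_component(X):
    """First principal component (unit vector) of X, via SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]

def angle_deg(u, v):
    """Angle in degrees between two directions, ignoring their sign."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    c = abs(float(u @ v)) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(min(1.0, c))))

# Synthetic 2-D data (assumption): almost all variance lies along the x axis.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(size=200), 0.1 * rng.normal(size=200)])
clean_pc = first_component(X)

# One gross outlier far along the y axis.
X_out = np.vstack([X, [0.0, 50.0]])
dirty_pc = first_component(X_out)

print("angle to x axis, clean data:  ", angle_deg(clean_pc, [1.0, 0.0]))
print("rotation caused by one outlier:", angle_deg(clean_pc, dirty_pc))
```

Because the \(L_2\)-norm squares each deviation, the single outlying point dominates the total variance and pulls the first component away from the direction followed by the other 200 points. Robust \(L_1\)-norm formulations aim to limit this effect.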
Keywords
Principal components analysis, Linear programming, L1-norm, Robust