
The art of centering without centering for robust principal component analysis


Abstract

Many robust variants of Principal Component Analysis remove outliers from the data and compute the principal components of the remaining points. The robust centered variant requires knowledge of the center of the non-outliers. Unfortunately, that center is unknown until the outliers have been determined, and using an inaccurate center may lead to the detection of the wrong outliers. We demonstrate this problem in several known robust PCA algorithms. We describe a method that implicitly centers the non-outliers by appending a constant value (a bias) to each data point. This bias method can be used with “black box” robust PCA algorithms: only their input is augmented, with minimal change to the algorithms themselves.
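The bias augmentation described in the abstract lends itself to a very small sketch. The helper below only appends a constant coordinate to the data matrix; the bias value, the example data, and any downstream robust PCA routine are illustrative assumptions on our part, not the paper's implementation.

```python
import numpy as np

def augment_with_bias(X, bias):
    """Append a constant bias coordinate to every data point (row of X).

    X    : (n, d) array with one data point per row.
    bias : constant value appended to each point (chosen by the user; the
           paper's analysis of how to choose it is not reproduced here).

    The augmented (n, d + 1) matrix can then be passed to an uncentered,
    "black box" robust PCA routine, which -- per the abstract -- implicitly
    centers the non-outliers without computing their center explicitly.
    """
    n = X.shape[0]
    return np.hstack([X, np.full((n, 1), float(bias))])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    inliers = rng.normal(loc=5.0, scale=1.0, size=(95, 3))    # off-origin inliers
    outliers = rng.normal(loc=0.0, scale=10.0, size=(5, 3))   # scattered outliers
    X = np.vstack([inliers, outliers])

    X_aug = augment_with_bias(X, bias=10.0)   # bias value is a placeholder
    print(X_aug.shape)                        # (100, 4): 3 original coords + bias column
```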


Notes

  1. There is a sign ambiguity in calculating eigenvectors: whenever \(v\) is an eigenvector with eigenvalue \(\lambda\), so is \(-v\). All eigenvectors in Fig. 2 were normalized so that their coordinate sum is positive.
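A one-line convention suffices to apply this normalization; the sketch below is ours (the function name is hypothetical), not code from the paper.

```python
import numpy as np

def normalize_sign(v):
    """Resolve the +/- eigenvector ambiguity: flip the sign when the
    coordinate sum is negative, so that the sum ends up positive
    (the convention used for the eigenvectors in Fig. 2)."""
    v = np.asarray(v, dtype=float)
    return -v if v.sum() < 0 else v

# Both v and -v are eigenvectors of the same eigenvalue; pick one representative.
print(normalize_sign([-0.6, -0.8]))   # -> [0.6 0.8]
```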


Author information

Corresponding author

Correspondence to Guihong Wan.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Responsible editor: Srinivasan Parthasarathy

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Guihong Wan and Baokun He have contributed equally to this work.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wan, G., He, B. & Schweitzer, H. The art of centering without centering for robust principal component analysis. Data Min Knowl Disc 38, 699–724 (2024). https://doi.org/10.1007/s10618-023-00976-y

