
The art of centering without centering for robust principal component analysis


Abstract

Many robust variants of Principal Component Analysis remove outliers from the data and compute the principal components of the remaining points. The robust centered variant requires knowledge of the center of the non-outliers. Unfortunately, that center is unknown until the outliers have been determined, and using an inaccurate center may lead to the detection of the wrong outliers. We demonstrate this problem in several known robust PCA algorithms. We describe a method that implicitly centers the non-outliers by appending a constant value (a bias) to each data point. This bias method can be used with “black box” robust PCA algorithms: only their input is augmented, with minimal change to the algorithms themselves.
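The bias augmentation described in the abstract lends itself to a very small sketch. The helper below only appends a constant coordinate to the data matrix; the bias value, the example data, and any downstream robust PCA routine are illustrative assumptions on our part, not the paper's implementation.

```python
import numpy as np

def augment_with_bias(X, bias):
    """Append a constant bias coordinate to every data point (row of X).

    X    : (n, d) array with one data point per row.
    bias : constant value appended to each point (chosen by the user; the
           paper's analysis of how to choose it is not reproduced here).

    The augmented (n, d + 1) matrix can then be passed to an uncentered,
    "black box" robust PCA routine, which -- per the abstract -- implicitly
    centers the non-outliers without computing their center explicitly.
    """
    n = X.shape[0]
    return np.hstack([X, np.full((n, 1), float(bias))])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    inliers = rng.normal(loc=5.0, scale=1.0, size=(95, 3))    # off-origin inliers
    outliers = rng.normal(loc=0.0, scale=10.0, size=(5, 3))   # scattered outliers
    X = np.vstack([inliers, outliers])

    X_aug = augment_with_bias(X, bias=10.0)   # bias value is a placeholder
    print(X_aug.shape)                        # (100, 4): 3 original coords + bias column
```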


Notes

  1. There is a sign ambiguity in calculating eigenvectors: whenever \(v\) is an eigenvector with eigenvalue \(\lambda\), so is \(-v\). All eigenvectors in Fig. 2 were normalized so that their coordinate sum is positive.
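A one-line convention suffices to apply this normalization; the sketch below is ours (the function name is hypothetical), not code from the paper.

```python
import numpy as np

def normalize_sign(v):
    """Resolve the +/- eigenvector ambiguity: flip the sign when the
    coordinate sum is negative, so that the sum ends up positive
    (the convention used for the eigenvectors in Fig. 2)."""
    v = np.asarray(v, dtype=float)
    return -v if v.sum() < 0 else v

# Both v and -v are eigenvectors of the same eigenvalue; pick one representative.
print(normalize_sign([-0.6, -0.8]))   # -> [0.6 0.8]
```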


Author information

Corresponding author

Correspondence to Guihong Wan.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Responsible editor: Srinivasan Parthasarathy

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Guihong Wan and Baokun He have contributed equally to this work.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wan, G., He, B. & Schweitzer, H. The art of centering without centering for robust principal component analysis. Data Min Knowl Disc 38, 699–724 (2024). https://doi.org/10.1007/s10618-023-00976-y

