Truncated Robust Principal Component Analysis and Noise Reduction for Single Cell RNA-seq Data

  • Krzysztof Gogolewski (email author)
  • Maciej Sykulski
  • Neo Christopher Chung
  • Anna Gambin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10847)


The development of single cell RNA sequencing (scRNA-seq) has enabled innovative approaches to investigating mRNA abundances. In our study, we are interested in extracting the systematic patterns of scRNA-seq data in an unsupervised manner; to this end, we have developed two extensions of robust principal component analysis (RPCA). First, we present a truncated version of RPCA (tRPCA), which is much faster and more memory efficient. Second, we introduce noise reduction in tRPCA with \(L_2\) regularization (tRPCAL2). Unlike RPCA, which considers only a low-rank matrix L and a sparse matrix S, the proposed method can also extract a noise matrix E inherent in modern genomic data. We demonstrate its usefulness by applying our methods to peripheral blood mononuclear cell (PBMC) scRNA-seq data. In particular, clustering of the low-rank matrix L yields better classification of unlabeled single cells. Overall, the proposed variants are well suited to the high-dimensional and noisy data that are routinely generated in genomics.
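To make the three-way split concrete, the following is a minimal sketch of a decomposition M = L + S + E in Python with NumPy. It is not the authors' tRPCA or tRPCAL2 implementation. It assumes a generic alternating scheme for the objective ||L||_* + lam * ||S||_1 + (mu / 2) * ||M - L - S||_F^2, with L additionally restricted to a fixed rank. The names `decompose`, `truncated_svt`, `soft_threshold`, `rank`, `lam` and `mu` are hypothetical and chosen for illustration only.

```python
import numpy as np


def soft_threshold(x, tau):
    """Elementwise shrinkage: the proximal operator of tau * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def truncated_svt(x, tau, rank):
    """Shrink singular values by tau, keeping only the top `rank` components."""
    # A full SVD is used here for simplicity; a dedicated truncated solver
    # would avoid computing the components that are discarded below.
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    s = np.maximum(s[:rank] - tau, 0.0)
    return (u[:, :rank] * s) @ vt[:rank]


def decompose(m, rank, lam=None, mu=1.0, n_iter=100):
    """Split m into low-rank, sparse and dense-noise parts (illustrative only)."""
    m = np.asarray(m, dtype=float)
    if lam is None:
        # Common default weight for the sparse term in the RPCA literature.
        lam = 1.0 / np.sqrt(max(m.shape))
    sparse = np.zeros_like(m)
    for _ in range(n_iter):
        # Update the low-rank part with the sparse part held fixed.
        low = truncated_svt(m - sparse, 1.0 / mu, rank)
        # Update the sparse part with the low-rank part held fixed.
        sparse = soft_threshold(m - low, lam / mu)
    # Whatever neither part explains is treated as dense noise.
    noise = m - low - sparse
    return low, sparse, noise
```

In this sketch the three returned matrices sum back to the input by construction, and every entry of the noise part is bounded in magnitude by `lam / mu`, because it is the residual left by the soft-thresholding step. Larger values of `mu` therefore push more of the signal into L and S and less into E.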


Principal component analysis, Robust PCA, Truncated singular value decomposition, Matrix decomposition, Unsupervised learning, Single cell RNA-seq



This work was supported by the Polish National Science Centre grants no. 2016/21/N/ST6/01507 and no. 2016/23/D/ST6/03613. The authors thank B. Miasojedow, Ph.D., for comments and suggestions.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Krzysztof Gogolewski (1), email author
  • Maciej Sykulski (2, 3)
  • Neo Christopher Chung (1)
  • Anna Gambin (1)
  1. Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
  2. Department of Medical Genetics, Warsaw Medical University, Warsaw, Poland
  3. genXone Ltd., Poznań, Poland
