Abstract
There is currently much discussion about the analysis of multiple datasets from different groups; in particular, identifying a common basic structure shared by multiple groups has drawn considerable attention. To identify such a common structure, common component analysis (CCA) was proposed by generalizing techniques for principal component analysis (PCA); i.e., CCA reduces to standard PCA when applied to a single dataset. Although CCA can identify the common structure of multiple datasets, which standard PCA cannot extract, it suffers from the following drawbacks. The common components are estimated as linear combinations of all variables, which makes the identified common components difficult to interpret. Moreover, because noisy features are inevitably included in datasets, the fully dense loadings lead to erroneous results in CCA. To address these issues, we incorporate sparsity into CCA and propose a novel strategy for sparse common component analysis based on \(L_{1}\)-type regularized regression modeling. We focus on the formulation of CCA as the eigenvalue decomposition (EVD) of a Gram matrix (i.e., the common loadings of multiple datasets can be estimated by the EVD of a Gram matrix), which can equivalently be performed by the singular value decomposition (SVD) of a square root of the Gram matrix. Building on sparse PCA, we then propose sparse common component analysis, together with an algorithm to estimate sparse common loadings of multiple datasets. The proposed method can not only identify a common subspace but also select crucial common features for multiple groups. Monte Carlo simulations and a real-data analysis are conducted to examine the efficiency of the proposed sparse CCA. We observe from the numerical studies that our strategy incorporates sparsity into the common loading estimation and efficiently recovers a sparse common structure in multiple-dataset analysis.
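The abstract relies on two facts: the eigenvectors of a Gram matrix can be obtained from the SVD of any square root of it, and sparse loadings can be obtained through \(L_{1}\)-type regularized regression. The Python sketch below illustrates both points under stated assumptions; it is not the authors' implementation. For illustration only, we assume the Gram matrix is \(G=\sum_{k} S_{k}^{\top}S_{k}\), where \(S_{k}\) is the sample covariance matrix of group \(k\), so that stacking the \(S_{k}\) vertically yields a square root \(M\) with \(M^{\top}M=G\). The sparse step swaps in the sparse PCA alternating scheme of Zou, Hastie and Tibshirani (2006), with each elastic-net subproblem solved by plain coordinate descent. The helper names (stacked_root, leading_loadings_evd, leading_loadings_svd, elastic_net_cd, sparse_loadings), the simulated data and the penalty values lam1 and lam2 are all hypothetical.

```python
import numpy as np


def stacked_root(covs):
    # ASSUMPTION (illustration only): the Gram matrix is G = sum_k S_k^T S_k.
    # Stacking the p x p matrices S_k vertically gives M with M^T M = G,
    # so M is one possible "square root" of G.
    return np.vstack(covs)


def leading_loadings_evd(G, r):
    # Leading r eigenpairs of the symmetric Gram matrix (eigh sorts ascending).
    vals, vecs = np.linalg.eigh(G)
    order = np.argsort(vals)[::-1][:r]
    return vals[order], vecs[:, order]


def leading_loadings_svd(M, r):
    # Right singular vectors of M are eigenvectors of M^T M; eigenvalues are s**2.
    _, s, vt = np.linalg.svd(M, full_matrices=False)
    return s[:r] ** 2, vt[:r].T


def elastic_net_cd(X, y, lam2, lam1, b0, n_iter=100):
    # Coordinate descent for ||y - X b||^2 + lam2 ||b||^2 + lam1 ||b||_1,
    # warm-started at b0.
    b = b0.copy()
    col_sq = (X ** 2).sum(axis=0)
    res = y - X @ b                          # current residual
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            res += X[:, j] * b[j]            # partial residual without coordinate j
            rho = X[:, j] @ res
            # Soft-thresholding at lam1 / 2 (the squared loss is not halved).
            b[j] = np.sign(rho) * max(abs(rho) - lam1 / 2, 0.0) / (col_sq[j] + lam2)
            res -= X[:, j] * b[j]
    return b


def sparse_loadings(M, r, lam2=1e-3, lam1=0.5, n_outer=50):
    # SPCA-style alternation (Zou, Hastie & Tibshirani 2006) applied to M.
    # A sketch only, not necessarily the paper's own algorithm.
    A = leading_loadings_svd(M, r)[1]        # start from dense loadings
    B = np.zeros_like(A)
    gram = M.T @ M
    for _ in range(n_outer):
        for j in range(r):                   # B-step: one elastic net per component
            B[:, j] = elastic_net_cd(M, M @ A[:, j], lam2, lam1, B[:, j])
        u, _, vt = np.linalg.svd(gram @ B, full_matrices=False)
        A = u @ vt                           # A-step: Procrustes update
    norms = np.linalg.norm(B, axis=0)
    norms[norms == 0] = 1.0                  # keep all-zero columns at zero
    return B / norms


# Hypothetical data: two groups sharing one sparse direction on variables 0-2.
rng = np.random.default_rng(0)
p, n = 10, 200
common = np.zeros(p)
common[:3] = 1 / np.sqrt(3)
covs = []
for scale in (4.0, 6.0):
    X = rng.normal(size=(n, p)) + scale * rng.normal(size=(n, 1)) * common
    covs.append(np.cov(X, rowvar=False))

M = stacked_root(covs)
G = sum(S @ S for S in covs)
vals_evd, V_evd = leading_loadings_evd(G, 1)
vals_svd, V_svd = leading_loadings_svd(M, 1)
v_sparse = sparse_loadings(M, 1)[:, 0]
print(np.allclose(vals_evd, vals_svd))                  # same eigenvalues
print(np.isclose(abs(V_evd[:, 0] @ V_svd[:, 0]), 1.0))  # same direction up to sign
print(np.round(v_sparse, 3))
```

Under the Gram matrix assumed above, the first two checks confirm numerically that EVD of \(G\) and SVD of \(M\) give the same leading eigenvalue and loading (up to sign). The sparse loading vector should concentrate on the first three variables, although its exact values depend on the arbitrary penalties lam1 and lam2.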
About this article
Cite this article
Park, H., Konishi, S. Sparse common component analysis for multiple high-dimensional datasets via noncentered principal component analysis. Stat Papers 61, 2283–2311 (2020). https://doi.org/10.1007/s00362-018-1045-6