Abstract
The analysis of high-dimensional data has become increasingly common. High-dimensional data sets are often built by gathering several independent data sets. However, each independent set can introduce its own bias. We can cope with this bias by introducing the observation set structure into the model. The goal of this article is to build a theoretical background for the dimension reduction method sparse Partial Least Square (sPLS) in the context of data presenting such an observation set structure. The innovation consists in building different sPLS models and linking them through a common-Lasso penalization. This theory could be applied in any field where observations present this kind of structure and could, therefore, improve sPLS in the domains where it is competitive. Furthermore, it can be extended to the particular case where variables can be gathered into groups given a priori, in which case sPLS is defined as a sparse group Partial Least Square.
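To make the idea of linking several sPLS models through a shared penalty more concrete, here is a minimal Python sketch. It is not the algorithm of the article: it is one plausible reading under stated assumptions. Each observation set gets its own first-component weight vector, computed from its cross-product matrix X_k^T Y_k, and a group-type soft-thresholding applied across sets forces all sets to select the same variables. The function name `joint_sparse_pls_weights`, the parameter `frac` (threshold expressed as a fraction of the largest row norm), and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def joint_sparse_pls_weights(X_sets, Y_sets, frac=0.5, n_iter=20):
    """Illustrative first-component X-weights, one column per observation set.

    Assumption (not taken from the article): sparsity is shared across sets
    by soft-thresholding the norm of each variable's weights across sets.
    Each X has shape (n_k, p) and each Y has shape (n_k, q), both centered.
    """
    # One cross-product matrix per observation set, shape (p, q).
    Ms = [X.T @ Y for X, Y in zip(X_sets, Y_sets)]
    # Start each Y-weight vector at the leading right singular vector.
    vs = [np.linalg.svd(M, full_matrices=False)[2][0] for M in Ms]
    for _ in range(n_iter):
        # Unpenalized X-weights, stacked as a (p, K) matrix.
        U = np.column_stack([M @ v for M, v in zip(Ms, vs)])
        # Shrink each row (one variable, all sets) by its Euclidean norm.
        row_norms = np.linalg.norm(U, axis=1)
        lam = frac * row_norms.max()
        shrink = np.maximum(1.0 - lam / np.maximum(row_norms, 1e-12), 0.0)
        U = U * shrink[:, None]
        # Rescale each set's weight vector to unit length.
        col_norms = np.linalg.norm(U, axis=0)
        U = U / np.where(col_norms > 0, col_norms, 1.0)
        # Update the Y-weights of each set from its own X-weights.
        new_vs = []
        for k, M in enumerate(Ms):
            v = M.T @ U[:, k]
            norm = np.linalg.norm(v)
            new_vs.append(v / norm if norm > 0 else v)
        vs = new_vs
    return U


# Synthetic example: two observation sets with different offsets (biases),
# where only the first two of six variables carry signal.
rng = np.random.default_rng(0)
X_sets, Y_sets = [], []
for shift in (0.0, 3.0):
    X = rng.normal(size=(200, 6))
    y = X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=200) + shift
    # Centering within each set removes the set-specific offset.
    X_sets.append(X - X.mean(axis=0))
    Y_sets.append((y - y.mean())[:, None])

U = joint_sparse_pls_weights(X_sets, Y_sets, frac=0.5)
print(np.round(U, 2))
```

In this sketch a variable is either kept in every set or dropped from every set, while the nonzero weights remain free to differ between sets. That is the sense in which the separate models are "linked"; the penalty actually studied in the article may differ in form and in how its tuning parameter is chosen.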
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Broc, C., Calvo, B. & Liquet, B. Penalized Partial Least Square applied to structured data. Arab. J. Math. 9, 329–344 (2020). https://doi.org/10.1007/s40065-019-0248-6
DOI: https://doi.org/10.1007/s40065-019-0248-6