Penalized Partial Least Square applied to structured data

Abstract

The analysis of high-dimensional data has become increasingly common. High-dimensional data sets are often built by gathering several independent data sets. However, each independent set can introduce its own bias. We can cope with this bias by introducing the observation set structure into the model. The goal of this article is to build a theoretical background for the dimension reduction method sparse Partial Least Square (sPLS) in the context of data presenting such an observation set structure. The innovation consists in building different sPLS models and linking them through a common-Lasso penalization. This theory could be applied to any field where observations present this kind of structure and could therefore improve sPLS in the domains where it is competitive. Furthermore, it can be extended to the particular case where variables can be gathered into groups given a priori, in which case sPLS is defined as a sparse group Partial Least Square.
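
The idea of fitting one sparse loading vector per observation set and linking the sets through a shared penalty can be illustrated with a minimal NumPy sketch. The abstract does not specify the form of the common-Lasso penalty, so the row-wise group soft-threshold used here is an assumption chosen for illustration, as are the function name `joint_sparse_loadings` and the parameters `lam` and `n_iter`. It should not be read as the article's algorithm.

```python
import numpy as np

def joint_sparse_loadings(blocks, lam, n_iter=50):
    """Illustrative sketch (not the article's algorithm).

    blocks : list of (X_k, Y_k) pairs, one per observation set; all X_k share
             the same p columns and all Y_k share the same q columns.
    lam    : assumed shared penalty level applied across the sets.
    Returns U (p x K) and V (q x K): one first-component loading per set.
    """
    # Centre each set separately, then form the cross-product M_k = X_k^T Y_k.
    Ms = [(X - X.mean(axis=0)).T @ (Y - Y.mean(axis=0)) for X, Y in blocks]
    # Initialise each v_k with the leading right singular vector of M_k.
    V = np.column_stack([np.linalg.svd(M)[2][0] for M in Ms])
    U = np.zeros((Ms[0].shape[0], len(Ms)))
    for _ in range(n_iter):
        # Unpenalised X-loading update for every set: z_k = M_k v_k.
        Z = np.column_stack([M @ V[:, k] for k, M in enumerate(Ms)])
        # Assumed linking step: shrink each variable's row jointly across
        # sets, so a variable is kept in all sets or dropped from all sets.
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
        U = Z * scale
        # Normalise each set's loading vector (all-zero columns stay zero).
        col = np.linalg.norm(U, axis=0)
        U = U / np.where(col > 0, col, 1.0)
        # Y-loading update for every set: v_k = M_k^T u_k, normalised.
        W = np.column_stack([M.T @ U[:, k] for k, M in enumerate(Ms)])
        wn = np.linalg.norm(W, axis=0)
        V = W / np.where(wn > 0, wn, 1.0)
    return U, V
```

Calling `joint_sparse_loadings([(X1, Y1), (X2, Y2)], lam)` returns one column of loadings per observation set. With `lam = 0` the sketch reduces to independent, unpenalised first PLS loadings per set; larger values of `lam` remove variables from every set at once.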

Author information

Corresponding author

Correspondence to Camilo Broc.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Broc, C., Calvo, B. & Liquet, B. Penalized Partial Least Square applied to structured data. Arab. J. Math. 9, 329–344 (2020). https://doi.org/10.1007/s40065-019-0248-6

  • DOI: https://doi.org/10.1007/s40065-019-0248-6

Mathematics Subject Classification

  • 62H99
  • 62J07