PLS Regression and Hybrid Methods in Genomics Association Studies

  • Antonio Ciampi
  • Lin Yang
  • Aurélie Labbe
  • Chantal Mérette
Conference paper
Part of the Springer Proceedings in Mathematics & Statistics book series (PROMS, volume 56)


Using data from a case-control study of schizophrenia, we demonstrate the use of partial least squares (PLS) regression to construct predictors of a phenotype from single nucleotide polymorphisms (SNPs). We consider a straightforward application of PLS regression as well as two hybrid methods, in which PLS regression scores are used as input to a tree-growing algorithm and to a clustering algorithm, respectively. We compare these approaches with classic predictors from statistical learning and show that our PLS-based hybrid methods outperform both the classic predictors and straightforward PLS regression.
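The abstract only outlines the three PLS-based predictors. The Python sketch below, which assumes NumPy and uses simulated genotypes rather than the schizophrenia data, shows one way the pieces could fit together: PLS1 scores computed with the NIPALS algorithm, a small regression tree grown on those scores, and k-means clustering of the scores with each cluster's case proportion used as its predicted risk. The helper names (`pls1`, `grow_tree`, `predict_tree`, `kmeans`, `brier`), the number of components, the tree depth, the number of clusters and the choice of k-means itself are illustrative assumptions; the abstract does not say which tree-growing or clustering algorithms the authors used.

```python
import numpy as np


def pls1(X, y, n_components=2):
    """Single-response PLS (PLS1) via NIPALS. X: n x p genotypes (0/1/2), y: 0/1 phenotype.
    Returns scores T, weights W, X-loadings P, y-loadings q and the centring means."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean          # centred copies, deflated after each component
    n, p = X.shape
    T = np.zeros((n, n_components))
    W, P = np.zeros((p, n_components)), np.zeros((p, n_components))
    q = np.zeros(n_components)
    for k in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)               # direction of maximal covariance with y
        t = Xk @ w                           # k-th PLS score vector
        tt = t @ t
        P[:, k] = Xk.T @ t / tt              # X-loading
        q[k] = yk @ t / tt                   # y-loading (least-squares fit of y on t)
        Xk = Xk - np.outer(t, P[:, k])       # deflate X
        yk = yk - q[k] * t                   # deflate y
        T[:, k], W[:, k] = t, w
    return T, W, P, q, x_mean, y_mean


def grow_tree(T, y, depth=0, max_depth=2, min_leaf=30):
    """Tiny CART-style regression tree on the PLS scores; each node stores its case proportion."""
    node = {"risk": float(y.mean())}
    if depth == max_depth or len(y) < 2 * min_leaf:
        return node
    best = None
    for j in range(T.shape[1]):
        for c in np.unique(T[:, j])[:-1]:    # candidate split thresholds
            left = T[:, j] <= c
            if min(left.sum(), (~left).sum()) < min_leaf:
                continue
            sse = ((y[left] - y[left].mean()) ** 2).sum() + ((y[~left] - y[~left].mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, c)
    if best is None:
        return node
    _, j, c = best
    left = T[:, j] <= c
    node.update(feature=j, threshold=c,
                left=grow_tree(T[left], y[left], depth + 1, max_depth, min_leaf),
                right=grow_tree(T[~left], y[~left], depth + 1, max_depth, min_leaf))
    return node


def predict_tree(node, t):
    """Follow the splits for one score vector t and return the leaf risk."""
    while "feature" in node:
        node = node["left"] if t[node["feature"]] <= node["threshold"] else node["right"]
    return node["risk"]


def kmeans(T, k=3, n_iter=100, seed=0):
    """Plain Lloyd's k-means on the PLS scores; returns one cluster label per subject."""
    rng = np.random.default_rng(seed)
    centres = T[rng.choice(len(T), size=k, replace=False)]
    for _ in range(n_iter):
        labels = ((T[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([T[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels


def brier(prob, y):
    """Brier score: mean squared difference between predicted risk and observed 0/1 outcome."""
    return float(np.mean((prob - y) ** 2))


# Simulated stand-in for case-control SNP data (NOT the schizophrenia data).
rng = np.random.default_rng(1)
n, p = 400, 60
maf = rng.uniform(0.1, 0.5, size=p)                           # hypothetical allele frequencies
X = rng.binomial(2, maf, size=(n, p)).astype(float)          # genotypes coded 0/1/2
logit = -0.5 + 0.8 * X[:, 0] - 0.6 * X[:, 1] + 0.5 * X[:, 2]  # three hypothetical causal SNPs
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(float)
T, W, P, q, x_mean, y_mean = pls1(X, y, n_components=2)

# (a) PLS regression alone: linear prediction, clipped to [0, 1] so it reads as a risk.
pls_risk = np.clip(y_mean + T @ q, 0.0, 1.0)
# (b) Hybrid 1: a small tree grown on the PLS scores.
tree = grow_tree(T, y)
tree_risk = np.array([predict_tree(tree, t) for t in T])
# (c) Hybrid 2: k-means on the PLS scores; risk = case proportion within each cluster.
labels = kmeans(T, k=3)
counts = np.bincount(labels, minlength=3)
cluster_risk = (np.bincount(labels, weights=y, minlength=3) / np.maximum(counts, 1))[labels]

for name, risk in [("PLS", pls_risk), ("PLS + tree", tree_risk), ("PLS + k-means", cluster_risk)]:
    print(f"{name:14s} in-sample Brier score: {brier(risk, y):.3f}")
```

The Brier scores printed here are in-sample and therefore optimistic. A fair comparison of predictors would score held-out subjects, for example by projecting new genotypes onto the PLS components with the rotation `W @ np.linalg.inv(P.T @ W)` after subtracting `x_mean`.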

Keywords

PLS Regression, Bagging, SNP, GWAS


Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Antonio Ciampi (1)
  • Lin Yang (2)
  • Aurélie Labbe (1)
  • Chantal Mérette (3)
  1. Department of Epidemiology, Biostatistics, and Occupational Health, Montréal, Canada
  2. Division of Clinical Epidemiology, McGill University Health Centre, Montréal, Canada
  3. Faculty of Medicine, Department of Psychiatry and Neurosciences, Université Laval, Quebec City, Canada