Abstract
Partial least squares discriminant analysis (PLS-DA) is a partial least squares regression of a set Y of binary variables describing the categories of a categorical variable on a set X of predictor variables. It is a compromise between the usual discriminant analysis and a discriminant analysis on the significant principal components of the predictor variables. This technique is especially suited to dealing with a much larger number of predictors than observations and with multicollinearity, two of the main problems encountered when analysing microarray expression data. We explore the performance of PLS-DA using published breast cancer data (Perou et al. 2000). Three analyses were carried out: (1) before vs after chemotherapy treatment, (2) estrogen receptor positive vs negative tumours, and (3) tumour classification. We found that the performance of PLS-DA was extremely satisfactory in all cases and that the discriminant cDNA clones often had a sound biological interpretation. We conclude that PLS-DA is a powerful yet simple tool for analysing microarray data.
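The definition above (a PLS regression of a binary indicator matrix Y on the predictors X) can be illustrated with a minimal Python/NumPy sketch. It assumes the NIPALS variant of multi-response PLS (PLS2), which is only one of several PLS algorithms; the abstract does not say which implementation the authors used. It also assumes a common decision rule, assigning each sample to the category whose predicted Y column is largest. The function names (`dummy_code`, `pls2_nipals`, `plsda_predict`) are hypothetical, and the data are synthetic, with more predictors (200) than observations (20) to mimic the microarray setting. They are not taken from Perou et al.

```python
import numpy as np

def dummy_code(labels):
    """Build the binary indicator matrix Y: one column per category (hypothetical helper)."""
    classes = sorted(set(labels))
    Y = np.array([[1.0 if lab == c else 0.0 for c in classes] for lab in labels])
    return Y, classes

def pls2_nipals(X, Y, n_components, tol=1e-10, max_iter=500):
    """Fit a PLS2 regression of Y on X with NIPALS (assumed algorithm).

    Returns the centring constants and coefficient matrix B such that
    Y_hat = (X - x_mean) @ B + y_mean.
    """
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean            # centred residual matrices
    W, P, Q = [], [], []
    for _ in range(n_components):
        u = F[:, [np.argmax(F.var(axis=0))]]  # start from the most variable Y column
        for _ in range(max_iter):
            w = E.T @ u
            w /= np.linalg.norm(w)            # X weights, unit length
            t = E @ w                         # X scores (latent component)
            q = F.T @ t / (t.T @ t)           # Y loadings
            u_new = F @ q / (q.T @ q)         # Y scores
            converged = np.linalg.norm(u_new - u) < tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p = E.T @ t / (t.T @ t)               # X loadings
        E = E - t @ p.T                       # deflate X by the extracted component
        F = F - t @ q.T                       # deflate Y by the extracted component
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.hstack(W), np.hstack(P), np.hstack(Q)
    B = W @ np.linalg.inv(P.T @ W) @ Q.T      # regression coefficients in terms of X
    return x_mean, y_mean, B

def plsda_predict(X_new, x_mean, y_mean, B, classes):
    """Assign each sample to the category with the largest predicted Y value."""
    Y_hat = (X_new - x_mean) @ B + y_mean
    return [classes[i] for i in np.argmax(Y_hat, axis=1)]

# Synthetic example: 20 samples, 200 "genes", two hypothetical groups.
rng = np.random.default_rng(0)
n_per_class, n_genes = 10, 200
labels = ["class_1"] * n_per_class + ["class_2"] * n_per_class
X = rng.normal(size=(2 * n_per_class, n_genes))
X[n_per_class:, :10] += 3.0                   # only the first 10 genes differ between groups

Y, classes = dummy_code(labels)
x_mean, y_mean, B = pls2_nipals(X, Y, n_components=2)
pred = plsda_predict(X, x_mean, y_mean, B, classes)
correct = sum(a == b for a, b in zip(pred, labels))
print(correct, "of", len(labels), "training samples correctly classified")
```

Because the centred indicator columns sum to zero in every row, the fitted Y values in each row sum to one, so with two categories the argmax rule amounts to thresholding at 0.5. Training accuracy is optimistic. In practice the number of components and the error rate would typically be assessed on held-out data, for example by cross-validation.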
References
Alter O, Brown PO, Botstein D (2000) Singular value decomposition for genome-wide expression data processing and modeling. Proc Natl Acad Sci USA 97:10101–10106
Charpentier AH, Bednarek AK, Daniel RL, Hawkins KA, Laflin KJ, Gaddis S, MacLeod MC, Aldaz CM (2000) Effects of estrogen on global gene expression: identification of novel targets of estrogen action. Cancer Res 60:5977–5983
Datta S (2001) Exploring relationships in gene expressions: a partial least squares approach. Gene Expr 9:249–255
De Bruin A, Muller E, Wurm S, Caldelari R, Wyder M, Wheelock MJ, Suter MM (1999) Loss of invasiveness in squamous cell carcinoma cells overexpressing desmosomal cadherins. Cell Adhes Commun 7:13–28
Eisen MB, Spellman PT, Brown PO, Botstein D (1998) Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95:14863–14868
Eriksson L, Johansson E, Kettaneh-Wold N, Wold S (1999) Introduction to multi- and megavariate data analysis using projection methods (PCA and PLS). Umetrics, Umea
Frank IE, Friedman JH (1993) A statistical view of some chemometrics regression tools. Technometrics 35:109–135
Gershon D (2002) Microarray technology: an array of opportunities. Nature 416:885–891
Good PI (2000) Permutation tests: a practical guide to resampling methods for testing hypotheses. Springer, New York
Gruvberger S, Ringner M, Chen Y, Panavally S, Saal LH, Borg A, Ferno M, Peterson C, Meltzer PS (2001) Estrogen receptor status in breast cancer is associated with remarkably distinct gene expression patterns. Cancer Res 61:5979–5984
Hastie T, Tibshirani R, Friedman JH (2001) The elements of statistical learning. Springer, New York
Hedenfalk IA, Ringner M, Trent JM, Borg A (2002) Gene expression in inherited breast cancer. Adv Cancer Res 84:1–34
Khan J, Wei JS, Ringner M, Saal LH, Ladanyi M, Westermann F, Berthold F, Schwab M, Antonescu CR, Peterson C, Meltzer PS (2001) Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med 7:673–679
Kinoshita Y, Jarell AD, Flaman JM, Foltz G, Schuster J, Sopher BL, Irvin DK, Kanning K, Kornblum HI, Nelson PS, Hieter P, Morrison RS (2001) Pescadillo, a novel cell cycle regulatory protein abnormally expressed in malignant cells. J Biol Chem 276:6656–6665
Knudsen S (2002) A biologist's guide to analysis of DNA microarray data. Wiley, New York
Kondo S, Kubota S, Shimo T, Nishida T, Yosimichi G, Eguchi T, Sugahara T, Takigawa M (2002) Connective tissue growth factor increased by hypoxia may initiate angiogenesis in collaboration with matrix metalloproteinases. Carcinogenesis 23:769–776
Lakhani SR, Ashworth A (2001) Microarray and histopathological analysis of tumours: the future and the past? Nat Rev Cancer 1:151–157
Nguyen DV, Rocke DM (2002) Tumor classification by partial least squares using microarray gene expression data. Bioinformatics 18:39–50
Nobori K, Ito H, Tamamori-Adachi M, Adachi S, Ono Y, Kawauchi J, Kitajima S, Marumo F, Isobe M (2002) ATF3 inhibits doxorubicin-induced apoptosis in cardiac myocytes: a novel cardioprotective role of ATF3. J Mol Cell Cardiol 34:1387–1397
Osborne CK (1998) Steroid hormone receptors in breast cancer management. Breast Cancer Res Treat 51:227–238
Perou CM, Sorlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, Pollack JR, Ross DT, Johnsen H, Akslen LA, Fluge O, Pergamenschikov A, Williams C, Zhu SX, Lonning PE, Borresen-Dale AL, Brown PO, Botstein D (2000) Molecular portraits of human breast tumours. Nature 406:747–752
Quackenbush J (2001) Computational analysis of microarray data. Nat Rev Genet 2:418–427
Shtil AA, Mandlekar S, Yu R, Walter RJ, Hagen K, Tan TH, Roninson IB, Kong AN (1999) Differential regulation of mitogen-activated protein kinases by microtubule-binding agents in human breast cancer cells. Oncogene 18:377–384
Sorlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, Hastie T, Eisen MB, van de Rijn M, Jeffrey SS, Thorsen T, Quist H, Matese JC, Brown PO, Botstein D, Eystein Lonning P, Borresen-Dale AL (2001) Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci USA 98:10869–10874
Tenenhaus M (1998) La régression PLS. Editions Technip, Paris
Thuillier P, Brash AR, Kehrer JP, Stimmel JB, Leesnitzer LM, Yang P, Newman RA, Fischer SM (2002) Inhibition of PPAR-mediated keratinocyte differentiation by lipoxygenase inhibitors. Biochem J 366:901–910
van 't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT, Schreiber GJ, Kerkhoven RM, Roberts C, Linsley PS, Bernards R, Friend SH (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature 415:530–536
West M, Blanchette C, Dressman H, Huang E, Ishida S, Spang R, Zuzan H, Olson JA Jr, Marks JR, Nevins JR (2001) Predicting the clinical status of human breast cancer by using gene expression profiles. Proc Natl Acad Sci USA 98:11462–11467
Wold S, Martens H, Wold H (1983) The multivariate calibration problem in chemistry solved by the PLS method. In: Ruhe A, Kagstrom B (eds) Proc Conf Matrix Pencils. Springer, Heidelberg, pp 286–293
Acknowledgements
We thank A. Børresen-Dale for comments. This work was funded by Action en Bioinformatique, Ministère de la Recherche of France.
Cite this article
Pérez-Enciso, M., Tenenhaus, M. Prediction of clinical outcome with microarray data: a partial least squares discriminant analysis (PLS-DA) approach. Hum Genet 112, 581–592 (2003). https://doi.org/10.1007/s00439-003-0921-9