
Cancer classification based on microarray gene expression data using a principal component accumulation method

Science China Chemistry

Abstract

The classification of cancer is a major research topic in bioinformatics. However, the high dimensionality and small sample size of gene expression data make classification challenging. Although principal component analysis (PCA) is well suited to high-dimensional data, it may overemphasize some aspects of such rich and complex data and ignore other important information, because a PCA plot shows only the differences within the subspace spanned by the first two or three principal components (PCs). Building on PCA, a principal component accumulation (PCAcc) method was proposed. It uses the information contained in multiple PC subspaces and improves the separability of cancer classes. The method was evaluated on four commonly used gene expression datasets, and the results show that it performs well for cancer classification.
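The abstract does not spell out how PCAcc accumulates information across PC subspaces, so the sketch below only illustrates the motivating problem under stated assumptions. It computes PC scores from the SVD of the centred data (cheap when genes far outnumber samples), scores each PC with a two-class Fisher ratio (a swapped-in separability measure, not necessarily the one used by the authors), and accumulates the ratios with a running sum. The functions `pca_scores`, `fisher_ratio`, `accumulated_separability` and `make_demo_data`, the accumulation rule, and the synthetic data are all hypothetical.

```python
import numpy as np


def pca_scores(X, n_components):
    """Project samples (rows of X) onto the leading principal components.

    Uses the SVD of the column-centred matrix, which stays cheap when the
    number of genes p greatly exceeds the number of samples n.
    """
    Xc = X - X.mean(axis=0)                       # centre each gene
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_components, s.size)
    return U[:, :k] * s[:k]                       # PC scores, shape (n, k)


def fisher_ratio(scores, labels):
    """Two-class Fisher ratio per column: squared mean gap over summed class variances."""
    a, b = scores[labels == 0], scores[labels == 1]
    between = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    within = a.var(axis=0) + b.var(axis=0)
    return between / within


def accumulated_separability(scores, labels):
    """HYPOTHETICAL accumulation rule: running sum of per-PC Fisher ratios.

    Entry j pools separability over PCs 1..j+1, so information carried by
    later PCs is kept instead of being dropped as in a PC1-vs-PC2 plot.
    """
    return np.cumsum(fisher_ratio(scores, labels))


def make_demo_data(seed=0):
    """HYPOTHETICAL synthetic n << p data: two high-variance nuisance factors
    unrelated to the class label, plus a smaller class shift in other genes."""
    rng = np.random.default_rng(seed)
    n, p = 20, 500
    labels = np.repeat([0, 1], n // 2)
    X = rng.normal(0.0, 0.5, size=(n, p))               # measurement noise
    X[:, :50] += rng.normal(0.0, 10.0, size=(n, 1))     # nuisance factor 1
    X[:, 50:100] += rng.normal(0.0, 10.0, size=(n, 1))  # nuisance factor 2
    X[labels == 1, 100:150] += 3.0                      # class-dependent shift
    return X, labels


X, labels = make_demo_data()
scores = pca_scores(X, n_components=5)
print("Fisher ratio per PC:", np.round(fisher_ratio(scores, labels), 2))
print("Accumulated:        ", np.round(accumulated_separability(scores, labels), 2))
```

On this synthetic data the first two PCs should be dominated by the nuisance factors and show little class separation, while the third PC carries the class shift. A plot of PC1 against PC2 would hide that separation; a score that accumulates over several PCs does not.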



Author information

Correspondence to XueGuang Shao.

About this article

Cite this article

Liu, J., Cai, W. & Shao, X. Cancer classification based on microarray gene expression data using a principal component accumulation method. Sci. China Chem. 54, 802–811 (2011). https://doi.org/10.1007/s11426-011-4263-5

  • DOI: https://doi.org/10.1007/s11426-011-4263-5