Integration of Microarray Data for a Comparative Study of Classifiers and Identification of Marker Genes



Novel diagnostic tools promise to advance the development of patient-tailored cancer treatment. A major step towards individualized therapy is the combination of various data sources, e.g. transcriptomic, proteomic, and clinical data. We have integrated clinical data with lung cancer microarray data generated on two different oligonucleotide platforms. We investigated whether the prediction of survival outcome benefits from integrating clinical and transcriptomic data. In addition, we sought to identify genes whose expression profiles correlate with survival outcome. We applied five machine learning techniques to predict survival risk groups and compared the resulting models with respect to their performance and general user acceptance. Based on quantitative and qualitative evaluation criteria, we chose decision trees as the most relevant technique for this type of analysis. Our in silico analysis corroborates the role of numerous marker genes already described in lung adenocarcinomas. In addition, our study reveals a set of genes of particular interest whose expression profiles correlate with genetic risk groups of unexpected survival outcomes.
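The abstract does not specify how the two platforms were combined or how the decision trees were configured. The following is a minimal sketch of the general idea. It rests on several assumptions that do not come from the chapter:

- Genes measured on both platforms are z-scored separately within each platform. This is a simple stand-in, not necessarily the authors' normalization.
- The standardized genes are joined with a clinical covariate.
- A CART-style decision tree using Gini impurity predicts a two-level survival risk group.

The gene names, the `stage` covariate, the "high"/"low" labels, all helper functions, and the toy data are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical sketch: per-platform z-scoring and a Gini-based tree are assumed
# stand-ins, not the chapter's documented preprocessing or classifier settings.


def standardize_platform(samples, genes):
    """Z-score each gene across the samples of one platform (list of gene->value dicts)."""
    stats = {}
    for g in genes:
        values = [s[g] for s in samples]
        stats[g] = (mean(values), pstdev(values) or 1.0)  # guard against constant genes
    return [[(s[g] - stats[g][0]) / stats[g][1] for g in genes] for s in samples]


def integrate(platform_a, platform_b, clinical_a, clinical_b):
    """Keep genes present on both platforms, standardize per platform, append clinical features."""
    shared = sorted(set(platform_a[0]) & set(platform_b[0]))
    rows = standardize_platform(platform_a, shared) + standardize_platform(platform_b, shared)
    clinical = list(clinical_a) + list(clinical_b)
    return [r + list(c) for r, c in zip(rows, clinical)], shared


def gini(labels):
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def best_split(X, y):
    """Return (weighted Gini, feature index, threshold) of the best binary split, or None."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Grow a tree; leaves are labels, inner nodes are (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    split = None if depth >= max_depth or len(set(y)) == 1 else best_split(X, y)
    if split is None:
        return majority
    _, f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth))


def predict(tree, row):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def split_features(tree, names):
    """Names of the features the tree splits on (candidate marker genes)."""
    if not isinstance(tree, tuple):
        return set()
    f, _, left, right = tree
    return {names[f]} | split_features(left, names) | split_features(right, names)


# Invented toy data: two platforms measuring overlapping genes on different scales.
platform_a = [{"G1": 10.0, "G2": 5.0, "G3": 1.0}, {"G1": 12.0, "G2": 5.5, "G3": 2.0},
              {"G1": 2.0, "G2": 5.2, "G3": 1.5}, {"G1": 4.0, "G2": 4.8, "G3": 0.5}]
platform_b = [{"G1": 200.0, "G2": 50.0, "G4": 3.0}, {"G1": 240.0, "G2": 55.0, "G4": 3.0},
              {"G1": 40.0, "G2": 52.0, "G4": 3.0}, {"G1": 80.0, "G2": 48.0, "G4": 3.0}]
stage_a, stage_b = [[2], [3], [2], [3]], [[1], [2], [2], [1]]  # hypothetical clinical covariate
y = ["high", "high", "low", "low", "high", "high", "low", "low"]

X, genes = integrate(platform_a, platform_b, stage_a, stage_b)
tree = build_tree(X, y)
print(split_features(tree, genes + ["stage"]))  # {'G1'}
```

Per-platform standardization places G1 from both platforms on a common scale, so a single threshold separates the risk groups. The `split_features` helper reads off the features a tree splits on. This is one simple way a tree can point to candidate marker genes. An actual analysis would also need held-out evaluation, which this toy omits.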

Key words:

Microarray, lung cancer, survival analysis, machine learning



Copyright information

© Springer Science + Business Media, Inc. Boston 2005

Authors and Affiliations

  1. School of Biomedical Sciences, University of Ulster, Coleraine, Northern Ireland
