Integration of Microarray Data for a Comparative Study of Classifiers and Identification of Marker Genes

Abstract:

Novel diagnostic tools promise to enable patient-tailored cancer treatment. One major step towards individualized therapy is to combine various data sources, e.g. transcriptomic, proteomic, and clinical data. We have integrated clinical data and lung cancer microarray data that were generated on two different oligonucleotide platforms. We asked whether the prediction of survival outcome benefits from integrating clinical and transcriptomic data. In addition, we attempted to identify the genes whose expression profiles correlate with survival outcome. We applied five machine learning techniques to predict survival risk groups, and we compared the resulting models with respect to their performance and general user acceptance. Based on quantitative and qualitative evaluation criteria, we chose decision trees as the most relevant technique for this type of analysis. Our in silico analysis corroborates the role of numerous marker genes already described in lung adenocarcinomas. In addition, our study reveals a set of genes of particular interest whose expression profiles correlate with genetic risk groups with unexpected survival outcomes.
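The abstract does not say how the two platforms were harmonized or which decision-tree algorithm was used, so the following is only a minimal sketch under stated assumptions. It assumes each gene is z-scored separately within each platform, that clinical covariates are then merged with the standardized expression values into one feature set, and that the tree chooses splits by Gini impurity as in CART. Only the root split of such a tree is shown; a full tree would recurse on each child node. The function names (`standardize_per_platform`, `integrate`, `best_split`), the sample fields, the gene name `GENE1` and all values are hypothetical and invented for illustration.

```python
from statistics import mean, pstdev

# Hypothetical toy samples: two platforms ("A", "B") with different intensity
# scales, one clinical covariate and one gene. Values are invented for
# illustration and are NOT data from the chapter.
toy = [
    {"platform": "A", "clinical": {"stage": 1}, "expr": {"GENE1": 2.0}, "risk": "low"},
    {"platform": "A", "clinical": {"stage": 1}, "expr": {"GENE1": 8.0}, "risk": "high"},
    {"platform": "B", "clinical": {"stage": 3}, "expr": {"GENE1": 100.0}, "risk": "low"},
    {"platform": "B", "clinical": {"stage": 3}, "expr": {"GENE1": 300.0}, "risk": "high"},
]

def standardize_per_platform(samples):
    """Z-score every gene separately within each platform (an assumed
    harmonization step, not necessarily the one used in the chapter)."""
    groups = {}
    for s in samples:
        groups.setdefault(s["platform"], []).append(s)
    result = []
    for group in groups.values():
        stats = {}
        for gene in group[0]["expr"]:
            values = [s["expr"][gene] for s in group]
            sd = pstdev(values)
            stats[gene] = (mean(values), sd if sd > 0 else 1.0)
        for s in group:
            z = {g: (s["expr"][g] - mu) / sigma for g, (mu, sigma) in stats.items()}
            result.append({**s, "expr": z})
    return result

def integrate(sample):
    """Flatten clinical covariates and standardized expression into one feature dict."""
    features = dict(sample["clinical"])
    features.update({f"gene:{g}": v for g, v in sample["expr"].items()})
    return features

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels):
    """Search all binary splits 'feature <= threshold' and return the one with
    the lowest weighted Gini impurity (the root node of a CART-style tree).
    Returns (feature, threshold, impurity); feature is None if no split helps."""
    n = len(labels)
    best = (None, None, gini(labels))
    for feature in rows[0]:
        values = sorted({r[feature] for r in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (feature, threshold, score)
    return best

harmonized = standardize_per_platform(toy)
rows = [integrate(s) for s in harmonized]
labels = [s["risk"] for s in harmonized]
print(best_split(rows, labels))  # ('gene:GENE1', 0.0, 0.0)
```

In this toy setting the clinical covariate alone cannot separate the risk groups, while the standardized gene can. Without the per-platform step, the raw `GENE1` values (2, 8, 100, 300) admit no split with impurity below about 0.33, because the offset between the two platforms dominates the within-platform signal.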

Key words:

  • Microarray
  • lung cancer
  • survival analysis
  • machine learning

Chapter details:

  • Chapter length: 16 pages
  • eBook ISBN: 978-0-387-23077-1

Copyright information

© 2005 Springer Science + Business Media, Inc. Boston

About this chapter

Cite this chapter

Berrar, D., Sturgeon, B., Bradbury, I., Downes, C.S., Dubitzky, W. (2005). Integration of Microarray Data for a Comparative Study of Classifiers and Identification of Marker Genes. In: Shoemaker, J.S., Lin, S.M. (eds) Methods of Microarray Data Analysis. Springer, Boston, MA. https://doi.org/10.1007/0-387-23077-7_12
