Neural Computing and Applications, Volume 21, Issue 4, pp 649–660

Pathway-based microarray analysis for robust disease classification

  • Pitak Sootanan
  • Santitham Prom-on
  • Asawin Meechai
  • Jonathan H. Chan
ICONIP2010

Abstract

The advent of high-throughput technology has made it possible to measure genome-wide expression profiles, providing a new basis for microarray-based diagnosis of disease states. Numerous methods have been proposed to identify biomarkers that can accurately discriminate between case and control classes. Many of these methods use only a subset of ranked genes in each pathway and may therefore fail to fully represent the classification boundary between the two disease classes. This study proposes the use of negatively correlated feature sets (NCFS) to obtain more relevant features, in the form of phenotype-correlated genes (PCOGs), and to infer pathway activities. The two pathway activity inference schemes that use NCFS significantly improved the power of pathway markers to discriminate between two phenotype classes in microarray expression datasets of breast cancer. In particular, the NCFS-i method provided better contrasting features for classification. The improvement was consistent across all pathways used, under both within-dataset and across-dataset validation. The results show that the two proposed NCFS-based methods clearly outperformed other pathway-based classifiers in terms of both ROC area and discriminative score. That is, identifying PCOGs within each pathway, especially with the NCFS-i method, helps to reduce the influence of noisy or variable measurements, leading to a higher-performing and more robust classifier. In summary, we have demonstrated that effectively incorporating pathway information into expression-based disease diagnosis, together with NCFS, can yield more discriminative and more robust models.
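
The abstract combines two ingredients: inferring a per-sample pathway activity from phenotype-correlated genes, and scoring that activity by ROC area. The sketch below illustrates the general idea under stated assumptions; it is not the NCFS-i or NCFS-c procedure from the paper, whose exact selection and aggregation rules are not given in the abstract. It assumes that each gene is z-scored across samples, that pathway genes are split by the sign of their Pearson (point-biserial) correlation with a 0/1 phenotype label, that genes with |r| below a hypothetical cutoff `min_abs_r` are dropped as uninformative, and that activity is the mean z-score of the positively correlated genes minus that of the negatively correlated ones, so both sets push in the same direction. All function names, gene names and data values are hypothetical toy examples.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy pathway: 3 controls (label 0) followed by 3 cases (label 1).
LABELS = [0, 0, 0, 1, 1, 1]
EXPR = {
    "GENE_A": [1.0, 2.0, 1.0, 5.0, 6.0, 5.0],  # higher in cases
    "GENE_B": [6.0, 5.0, 6.0, 2.0, 1.0, 2.0],  # lower in cases
    "GENE_C": [3.0, 3.0, 4.0, 3.0, 4.0, 3.0],  # no class difference
}

def zscore_rows(expr: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Standardize each gene across samples to mean 0 and (population) sd 1."""
    out = {}
    for gene, values in expr.items():
        mu = statistics.mean(values)
        sd = statistics.pstdev(values)
        out[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return out

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; with a 0/1 label this is the point-biserial r."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den > 0 else 0.0

def split_by_correlation(z: Dict[str, List[float]], labels: Sequence[int],
                         genes: Sequence[str], min_abs_r: float = 0.1
                         ) -> Tuple[List[str], List[str]]:
    """Assumed rule: positive / negative phenotype-correlated gene sets."""
    pos, neg = [], []
    for g in genes:
        r = pearson(z[g], labels)
        if r >= min_abs_r:
            pos.append(g)
        elif r <= -min_abs_r:
            neg.append(g)
    return pos, neg

def pathway_activity(z: Dict[str, List[float]], pos: List[str],
                     neg: List[str], n_samples: int) -> List[float]:
    """Assumed aggregation: mean z of positive genes minus mean z of negative genes."""
    acts = []
    for j in range(n_samples):
        up = statistics.mean([z[g][j] for g in pos]) if pos else 0.0
        down = statistics.mean([z[g][j] for g in neg]) if neg else 0.0
        acts.append(up - down)
    return acts

def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC area as P(case score > control score), counting ties as 1/2."""
    cases = [s for s, l in zip(scores, labels) if l == 1]
    controls = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in cases:
        for n in controls:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))

if __name__ == "__main__":
    z = zscore_rows(EXPR)
    pos, neg = split_by_correlation(z, LABELS, list(EXPR))
    activity = pathway_activity(z, pos, neg, len(LABELS))
    print("positive:", pos, "negative:", neg)
    print("activity:", [round(a, 3) for a in activity])
    print("ROC area:", roc_auc(activity, LABELS))
```

In this toy pathway GENE_A rises and GENE_B falls in cases, while GENE_C is dropped by the cutoff. Averaging GENE_A and GENE_B without the sign split would let them largely cancel, whereas the signed combination separates the two classes. In a real evaluation the gene split would have to be learned on training samples only and the ROC area measured on held-out or independent data, in line with the within- and across-dataset validations described above.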

Keywords

Microarray analysis · Disease classification · Pathway activity · Negatively correlated feature sets · Phenotype-correlated genes · Discriminative score

Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  • Pitak Sootanan (1)
  • Santitham Prom-on (2)
  • Asawin Meechai (3)
  • Jonathan H. Chan (4)
  1. Individual Based Program (Bioinformatics), King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
  2. Department of Computer Engineering, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
  3. Department of Chemical Engineering, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
  4. School of Information Technology, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand

Personalised recommendations