Pathway-based microarray analysis for robust disease classification
High-throughput technology has made it possible to measure genome-wide expression profiles, providing a new basis for microarray-based diagnosis of disease states. Numerous methods have been proposed to identify biomarkers that accurately discriminate between case and control classes. Many of these methods use only a subset of the ranked genes in each pathway and may therefore fail to fully represent the classification boundary between the two disease classes. This study proposes using negatively correlated feature sets (NCFS) to obtain more relevant features, in the form of phenotype-correlated genes (PCOGs), and to infer pathway activities from them. The two NCFS-based pathway activity inference schemes significantly improved the power of pathway markers to discriminate between two phenotype classes in breast cancer microarray expression datasets. In particular, the NCFS-i method provided better contrasting features for classification. The improvement was consistent across all pathways used, in both within- and across-dataset validation. Both proposed methods clearly outperformed other pathway-based classifiers in terms of ROC area and discriminative score. That is, identifying PCOGs within each pathway, especially with the NCFS-i method, helps reduce noisy or variable measurements, leading to a higher-performing and more robust classifier. In summary, effectively incorporating pathway information into expression-based disease diagnosis, together with NCFS, yields more discriminative and more robust models.
Keywords: Microarray analysis; Disease classification; Pathway activity; Negatively correlated feature sets; Phenotype-correlated genes; Discriminative score
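The abstract does not spell out how NCFS-i or the second NCFS scheme is computed, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that pathway genes are split into a positively and a negatively phenotype-correlated set by the sign of a two-sample t-statistic, that pathway activity is the difference between the mean z-scores of the two sets, and that the discriminative score is the absolute t-statistic of that activity between classes. All function names, gene names, and expression values are hypothetical.

```python
import math
from statistics import mean, stdev


def t_score(a, b):
    """Welch t-statistic between two samples (each needs at least 2 values)."""
    va, vb = stdev(a) ** 2, stdev(b) ** 2
    return (mean(a) - mean(b)) / math.sqrt(va / len(a) + vb / len(b))


def zscore(values):
    """Standardize one gene's expression across all samples."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def split_by_correlation(expr, labels):
    """Assumed rule: the sign of t (case vs. control) decides the gene set."""
    pos, neg = [], []
    for gene, values in expr.items():
        case = [v for v, y in zip(values, labels) if y == 1]
        ctrl = [v for v, y in zip(values, labels) if y == 0]
        (pos if t_score(case, ctrl) > 0 else neg).append(gene)
    return pos, neg


def pathway_activity(expr, pos, neg):
    """Per-sample activity: mean z of the positive set minus mean z of the negative set."""
    z = {gene: zscore(values) for gene, values in expr.items()}
    n_samples = len(next(iter(expr.values())))

    def avg(genes, j):
        return mean(z[g][j] for g in genes) if genes else 0.0

    return [avg(pos, j) - avg(neg, j) for j in range(n_samples)]


def discriminative_score(activity, labels):
    """Absolute t-statistic of the pathway activity between the two classes."""
    case = [a for a, y in zip(activity, labels) if y == 1]
    ctrl = [a for a, y in zip(activity, labels) if y == 0]
    return abs(t_score(case, ctrl))


# Toy pathway with three hypothetical genes; 1 = case, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
expr = {
    "geneA": [5.1, 4.8, 5.3, 2.0, 2.2, 1.9],  # higher in cases
    "geneB": [1.0, 1.2, 0.9, 3.1, 2.9, 3.3],  # lower in cases
    "geneC": [2.3, 2.0, 2.2, 2.1, 1.9, 2.0],  # weakly higher in cases
}

pos, neg = split_by_correlation(expr, labels)
activity = pathway_activity(expr, pos, neg)
score = discriminative_score(activity, labels)
print(pos, neg, round(score, 2))
```

In this sketch the gene sets are chosen on the same samples that are then scored, which gives an optimistic estimate. For the within- and across-dataset validation the abstract describes, gene selection would have to be done on training data only.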
The main author (PS) gratefully acknowledges financial support from the National Research Council of Thailand; the School of Information Technology, King Mongkut’s University of Technology Thonburi; and Burapha University during his doctoral study at King Mongkut’s University of Technology Thonburi. PS is especially thankful to Mr. Ponlavit Larpeampaisarl, who helped to implement the scripts for PCOG identification and activity inference.
Conflict of interests
The authors declare that they have no competing interests.