Abstract
Developing improved approaches for the diagnosis, treatment, and prevention of diseases is a major goal of biomedical research. Consequently, the discovery of biomarker signatures from high-throughput “omics” data is an active research topic in bioinformatics and systems medicine. A major issue is the low reproducibility and limited biological interpretability of candidate biomarker signatures identified from high-throughput data, which impedes the translation of these signatures into clinical applications. Much current effort is therefore devoted to developing strategies that improve reproducibility and interpretability. Researchers have productively begun to incorporate prior knowledge derived from pathways and molecular networks into the process of biomarker identification. In this chapter, after a general introduction to the problem of disease classification and biomarker discovery, we review two types of network-assisted approaches: (1) approaches that infer activity scores for specific pathways, which are subsequently used for classification, and (2) approaches that identify subnetworks or modules of molecular networks by differential network analysis, which can serve as biomarker signatures.
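To make the two families of approaches more concrete, here is a minimal, hypothetical Python sketch using only the standard library. It is not the method of any specific work reviewed in the chapter. `pathway_activity` summarizes a pathway in each sample as the mean of the z-scored expression values of its member genes, one of the simplest possible pathway activity scores. `differential_edges` flags gene pairs whose Pearson correlation changes between two conditions by more than a chosen `threshold`, a basic form of differential network analysis. All function names, the input format (a dict mapping each gene to a list of per-sample values), and the default threshold of 0.5 are illustrative assumptions.

```python
from statistics import mean, pstdev


def zscore_rows(expr):
    """Standardize each gene (row) across samples to mean 0 and unit variance."""
    out = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        # Constant genes have zero spread; give them a neutral score of 0.
        out[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return out


def pathway_activity(expr, members):
    """Hypothetical activity score: per-sample mean z-score of the pathway genes found in expr."""
    z = zscore_rows(expr)
    genes = [g for g in members if g in z]  # silently skip genes absent from the data
    if not genes:
        raise ValueError("no pathway genes found in expression data")
    n_samples = len(z[genes[0]])
    return [mean(z[g][j] for g in genes) for j in range(n_samples)]


def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den > 0 else 0.0


def differential_edges(expr_a, expr_b, edges, threshold=0.5):
    """Return (gene1, gene2, r_A - r_B) for network edges whose correlation
    differs between condition A and condition B by more than threshold."""
    result = []
    for g1, g2 in edges:
        diff = pearson(expr_a[g1], expr_a[g2]) - pearson(expr_b[g1], expr_b[g2])
        if abs(diff) > threshold:
            result.append((g1, g2, diff))
    return result
```

For example, with `expr = {"G1": [1, 2, 3, 4], "G2": [2, 4, 6, 8], "G3": [5, 5, 5, 5]}`, `pathway_activity(expr, ["G1", "G2", "G3"])` gives roughly `[-0.89, -0.30, 0.30, 0.89]`. In a classification setting such per-sample scores would replace single-gene features, and flagged edges from `differential_edges` would typically be grouped into connected subnetworks or modules before being used as a signature.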
Copyright information
© 2016 Springer Science+Business Media New York
About this protocol
Cite this protocol
Strunz, S., Wolkenhauer, O., de la Fuente, A. (2016). Network-Assisted Disease Classification and Biomarker Discovery. In: Schmitz, U., Wolkenhauer, O. (eds) Systems Medicine. Methods in Molecular Biology, vol 1386. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3283-2_16
DOI: https://doi.org/10.1007/978-1-4939-3283-2_16
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-3282-5
Online ISBN: 978-1-4939-3283-2
eBook Packages: Springer Protocols