A Machine Learning Pipeline for Identification of Discriminant Pathways

Abstract

Identifying the molecular pathways most prone to disruption during a pathological process is a key task in network medicine and, more generally, in systems biology. This chapter describes a pipeline that couples a machine learning solution for molecular profiling with a recent network comparison method. The pipeline identifies changes between specific sub-modules of networks built in a case-control biomarker study, singling out key groups of genes whose interactions are modified by an underlying condition. Each step of the workflow can be implemented with different algorithms. Three applications to genome-wide data are presented: the susceptibility of children to air pollution, and the early and late onset of Parkinson's and Alzheimer's diseases.
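
As a rough illustration of the network comparison idea, the sketch below builds one weighted co-expression network per class (case and control) for each candidate gene group and ranks the groups by how far apart the two networks are. It is a minimal sketch under stated assumptions, not the chapter's implementation: the soft-threshold power of 6, the Euclidean distance between Laplacian spectra, and all function names are illustrative choices, and the gene groups are assumed to come from an upstream feature selection and enrichment step that is not shown.

```python
import numpy as np


def coexpression_network(expr, power=6):
    """Weighted co-expression adjacency matrix for one class.

    expr has shape (n_samples, n_genes). The weight of an edge is
    |Pearson correlation| ** power (power=6 is an assumed default).
    """
    corr = np.corrcoef(expr, rowvar=False)
    adj = np.abs(corr) ** power
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj


def laplacian_spectrum(adj):
    """Sorted eigenvalues of the graph Laplacian L = D - A."""
    lap = np.diag(adj.sum(axis=1)) - adj
    return np.sort(np.linalg.eigvalsh(lap))


def spectral_distance(adj_a, adj_b):
    """Euclidean distance between Laplacian spectra.

    A simple stand-in for a spectral network distance; assumed here
    for illustration only.
    """
    diff = laplacian_spectrum(adj_a) - laplacian_spectrum(adj_b)
    return float(np.linalg.norm(diff))


def rank_pathways(expr_case, expr_ctrl, pathways):
    """Rank gene groups by case-vs-control network distance.

    pathways maps a (hypothetical) pathway name to the column indices
    of its genes in both expression matrices.
    """
    scores = {}
    for name, idx in pathways.items():
        net_case = coexpression_network(expr_case[:, idx])
        net_ctrl = coexpression_network(expr_ctrl[:, idx])
        scores[name] = spectral_distance(net_case, net_ctrl)
    # Largest distance first: the most rewired groups lead the list.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `rank_pathways(expr_case, expr_ctrl, {"pathway_a": [0, 1, 2], "pathway_b": [3, 4, 5]})` returns the groups ordered from most to least changed. Any inference method or network distance could be substituted for the two assumed here.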

Keywords

Gene Ontology; Feature Selection Method; Feature Ranking; Calmodulin Binding; Feature Selection Step

Abbreviations

Aβ

β-amyloid peptide

AD

Alzheimerʼs disease

ADCA

autosomal dominant cerebellar ataxias

ARACNE

algorithm for the reconstruction of accurate cellular networks

BP

biological process

CLC

chloride channel

CV

cross validation

EC

entorhinal cortex

EGFR

epidermal growth factor receptor

GEO

gene expression omnibus

GO

gene ontology

GSEA

gene set enrichment analysis

GTP

guanosine triphosphate

HWHM

half-width at half-maximum

Hip

hippocampus

KEGG

Kyoto encyclopedia of genes and genomes

MAPK

mitogen-activated protein kinase

MF

molecular function

MTG

middle temporal gyrus

MiNET

mutual information networks package

NTRK

neurotrophic tyrosine receptor kinase

PC

posterior cingulate cortex

PD

Parkinsonʼs disease

PGD2

prostaglandin D2

PGH2

prostaglandin H2

RLS

recursive least-squares

SFG

superior frontal gyrus

SNP

single-nucleotide polymorphism

SRDA

spectral regression discriminant analysis

TGF

transforming growth factor

TNF

tumor necrosis factor

VCL

vinculin

VCX

primary visual cortex

WGCN

weighted gene co-expression network

WGCNA

weighted gene co-expression network analysis

Copyright information

© Springer-Verlag 2014

Authors and Affiliations

  1. DIBRIS, University of Genova, Genova, Italy
  2. Predictive Models for Biomedicine and Environment, Fondazione Bruno Kessler, Povo, Italy
  3. Predictive Models for Biomedicine and Environment, Fondazione Bruno Kessler, Povo, Italy
  4. DIBRIS, University of Genova, Genova, Italy
  5. Predictive Models for Biomedicine and Environment, Fondazione Bruno Kessler, Povo, Italy
  6. Computational Biology Department, Fondazione Edmund Mach, S. Michele all'Adige, Italy
  7. Fondazione Bruno Kessler, Povo, Italy
