A Machine Learning Pipeline for Identification of Discriminant Pathways

  • Annalisa Barla
  • Giuseppe Jurman
  • Roberto Visintainer
  • Margherita Squillario
  • Michele Filosi
  • Samantha Riccadonna
  • Cesare Furlanello


Identifying the molecular pathways more prone to disruption during a pathological process is a key task in network medicine and, more generally, in systems biology. This chapter describes a pipeline that couples a machine learning solution for molecular profiling with a recent network comparison method. The pipeline identifies changes between specific sub-modules of the networks built in a case-control biomarker study, singling out key groups of genes whose interactions are modified by an underlying condition. Each step of the workflow can be implemented with different algorithms. Three applications to genome-wide data are presented: the susceptibility of children to air pollution, and the early and late onset of Parkinson's and Alzheimer's diseases.
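The network comparison step can be illustrated with a minimal sketch. It assumes that each candidate pathway is given as a list of gene column indices, that one network per class is inferred as a soft-thresholded correlation network (in the spirit of a weighted gene co-expression network), and that two networks are compared by the Euclidean distance between their sorted Laplacian spectra. All function names, the exponent `beta`, and the choice of distance are illustrative assumptions rather than the chapter's actual implementation, which may rely on other inference and comparison algorithms.

```python
import numpy as np


def coexpression_network(expr, beta=6):
    """Weighted adjacency |Pearson correlation| ** beta with a zero diagonal.

    expr has shape (samples, genes). beta is an assumed soft-threshold exponent.
    """
    corr = np.corrcoef(expr, rowvar=False)
    adj = np.abs(corr) ** beta
    np.fill_diagonal(adj, 0.0)
    return adj


def laplacian_spectrum(adj):
    """Sorted eigenvalues of the combinatorial Laplacian L = D - A."""
    lap = np.diag(adj.sum(axis=1)) - adj
    return np.sort(np.linalg.eigvalsh(lap))


def spectral_distance(adj_a, adj_b):
    """Euclidean distance between Laplacian spectra (a simple stand-in
    for a spectral network distance; both networks need the same size)."""
    diff = laplacian_spectrum(adj_a) - laplacian_spectrum(adj_b)
    return float(np.linalg.norm(diff))


def rank_pathways(expr, labels, pathways, beta=6):
    """Rank pathways by how much their network differs between classes.

    expr: (samples, genes) array; labels: array of 0 (control) / 1 (case);
    pathways: hypothetical dict mapping a pathway name to gene column indices.
    """
    scores = {}
    for name, genes in pathways.items():
        case = coexpression_network(expr[labels == 1][:, genes], beta)
        ctrl = coexpression_network(expr[labels == 0][:, genes], beta)
        scores[name] = spectral_distance(case, ctrl)
    # Largest case-versus-control distance first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Under these assumptions, a pathway whose genes are tightly co-expressed in controls but decoupled in cases receives a large distance and appears at the top of the ranking, while a pathway with the same correlation structure in both classes scores close to zero.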


Gene Ontology, Feature Selection Method, Feature Ranking, Calmodulin Binding, Feature Selection Step


β-amyloid peptide
Alzheimer's disease
autosomal dominant cerebellar ataxias
algorithm for the reconstruction of accurate cellular networks
biological process
chloride channel
cross validation
entorhinal cortex
epidermal growth factor receptor
gene expression omnibus
gene ontology
gene set enrichment analysis
guanosine triphosphate
half-width at half-maximum
Kyoto encyclopedia of genes and genomes
mitogen-activated protein kinase
molecular function
middle temporal gyrus
mutual information networks package
neurotrophic tyrosine receptor kinase
posterior cingulate cortex
Parkinson's disease
prostaglandin D2
prostaglandin H2
recursive least-squares
superior frontal gyrus
single-nucleotide polymorphism
spectral regression discriminant analysis
transforming growth factor
tumor necrosis factor
primary visual cortex
weighted gene co-expression network
weighted gene co-expression network analysis



Copyright information

© Springer-Verlag 2014

Authors and Affiliations

  1. DIBRIS, University of Genova, Genova, Italy
  2. Predictive Models for Biomedicine and Environment, Fondazione Bruno Kessler, Povo, Italy
  3. Predictive Models for Biomedicine and Environment, Fondazione Bruno Kessler, Povo, Italy
  4. DIBRIS, University of Genova, Genova, Italy
  5. Predictive Models for Biomedicine and Environment, Fondazione Bruno Kessler, Povo, Italy
  6. Computational Biology Department, Fondazione Edmund Mach, S. Michele all'Adige, Italy
  7. Fondazione Bruno Kessler, Povo, Italy
