Network-Guided Biomarker Discovery

Machine Learning for Health Informatics

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9605)

Abstract

Identifying measurable genetic indicators (or biomarkers) of a specific condition of a biological system is a key element of precision medicine. Indeed, it makes it possible to tailor diagnosis, prognosis, and treatment choice to the individual characteristics of a patient. In machine learning terms, biomarker discovery can be framed as a feature selection problem on whole-genome data sets. However, classical feature selection methods are usually underpowered to process these data sets, which contain orders of magnitude more features than samples. This can be addressed by assuming that genetic features that are linked on a biological network are more likely to work jointly towards explaining the phenotype of interest. We review here three families of methods for feature selection that integrate prior knowledge in the form of networks.
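
As an illustration of the network assumption, one common way to favor connected features is a graph-Laplacian smoothness penalty, which sums the squared differences between the weights of features linked by an edge. The sketch below is a toy example and is not taken from the chapter: the function name, the four-feature network, and the weight vectors are all hypothetical, and it is not claimed to be one of the three families of methods reviewed.

```python
def laplacian_penalty(weights, edges):
    """Sum of (w_i - w_j)^2 over network edges.

    For an unweighted, undirected graph this equals w^T L w,
    where L is the graph Laplacian.
    """
    return sum((weights[i] - weights[j]) ** 2 for i, j in edges)


# Hypothetical toy network over four features: a path 0-1-2, feature 3 isolated.
edges = [(0, 1), (1, 2)]

# Two hypothetical selections with three active features each.
connected = [1.0, 1.0, 1.0, 0.0]  # active features form a connected subnetwork
scattered = [1.0, 0.0, 1.0, 1.0]  # active features are not all linked

penalty_connected = laplacian_penalty(connected, edges)
penalty_scattered = laplacian_penalty(scattered, edges)

print(penalty_connected, penalty_scattered)
```

Under this penalty, the selection whose active features are linked on the network is cheaper than a scattered selection of the same size, which is the behavior the assumption above calls for.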

Notes

  1. https://github.com/chagaz/scones

  2. https://github.com/chagaz/sfan

Author information

Corresponding author

Correspondence to Chloé-Agathe Azencott.

Copyright information

© 2016 Springer International Publishing AG

About this chapter

Cite this chapter

Azencott, CA. (2016). Network-Guided Biomarker Discovery. In: Holzinger, A. (eds) Machine Learning for Health Informatics. Lecture Notes in Computer Science (LNAI), vol 9605. Springer, Cham. https://doi.org/10.1007/978-3-319-50478-0_16

  • DOI: https://doi.org/10.1007/978-3-319-50478-0_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50477-3

  • Online ISBN: 978-3-319-50478-0

  • eBook Packages: Computer Science, Computer Science (R0)
