Skip to main content

Network-Guided Biomarker Discovery

Part of the Lecture Notes in Computer Science book series (LNAI,volume 9605)

Abstract

Identifying measurable genetic indicators (or biomarkers) of a specific condition of a biological system is a key element of precision medicine. Indeed it allows to tailor diagnostic, prognostic and treatment choice to individual characteristics of a patient. In machine learning terms, biomarker discovery can be framed as a feature selection problem on whole-genome data sets. However, classical feature selection methods are usually underpowered to process these data sets, which contain orders of magnitude more features than samples. This can be addressed by making the assumption that genetic features that are linked on a biological network are more likely to work jointly towards explaining the phenotype of interest. We review here three families of methods for feature selection that integrate prior knowledge in the form of networks.

Keywords

  • Biological networks
  • Structured sparsity
  • Feature selection
  • Biomarker discovery

This is a preview of subscription content, access via your institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • DOI: 10.1007/978-3-319-50478-0_16
  • Chapter length: 18 pages
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
eBook
USD   64.99
Price excludes VAT (USA)
  • ISBN: 978-3-319-50478-0
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
Softcover Book
USD   84.99
Price excludes VAT (USA)
Fig. 1.

Notes

  1. 1.

    https://github.com/chagaz/scones.

  2. 2.

    https://github.com/chagaz/sfan.

References

  1. Spear, B.B., Heath-Chiozzi, M., Huff, J.: Clinical application of pharmacogenetics. Trends Mol. Med. 7(5), 201–204 (2001)

    CrossRef  Google Scholar 

  2. Reuter, J., Spacek, D.V., Snyder, M.: High-throughput sequencing technologies. Molecular Cell 58(4), 586–597 (2015)

    CrossRef  Google Scholar 

  3. Van Allen, E.M., Wagle, N., Levy, M.A.: Clinical analysis and interpretation of cancer genome data. J. Clin. Oncol. 31(15), 1825–1833 (2013)

    CrossRef  Google Scholar 

  4. Manolio, T.A., Collins, F.S., Cox, N.J., Goldstein, D.B., et al.: Finding the missing heritability of complex diseases. Nature 461(7265), 747–753 (2009)

    CrossRef  Google Scholar 

  5. Holzinger, A.: Interactive machine learning for health informatics: when do we need the human-in-the-loop? Brain Inf. 3(2), 119–131 (2016)

    CrossRef  Google Scholar 

  6. Hund, M., Böhm, D., Sturm, W., Sedlmair, M., et al.: Visual analytics for concept exploration in subspaces of patient groups. Brain Inf. 3(4), 233–247 (2016). doi:10.1007/s40708-016-0043-5

    Google Scholar 

  7. Szklarczyk, D., Franceschini, A., Wyder, S., Forslund, K., et al.: STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 43(Database issue), D447–452 (2015)

    CrossRef  Google Scholar 

  8. Chatr-Aryamontri, A., Breitkreutz, B.J., Oughtred, R., Boucher, L., Heinicke, S., et al.: The BioGRID interaction database: 2015 update. Nucleic Acids Res. 43(Database issue), D470–478 (2015)

    CrossRef  Google Scholar 

  9. Kuperstein, I., Bonnet, E., Nguyen, H.A., Cohen, D., et al.: Atlas of cancer signalling network: a systems biology resource for integrative analysis of cancer data with Google Maps. Oncogenesis 4(7), e160 (2015)

    CrossRef  Google Scholar 

  10. Azencott, C.A., Grimm, D., Sugiyama, M., Kawahara, Y., Borgwardt, K.M.: Efficient network-guided multi-locus association mapping with graph cuts. Bioinformatics 29(13), i171–i179 (2013)

    CrossRef  Google Scholar 

  11. Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn Res. 3, 1157–1182 (2003)

    MATH  Google Scholar 

  12. Hastie, T., Tibshirani, R., Wainwright, M.: Statistical Learning with Sparsity: The Lasso and Generalizations. CRC Press, Boca Raton (2015)

    MATH  Google Scholar 

  13. Bush, W.S., Moore, J.H.: Chapter 11: genome-wide association studies. PLoS Comput. Biol. 8(12), e1002822 (2012)

    CrossRef  Google Scholar 

  14. Merris, R.: Laplacian matrices of graphs: a survey. Linear Algebra Appl. 197, 143–176 (1994)

    CrossRef  MathSciNet  MATH  Google Scholar 

  15. Smola, A.J., Kondor, R.: Kernels and regularization on graphs. In: Schölkopf, B., Warmuth, M.K. (eds.) COLT-Kernel 2003. LNCS (LNAI), vol. 2777, pp. 144–158. Springer, Heidelberg (2003). doi:10.1007/978-3-540-45167-9_12

    CrossRef  Google Scholar 

  16. Fujishige, S.: Submodular Functions and Optimization. Elsevier, Amsterdam (2005)

    MATH  Google Scholar 

  17. Bach, F.: Learning with submodular functions: a convex optimization perspective. Found. Trends Mach. Learn. 6(2–3), 145–373 (2013)

    CrossRef  MATH  Google Scholar 

  18. Thornton, T.: Statistical methods for genome-wide and sequencing association studies of complex traits in related samples. Curr. Protoc. Hum. Genet. 84, 1.28.1–1.28.9 (2015)

    CrossRef  Google Scholar 

  19. Liu, J., Wang, K., Ma, S., Huang, J.: Accounting for linkage disequilibrium in genome-wide association studies: a penalized regression method. Statist. Interface 6(1), 99–115 (2013)

    CrossRef  MathSciNet  MATH  Google Scholar 

  20. Lee, S., Abecasis, G., Boehnke, M., Lin, X.: Rare-variant association analysis: study designs and statistical tests. Am. J. Hum. Genet. 95(1), 5–23 (2014)

    CrossRef  Google Scholar 

  21. Liu, J.Z., Mcrae, A.F., Nyholt, D.R., Medland, S.E., et al.: A versatile gene-based test for genome-wide association studies. Am. J. Hum. Genet. 87(1), 139–145 (2010)

    CrossRef  Google Scholar 

  22. Jia, P., Wang, L., Fanous, A.H., Pato, C.N., Edwards, T.L., Zhao, Z.: The International Schizophrenia Consortium: network-assisted investigation of combined causal signals from Genome-Wide Association Studies in schizophrenia. PLoS Comput. Biol. 8(7), e1002587 (2012)

    CrossRef  Google Scholar 

  23. Chuang, H.Y., Lee, E., Liu, Y.T., Lee, D., Ideker, T.: Network-based classification of breast cancer metastasis. Mol. Syst. Biol. 3, 140 (2007)

    CrossRef  Google Scholar 

  24. Baranzini, S.E., Galwey, N.W., Wang, J., Khankhanian, P., et al.: Pathway and network-based analysis of genome-wide association studies in multiple sclerosis. Hum. Mol. Genet. 18(11), 2078–2090 (2009)

    CrossRef  Google Scholar 

  25. Wang, L., Matsushita, T., Madireddy, L., Mousavi, P., Baranzini, S.E.: PINBPA: Cytoscape app for network analysis of GWAS data. Bioinformatics 31(2), 262–264 (2015)

    CrossRef  Google Scholar 

  26. Ideker, T., Ozier, O., Schwikowski, B., Siegel, A.F.: Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics 18(suppl 1), S233–S240 (2002)

    CrossRef  Google Scholar 

  27. Taşan, M., Musso, G., Hao, T., Vidal, M., MacRae, C.A., Roth, F.P.: Selecting causal genes from genome-wide association studies via functionally coherent subnetworks. Nat. Methods 12(2), 154–159 (2015)

    Google Scholar 

  28. Mitra, K., Carvunis, A.R., Ramesh, S.K., Ideker, T.: Integrative approaches for finding modular structure in biological networks. Nat. Rev. Genet. 14(10), 719–732 (2013)

    CrossRef  Google Scholar 

  29. Akula, N., Baranova, A., Seto, D., Solka, J., et al.: A network-based approach to prioritize results from genome-wide association studies. PLoS ONE 6(9), e24220 (2011)

    CrossRef  Google Scholar 

  30. Marchini, J., Donnelly, P., Cardon, L.R.: Genome-wide strategies for detecting multiple loci that influence complex diseases. Nat. Genet. 37(4), 413–417 (2005)

    CrossRef  Google Scholar 

  31. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. B 58, 267–288 (1994)

    MathSciNet  MATH  Google Scholar 

  32. Wu, T.T., Chen, Y.F., Hastie, T., Sobel, E., Lange, K.: Genome-wide association analysis by lasso penalized logistic regression. Bioinformatics 25(6), 714–721 (2009)

    CrossRef  Google Scholar 

  33. Zhou, H., Sehl, M.E., Sinsheimer, J.S., Lange, K.: Association screening of common and rare genetic variants by penalized regression. Bioinformatics 26(19), 2375–2382 (2010)

    CrossRef  Google Scholar 

  34. Chen, L.S., Hutter, C.M., Potter, J.D., Liu, Y., Prentice, R.L., Peters, U., Hsu, L.: Insights into colon cancer etiology via a regularized approach to gene set analysis of GWAS data. Am. J. Hum. Genet. 86(6), 860–871 (2010)

    CrossRef  Google Scholar 

  35. Zhao, J., Gupta, S., Seielstad, M., Liu, J., Thalamuthu, A.: Pathway-based analysis using reduced gene subsets in genome-wide association studies. BMC Bioinf. 12, 17 (2011)

    CrossRef  Google Scholar 

  36. Silver, M., Montana, G.: Alzheimer’s disease neuroimaging initiative: fast identification of biological pathways associated with a quantitative trait using group lasso with overlaps. Stat. Appl. Genet. Mol. Biol. 11(1), 7 (2012)

    MathSciNet  MATH  Google Scholar 

  37. Huang, J., Zhang, T., Metaxas, D.: Learning with structured sparsity. J. Mach. Learn. Res. 12, 3371–3412 (2011)

    MathSciNet  MATH  Google Scholar 

  38. Micchelli, C.A., Morales, J.M., Pontil, M.: Regularizers for structured sparsity. Adv. Comput. Math. 38(3), 455–489 (2013)

    CrossRef  MathSciNet  MATH  Google Scholar 

  39. Jacob, L., Obozinski, G., Vert, J.P.: Group lasso with overlap and graph lasso. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 433–440. ACM (2009)

    Google Scholar 

  40. Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., Knight, K.: Sparsity and smoothness via the fused lasso. J. Roy. Stat. Soc. B 67(1), 91–108 (2005)

    CrossRef  MathSciNet  MATH  Google Scholar 

  41. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci. 2(1), 183–202 (2009)

    CrossRef  MathSciNet  MATH  Google Scholar 

  42. Xin, B., Kawahara, Y., Wang, Y., Gao, W.: Efficient generalized fused lasso and its application to the diagnosis of Alzheimer’s disease. In: Twenty-Eighth AAAI Conference on Artificial Intelligence (2014)

    Google Scholar 

  43. Li, C., Li, H.: Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics 24(9), 1175–1182 (2008)

    CrossRef  Google Scholar 

  44. Li, C., Li, H.: Variable selection and regression analysis for graph-structured covariates with an application to genomics. Ann. Appl. Stat. 4(3), 1498–1516 (2010)

    CrossRef  MathSciNet  MATH  Google Scholar 

  45. Sokolov, A., Carlin, D.E., Paull, E.O., Baertsch, R., Stuart, J.M.: Pathway-based genomics prediction using generalized elastic net. PLoS Comput. Biol. 12(3), e1004790 (2016)

    CrossRef  Google Scholar 

  46. Friedman, J., Hastie, T., Höfling, H., Tibshirani, R.: Pathwise coordinate optimization. Ann. Appl. Stat. 1(2), 302–332 (2007)

    CrossRef  MathSciNet  MATH  Google Scholar 

  47. Yang, S., Yuan, L., Lai, Y.C., Shen, X., et al.: Feature grouping and selection over an undirected graph. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 922–930. ACM (2012)

    Google Scholar 

  48. Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976)

    CrossRef  MATH  Google Scholar 

  49. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)

    CrossRef  MATH  Google Scholar 

  50. Wang, Z., Montana, G.: The graph-guided group lasso for genome-wide association studies. In: Regularization, Optimization, Kernels, and Support Vector Machines, pp. 131–157 (2014)

    Google Scholar 

  51. Dernoncourt, D., Hanczar, B., Zucker, J.D.: Analysis of feature selection stability on high dimension and small sample data. Comput. Stat. Data Anal. 71, 681–693 (2014)

    CrossRef  MathSciNet  Google Scholar 

  52. Haury, A.C., Gestraud, P., Vert, J.P.: The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures. PLoS ONE 6(12), e28210 (2011)

    CrossRef  Google Scholar 

  53. Kuncheva, L., Smith, C., Syed, Y., Phillips, C., Lewis, K.: Evaluation of feature ranking ensembles for high-dimensional biomedical data: a case study. In: 2012 IEEE 12th International Conference on Data Mining Workshops, pp. 49–56 (2012)

    Google Scholar 

  54. Bach, F.: Structured sparsity-inducing norms through submodular functions. In: 24th Annual Conference on Neural Information Processing Systems 2010 (2010)

    Google Scholar 

  55. Orlin, J.B.: A faster strongly polynomial time algorithm for submodular function minimization. Math. Prog. 118(2), 237–251 (2009)

    CrossRef  MathSciNet  MATH  Google Scholar 

  56. Greig, D.M., Porteous, B.T., Seheult, A.H.: Exact maximum a posteriori estimation for binary images. J. Roy. Stat. Soc. B 51(2), 271–279 (1989)

    Google Scholar 

  57. Kolmogorov, V., Zabin, R.: What energy functions can be minimized via graph cuts? IEEE Trans. Pattern Anal. Mach. Intell. 26(2), 147–159 (2004)

    CrossRef  Google Scholar 

  58. Wu, M.C., Lee, S., Cai, T., Li, Y., Boehnke, M., Lin, X.: Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet. 89(1), 82–93 (2011)

    CrossRef  Google Scholar 

  59. Kuncheva, L.I.: A stability index for feature selection. In: Proceedings of the 25th Conference on Proceedings of the 25th IASTED International Multi-Conference: Artificial Intelligence and Applications, pp. 390–395. ACTA Press (2007)

    Google Scholar 

  60. Park, S.H., Lee, J.Y., Kim, S.: A methodology for multivariate phenotype-based genome-wide association studies to mine pleiotropic genes. BMC Syst. Biol. 5(2), 1–14 (2011)

    Google Scholar 

  61. O’Reilly, P.F., Hoggart, C.J., Pomyen, Y., Calboli, F.C.F., Elliott, P., Jarvelin, M.R., Coin, L.J.M.: MultiPhen: joint model of multiple phenotypes can increase discovery in GWAS. PLoS ONE 7(5), e34861 (2012)

    CrossRef  Google Scholar 

  62. Eduati, F., Mangravite, L.M., Wang, T., Tang, H., et al.: Prediction of human population responses to toxic compounds by a collaborative competition. Nat. Biotechnol. 33(9), 933–940 (2015)

    CrossRef  Google Scholar 

  63. Cheng, W., Zhang, X., Guo, Z., Shi, Y., Wang, W.: Graph-regularized dual lasso for robust eQTL mapping. Bioinformatics 30(12), i139–i148 (2014)

    CrossRef  Google Scholar 

  64. Obozinski, G., Taskar, B., Jordan, M.I.: Multi-task feature selection. Technical report, UC Berkeley (2006)

    Google Scholar 

  65. Sugiyama, M., Azencott, C., Grimm, D., Kawahara, Y., Borgwardt, K.: Multi-task feature selection on multiple networks via maximum flows. In: Proceedings of the 2014 SIAM International Conference on Data Mining, pp. 199–207 (2014)

    Google Scholar 

  66. Kim, S., Xing, E.P.: Statistical estimation of correlated genome associations to a quantitative trait network. PLoS Genet. 5(8), e1000587 (2009)

    CrossRef  Google Scholar 

  67. Wang, Z., Curry, E., Montana, G.: Network-guided regression for detecting associations between DNA methylation and gene expression. Bioinformatics 30(19), 2693–2701 (2014)

    CrossRef  Google Scholar 

  68. Fei, H., Huan, J.: Structured feature selection and task relationship inference for multi-task learning. Knowl. Inf. Syst. 35(2), 345–364 (2013)

    CrossRef  Google Scholar 

  69. Swirszcz, G., Lozano, A.C.: Multi-level lasso for sparse multi-task regression. In: Proceedings of the 29th International Conference on Machine Learning (ICML 2012), pp. 361–368 (2012)

    Google Scholar 

  70. Bellon, V., Stoven, V., Azencott, C.A.: Multitask feature selection with task descriptors. In: Pacific Symposium on Biocomputing, vol. 21, pp. 261–272 (2016)

    Google Scholar 

  71. Ritchie, M.D., Hahn, L.W., Roodi, N., Bailey, L.R., et al.: Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am. J. Hum. Genet. 69(1), 138–147 (2001)

    CrossRef  Google Scholar 

  72. Larson, N.B., Jenkins, G.D., Larson, M.C., Sellers, T.A., Sellers, T.A., et al.: Kernel canonical correlation analysis for assessing genegene interactions and application to ovarian cancer. Eur. J. Hum. Genet. 22(1), 126–131 (2014)

    CrossRef  Google Scholar 

  73. Williams, S.M., Ritchie, M.D., Phillips, J.A., Dawson, E., et al.: Multilocus analysis of hypertension: a hierarchical approach. Hum. Hered. 57(1), 28–38 (2004)

    CrossRef  Google Scholar 

  74. Cho, Y.M., Ritchie, M.D., Moore, J.H., Park, J.Y., et al.: Multifactor-dimensionality reduction shows a two-locus interaction associated with type 2 diabetes mellitus. Diabetologia 47(3), 549–554 (2004)

    CrossRef  Google Scholar 

  75. Niel, C., Sinoquet, C., Dina, C., Rocheleau, G.: A survey about methods dedicated to epistasis detection. J. Bioinf. Comput. Biol. 6, 285 (2015)

    Google Scholar 

  76. Yoshida, M., Koike, A.: SNPInterForest: a new method for detecting epistatic interactions. BMC Bioinf. 12(1), 469 (2011)

    CrossRef  Google Scholar 

  77. Stephan, J., Stegle, O., Beyer, A.: A random forest approach to capture genetic effects in the presence of population structure. Nat. Commun. 6, 7432 (2015)

    CrossRef  Google Scholar 

  78. Beam, A.L., Motsinger-Reif, A., Doyle, J.: Bayesian neural networks for detecting epistasis in genetic association studies. BMC Bioinf. 15(1), 368 (2014)

    CrossRef  Google Scholar 

  79. Drouin, A., Giguère, S., Sagatovich, V., Déraspe, M., et al.: Learning interpretable models of phenotypes from whole genome sequences with the Set Covering Machine (2014). arXiv:1412.1074 [cs, q-bio, stat]

  80. Marchand, M., Shawe-Taylor, J.: The set covering machine. J. Mach. Learn. Res. 3, 723–746 (2002)

    MathSciNet  MATH  Google Scholar 

  81. He, Z., Yu, W.: Stable feature selection for biomarker discovery. Comput. Biol. Chem. 34(4), 215–225 (2010)

    CrossRef  Google Scholar 

  82. Ma, S., Huang, J., Moran, M.S.: Identification of genes associated with multiple cancers via integrative analysis. BMC Genom. 10, 535 (2009)

    CrossRef  Google Scholar 

  83. Yu, L., Ding, C., Loscalzo, S.: Stable feature selection via dense feature groups. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 803–811. ACM (2008)

    Google Scholar 

  84. Meinshausen, N., Bühlmann, P.: Stability selection. J. Roy. Stat. Soc. B 72(4), 417–473 (2010)

    CrossRef  MathSciNet  Google Scholar 

  85. Shah, R.D., Samworth, R.J.: Variable selection with error control: another look at stability selection. J. Roy. Stat. Soc. B 75(1), 55–80 (2013)

    CrossRef  MathSciNet  Google Scholar 

  86. Han, Y., Yu, L.: A variance reduction framework for stable feature selection. Stat. Anal. Data Min. 5(5), 428–445 (2012)

    CrossRef  MathSciNet  Google Scholar 

  87. Llinares-López, F., Grimm, D.G., Bodenham, D.A., Gieraths, U., et al.: Genome-wide detection of intervals of genetic heterogeneity associated with complex traits. Bioinformatics 31(12), i240–i249 (2015)

    CrossRef  Google Scholar 

  88. Belilovsky, E., Varoquaux, G., Blaschko, M.B.: Testing for differences in Gaussian graphical models: applications to brain connectivity. In: Lee, D.D., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems 29 (2016)

    Google Scholar 

  89. Tur, I., Roverato, A., Castelo, R.: Mapping eQTL networks with mixed graphical markov models. Genetics 198(4), 1377–1393 (2014)

    CrossRef  Google Scholar 

  90. Sandhu, K., Li, G., Poh, H., Quek, Y., et al.: Large-scale functional organization of long-range chromatin interaction networks. Cell. Rep. 2(5), 1207–1219 (2012)

    CrossRef  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Chloé-Agathe Azencott .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and Permissions

Copyright information

© 2016 Springer International Publishing AG

About this chapter

Cite this chapter

Azencott, CA. (2016). Network-Guided Biomarker Discovery. In: Holzinger, A. (eds) Machine Learning for Health Informatics. Lecture Notes in Computer Science(), vol 9605. Springer, Cham. https://doi.org/10.1007/978-3-319-50478-0_16

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-50478-0_16

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50477-3

  • Online ISBN: 978-3-319-50478-0

  • eBook Packages: Computer ScienceComputer Science (R0)