Comparison of Different Sampling Algorithms for Phenotype Prediction

Part of the Lecture Notes in Computer Science book series (LNBI, volume 10814)

Abstract

In this paper, we compare different sampling algorithms used to identify the defective pathways in highly underdetermined phenotype prediction problems. The first algorithm (Fisher’s ratio sampler) selects the most discriminatory genes and samples the highly discriminatory genetic networks according to a prior probability that is proportional to their individual Fisher’s ratio. The second (holdout sampler) is inspired by the bootstrapping procedure used in regression analysis and uses the minimum-scale signatures found in different random holdouts to establish the most frequently sampled genes. The third is a pure random sampler that randomly builds networks of differentially expressed genes. In all these algorithms, the likelihood of the different networks is established via leave-one-out cross-validation (LOOCV), and the posterior analysis of the most frequently sampled genes serves to establish the altered biological pathways. The results of these algorithms are compared with those obtained via Bayesian Networks (BNs). We apply these algorithms to a microarray dataset concerning Triple Negative Breast Cancers. The comparison shows that the Random, Fisher’s ratio and Holdout samplers are more effective than BNs, and that all of them provide similar insights into the genetic mechanisms involved in this disease. It can therefore be concluded that all these samplers are good alternatives to Bayesian Networks, with much lower computational demands. In addition, this analysis confirms the insight that the altered pathways should be independent of the sampling methodology and the classifier used to infer them.
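The Fisher’s ratio sampler described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two-class Fisher's ratio formula, the nearest-centroid classifier, the network size, the accuracy threshold and the synthetic data are all assumptions made here for illustration, since the abstract does not specify them.

```python
import numpy as np

def fisher_ratio(X, y):
    """Per-gene two-class Fisher's ratio: (mu0 - mu1)^2 / (var0 + var1)."""
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    den = X0.var(axis=0) + X1.var(axis=0) + 1e-12  # guard against zero variance
    return num / den

def loocv_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (assumed stand-in)."""
    n = len(y)
    hits = 0
    for i in range(n):
        train = np.arange(n) != i          # hold out sample i
        Xtr, ytr = X[train], y[train]
        c0 = Xtr[ytr == 0].mean(axis=0)    # class centroids from the other samples
        c1 = Xtr[ytr == 1].mean(axis=0)
        pred = int(np.linalg.norm(X[i] - c1) < np.linalg.norm(X[i] - c0))
        hits += int(pred == y[i])
    return hits / n

def fisher_ratio_sampler(X, y, n_networks=200, size=5, min_acc=0.9, seed=0):
    """Sample small gene networks with prior proportional to Fisher's ratio.

    Networks whose LOOCV accuracy reaches min_acc are kept, and the function
    returns how often each gene appears in the kept networks.
    n_networks, size and min_acc are hypothetical illustration values.
    """
    rng = np.random.default_rng(seed)
    fr = fisher_ratio(X, y)
    prior = fr / fr.sum()
    counts = np.zeros(X.shape[1])
    kept = 0
    for _ in range(n_networks):
        genes = rng.choice(X.shape[1], size=size, replace=False, p=prior)
        if loocv_accuracy(X[:, genes], y) >= min_acc:
            counts[genes] += 1
            kept += 1
    return counts / max(kept, 1), kept

# Synthetic example: 40 samples, 50 genes, only genes 0-2 differ between classes.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 50))
X[y == 1, :3] += 3.0

freq, kept = fisher_ratio_sampler(X, y)
print("networks kept:", kept)
print("most frequently sampled genes:", np.argsort(freq)[::-1][:3])
```

In this sketch the sampling frequencies returned by `fisher_ratio_sampler` play the role of the posterior analysis: genes that appear most often in the accepted networks are the candidates for the pathway analysis.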

Keywords

  • Phenotype Prediction
  • Random Holdout
  • Holdout Sample
  • Leave-one-out Cross-validation (LOOCV)
  • Discriminative Genes

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

  • DOI: 10.1007/978-3-319-78759-6_4
  • Chapter length: 13 pages

Corresponding author

Correspondence to Juan Luis Fernández-Martínez.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Cernea, A. et al. (2018). Comparison of Different Sampling Algorithms for Phenotype Prediction. In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2018. Lecture Notes in Computer Science, vol 10814. Springer, Cham. https://doi.org/10.1007/978-3-319-78759-6_4

  • DOI: https://doi.org/10.1007/978-3-319-78759-6_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-78758-9

  • Online ISBN: 978-3-319-78759-6

  • eBook Packages: Computer Science, Computer Science (R0)