Comparison of Different Sampling Algorithms for Phenotype Prediction

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10814)


In this paper, we compare different sampling algorithms used for identifying the defective pathways in highly underdetermined phenotype prediction problems. The first algorithm (Fisher’s ratio sampler) selects the most discriminatory genes and samples the highly discriminatory genetic networks according to a prior probability that is proportional to their individual Fisher’s ratio. The second one (holdout sampler) is inspired by the bootstrapping procedure used in regression analysis and uses the minimum-scale signatures found in different random holdouts to establish the most frequently sampled genes. The third one is a pure random sampler, which randomly builds networks of differentially expressed genes. In all these algorithms, the likelihood of the different networks is established via leave-one-out cross-validation (LOOCV), and the posterior analysis of the most frequently sampled genes serves to establish the altered biological pathways. These algorithms are compared to the results obtained via Bayesian Networks (BNs). We show the application of these algorithms to a microarray dataset concerning Triple Negative Breast Cancers. This comparison shows that the Random, Fisher’s ratio and Holdout samplers are more effective than BNs, and all provide similar insights about the genetic mechanisms that are involved in this disease. Therefore, it can be concluded that all these samplers are good alternatives to Bayesian Networks, with much lower computational demands. Besides, this analysis confirms the insight that the altered pathways should be independent of the sampling methodology and the classifier that is used to infer them.
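
The Fisher’s ratio sampler described above can be illustrated with a minimal sketch. It is not the authors’ implementation: the abstract does not specify the classifier, the network size, or the acceptance threshold, so the nearest-centroid classifier, the parameters `size` and `min_acc`, the function names, and the synthetic data below are all assumptions made for illustration. The Fisher’s ratio is taken in its usual two-class form, (mean difference)^2 / (sum of class variances).

```python
import random
from collections import Counter
from statistics import mean, pvariance


def fisher_ratio(values, labels):
    """Two-class Fisher's ratio of one gene: (mu0 - mu1)^2 / (var0 + var1)."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    return (mean(a) - mean(b)) ** 2 / (pvariance(a) + pvariance(b) + 1e-12)


def loocv_accuracy(X, labels, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier (an assumed
    classifier) restricted to `genes`. Needs at least 2 samples per class."""
    hits = 0
    for i in range(len(X)):
        centroids = {}
        for c in set(labels):
            rows = [X[j] for j in range(len(X)) if j != i and labels[j] == c]
            centroids[c] = [mean(r[g] for r in rows) for g in genes]

        def dist(c):
            return sum((X[i][g] - m) ** 2 for g, m in zip(genes, centroids[c]))

        hits += min(centroids, key=dist) == labels[i]
    return hits / len(X)


def sample_networks(X, labels, n_networks=200, size=2, min_acc=0.9, seed=0):
    """Draw small gene networks with probability proportional to each gene's
    Fisher's ratio, keep those whose LOOCV accuracy reaches `min_acc`
    (hypothetical threshold), and count how often each gene is kept."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    # Small offset keeps every weight positive so the draw below terminates.
    weights = [fisher_ratio([row[g] for row in X], labels) + 1e-9
               for g in range(n_genes)]
    counts = Counter()
    for _ in range(n_networks):
        genes = set()
        while len(genes) < size:
            genes.add(rng.choices(range(n_genes), weights=weights)[0])
        if loocv_accuracy(X, labels, sorted(genes)) >= min_acc:
            counts.update(genes)
    return counts


# Synthetic toy data (not from the paper): 12 samples, 6 genes,
# where only genes 0 and 1 differ between the two classes.
data_rng = random.Random(1)
labels = [0] * 6 + [1] * 6
X = [[data_rng.gauss(3.0 * y, 0.3), data_rng.gauss(3.0 * y, 0.3)]
     + [data_rng.gauss(0.0, 1.0) for _ in range(4)]
     for y in labels]

counts = sample_networks(X, labels)
print(counts.most_common())
```

The gene frequencies in `counts` play the role of the posterior analysis mentioned in the abstract: genes that appear most often in the accepted networks are the candidates for pathway analysis. Replacing the Fisher-weighted draw with a uniform draw would give a sketch of the pure random sampler under the same assumptions.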


Phenotype Prediction; Random Holdout; Holdout Sample; Leave-one-out Cross-validation (LOOCV); Discriminative Genes
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, Oviedo, Spain
  2. Department of Informatics and Computer Science, University of Oviedo, Oviedo, Spain
  3. National Institute of Nursing Research, National Institutes of Health, Bethesda, USA
  4. Primary Endpoint Solutions, Watertown, USA
  5. Brigham and Women’s Hospital and the Dana-Farber Cancer Institute, Boston, USA
