Sampling Defective Pathways in Phenotype Prediction Problems via the Holdout Sampler

  • Juan Luis Fernández-Martínez (email author)
  • Ana Cernea
  • Enrique J. de Andrés-Galiana
  • Francisco Javier Fernández-Ovies
  • Zulima Fernández-Muñiz
  • Oscar Alvarez-Machancoses
  • Leorey Saligan
  • Stephen T. Sonis
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10814)


In this paper, we introduce the holdout sampler to find the defective pathways in highly underdetermined phenotype prediction problems. This sampling algorithm is inspired by the bootstrapping procedure used in regression analysis to establish confidence bounds. We show that working with partial information (data bags) serves to sample the linear uncertainty region in a simple regression problem, mainly along the axis of greatest uncertainty, which corresponds to the smallest singular value of the system matrix. Applied to a phenotype prediction problem, considered as a generalized prediction problem between the set of genetic signatures and the set of classes into which the phenotype is divided, this procedure serves to unravel the ensemble of altered pathways in the transcriptome that are involved in the development of the disease. The algorithm looks for the minimum-scale genetic signature in each random holdout, and the likelihood (predictive accuracy) is established on the validation dataset via a nearest-neighbor classifier. The posterior analysis serves to identify the header genes that appear most frequently in the different holdouts and are therefore robust to a partial lack of samples. These genes are used to establish the genetic pathways and the biological processes involved in disease progression. This algorithm is much faster, more robust, and simpler than Bayesian networks. We show its application to a microarray dataset concerning triple-negative breast cancer (TNBC), a type of breast cancer with poor prognosis.
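
The loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: genes are ranked by Fisher's ratio, "minimum-scale signature" is read as the shortest prefix of that ranking with the best validation accuracy, the split is 75/25, a holdout is kept only if its accuracy reaches 0.9, and the data are synthetic. All function names and parameters (`fisher_ratio`, `nn_accuracy`, `holdout_sampler`, `make_toy_data`) are hypothetical.

```python
import random
from collections import Counter


def fisher_ratio(values, labels):
    """Fisher's ratio of one gene for a two-class problem (labels 0/1)."""
    a = [v for v, c in zip(values, labels) if c == 0]
    b = [v for v, c in zip(values, labels) if c == 1]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((v - mean_a) ** 2 for v in a) / len(a)
    var_b = sum((v - mean_b) ** 2 for v in b) / len(b)
    return (mean_a - mean_b) ** 2 / (var_a + var_b + 1e-12)


def nn_accuracy(train, train_labels, valid, valid_labels, genes):
    """Accuracy of a 1-nearest-neighbor classifier restricted to `genes`."""
    hits = 0
    for sample, truth in zip(valid, valid_labels):
        dists = [sum((sample[g] - t[g]) ** 2 for g in genes) for t in train]
        nearest = dists.index(min(dists))
        hits += train_labels[nearest] == truth
    return hits / len(valid)


def holdout_sampler(data, labels, n_holdouts=50, train_frac=0.75,
                    max_genes=10, min_accuracy=0.9, seed=0):
    """Count how often each gene enters the shortest best signature."""
    rng = random.Random(seed)
    n_samples, n_genes = len(data), len(data[0])
    counts, accepted = Counter(), 0
    for _ in range(n_holdouts):
        # Random holdout: one data bag for training, the rest for validation.
        idx = list(range(n_samples))
        rng.shuffle(idx)
        cut = int(train_frac * n_samples)
        tr_x = [data[i] for i in idx[:cut]]
        tr_y = [labels[i] for i in idx[:cut]]
        va_x = [data[i] for i in idx[cut:]]
        va_y = [labels[i] for i in idx[cut:]]
        if len(set(tr_y)) < 2:
            continue  # both classes are needed to rank genes
        # Assumed ranking criterion: Fisher's ratio on the training bag.
        ranked = sorted(
            range(n_genes),
            key=lambda g: fisher_ratio([row[g] for row in tr_x], tr_y),
            reverse=True,
        )
        best_acc, best_size = -1.0, 0
        for size in range(1, max_genes + 1):
            acc = nn_accuracy(tr_x, tr_y, va_x, va_y, ranked[:size])
            if acc > best_acc:  # strict '>' keeps the shortest list on ties
                best_acc, best_size = acc, size
        # Posterior analysis: keep only holdouts with high predictive accuracy.
        if best_acc >= min_accuracy:
            counts.update(ranked[:best_size])
            accepted += 1
    return counts, accepted


def make_toy_data(n_per_class=20, n_genes=30, informative=(0, 1, 2),
                  shift=5.0, seed=1):
    """Synthetic expression matrix: only `informative` genes separate classes."""
    rng = random.Random(seed)
    data, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == 1:
                for g in informative:
                    row[g] += shift
            data.append(row)
            labels.append(label)
    return data, labels


data, labels = make_toy_data()
counts, accepted = holdout_sampler(data, labels)
# Genes sorted by sampling frequency across the accepted holdouts.
print(accepted, counts.most_common())
```

In this toy setting, the genes with the highest counts play the role of the header genes. On real microarray data, the ranking criterion, the split ratio, and the acceptance threshold would all need to be chosen with care.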



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Juan Luis Fernández-Martínez (1), email author
  • Ana Cernea (1)
  • Enrique J. de Andrés-Galiana (1, 2)
  • Francisco Javier Fernández-Ovies (1)
  • Zulima Fernández-Muñiz (1)
  • Oscar Alvarez-Machancoses (1)
  • Leorey Saligan (3)
  • Stephen T. Sonis (4, 5)
  1. Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, Oviedo, Spain
  2. Department of Informatics and Computer Science, University of Oviedo, Oviedo, Spain
  3. National Institutes of Health, National Institute of Nursing Research, Bethesda, USA
  4. Primary Endpoint Solutions, Watertown, USA
  5. Brigham and Women’s Hospital and the Dana-Farber Cancer Institute, Boston, USA
