Sampling Defective Pathways in Phenotype Prediction Problems via the Holdout Sampler
In this paper, we introduce the holdout sampler to find the defective pathways in high underdetermined phenotype prediction problems. This sampling algorithm is inspired by the bootstrapping procedure used in regression analysis to established confidence bounds. We show that working with partial information (data bags) serves to sample the linear uncertainty region in a simple regression problem, mainly along the axis of greatest uncertainty that corresponds to the smallest singular value of the system matrix. This procedure applied to a phenotype prediction problem, considered as a generalized prediction problem between the set of genetic signatures and the set of classes in which the phenotype is divided, serves to unravel the ensemble of altered pathways in the transcriptome that are involved in the disease development. The algorithm looks for the minimum-scale genetic signature in each random holdout and the likelihood (predictive accuracy) is established using the validation dataset via a nearest-neighbor classifier. The posterior analysis serves to identify the header genes that most-frequently appear in the different hold-outs and are therefore robust to a partial lack of samples. These genes are used to establish the genetic pathways and the biological processes involved in the disease progression. This algorithm is much faster, robust and simpler than Bayesian Networks. We show its application to a microarray dataset concerning a type of breast cancers with poor prognoses (TNBC).
- 9.Saligan, L.N., Fernández-Martínez, J.L., de Andrés Galiana, E.J., Sonis, S.: Supervised classification by filter methods and recursive feature elimination predicts risk of radiotherapy-related fatigue in patients with prostate cancer. Cancer Inform. 13(141–152), 2014 (2014)Google Scholar