Abstract
Survival analysis is a powerful and popular methodology for analyzing time-to-event data in bioinformatics. Furthermore, several of its extensions can perform variable selection simultaneously with model estimation. While this flexibility is highly desirable, under certain scenarios binary class variable selection and classification methods may provide more reliable risk estimates. Synthetic simulations and real data case studies suggest that binary class models tend to perform better when (1) randomly censored points comprise only a small portion of the data, (2) biological markers are weak, (3) risk must be computed over predetermined time intervals, and (4) the assumptions of the competing time-to-event models are violated. In practice, it may be prudent to test both model families to ensure an adequate analysis. Here we describe the pipeline of binary class feature selection and classification for time-to-event risk assessment.
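To make the binary class framing concrete, here is a minimal sketch of how censored time-to-event records might be turned into two classes at a predetermined time horizon and how features could then be ranked. It rests on stated assumptions, not on the chapter's exact procedure. The labeling convention is our assumption: an event at or before the horizon gives class 1, follow-up reaching the horizon gives class 0, and subjects censored earlier are excluded. The Welch t-statistic filter is a simple stand-in for whichever feature selector the pipeline actually uses. The function names `dichotomize`, `welch_t`, and `rank_features` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def dichotomize(times, events, horizon):
    """Label subjects by failure status at a fixed time horizon (hypothetical helper).

    Assumed convention:
      1 -> event observed at or before `horizon`
      0 -> subject known to be event-free through `horizon`
      subjects censored before `horizon` are excluded (status unknown)
    Returns a list of (subject_index, label) pairs.
    """
    labeled = []
    for i, (t, event) in enumerate(zip(times, events)):
        if event and t <= horizon:
            labeled.append((i, 1))
        elif t >= horizon:
            labeled.append((i, 0))
        # otherwise: censored before the horizon, so the subject is dropped
    return labeled


def welch_t(a, b):
    """Welch's t-statistic for two samples (each needs at least 2 values)."""
    return (mean(a) - mean(b)) / sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))


def rank_features(X, labeled):
    """Rank feature indices by |Welch t| between the two classes, strongest first.

    This filter is a stand-in assumption, not the authors' feature selector.
    """
    scores = []
    for j in range(len(X[0])):
        pos = [X[i][j] for i, y in labeled if y == 1]
        neg = [X[i][j] for i, y in labeled if y == 0]
        scores.append((abs(welch_t(pos, neg)), j))
    return [j for _, j in sorted(scores, reverse=True)]


if __name__ == "__main__":
    # Toy data, made up for illustration: follow-up times, event flags, two features.
    times = [2, 5, 7, 3, 10, 12, 1, 8]
    events = [1, 1, 0, 0, 1, 0, 1, 1]
    X = [[5.0, 1.0], [6.0, 2.0], [1.0, 1.5], [9.0, 9.0],
         [2.0, 2.5], [1.5, 1.0], [5.5, 2.0], [2.5, 1.5]]
    labeled = dichotomize(times, events, horizon=6)
    print(labeled)
    print(rank_features(X, labeled))
```

With this toy data, the subject censored at time 3 is excluded and feature 0 ranks first. Discarding subjects censored before the horizon also shows why such an approach is most attractive when censored points make up only a small portion of the data: every early-censored subject is lost to training.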
The authors Ali Foroughi pour and Ian Loveless contributed equally to this work.
Copyright information
© 2021 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Foroughi Pour, A., Loveless, I., Rempala, G., Pietrzak, M. (2021). Binary Classification for Failure Risk Assessment. In: Markowitz, J. (eds) Translational Bioinformatics for Therapeutic Development. Methods in Molecular Biology, vol 2194. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-0849-4_6
DOI: https://doi.org/10.1007/978-1-0716-0849-4_6
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-0848-7
Online ISBN: 978-1-0716-0849-4
eBook Packages: Springer Protocols