Binary Classification for Failure Risk Assessment

Foroughi Pour, Ali; Loveless, Ian; Rempala, Grzegorz; Pietrzak, Maciej

doi:10.1007/978-1-0716-0849-4_6


Part of the book series: Methods in Molecular Biology (MIMB, volume 2194)

Abstract

Survival analysis is a powerful and popular methodology for analyzing time-to-event data in bioinformatics. Furthermore, several of its extensions can perform variable selection simultaneously with model estimation. While this flexibility is highly desirable, in certain scenarios binary class variable selection and classification methods may provide more reliable risk estimates. Synthetic simulations and real data case studies suggest that binary class models tend to perform better when (1) randomly censored points comprise only a small portion of the data, (2) biological markers are weak, (3) risk is to be computed across predetermined time intervals, and (4) the assumptions of the competing time-to-event models are violated. In practice, it may be prudent to test both model families to ensure an adequate analysis. Here we describe the pipeline of binary class feature selection and classification for time-to-event risk assessment.
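The chapter's protocol is not reproduced on this page, so the following is only a minimal sketch of how time-to-event data might be recast as a binary class problem for one predetermined time interval. It assumes a single horizon, treats subjects censored at or before that horizon as indeterminate and drops them, and uses a Welch t-statistic as a stand-in filter for feature ranking. The function names, the toy data, and the choice of statistic are illustrative assumptions and are not taken from the chapter.

```python
from math import sqrt
from statistics import mean, variance


def binarize_at_horizon(times, events, horizon):
    """Assumed convention: 1 = failure observed at or before `horizon`,
    0 = subject followed beyond `horizon` without failing by then,
    None = censored at or before `horizon` (status at the horizon unknown)."""
    labels = []
    for t, e in zip(times, events):
        if e and t <= horizon:
            labels.append(1)
        elif t > horizon:
            labels.append(0)
        else:
            labels.append(None)
    return labels


def welch_t(a, b):
    """Welch two-sample t-statistic (needs >= 2 points and non-zero variance)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def rank_features(rows, labels):
    """Rank feature indices by |t| between classes, skipping indeterminate rows."""
    kept = [(x, y) for x, y in zip(rows, labels) if y is not None]
    n_features = len(kept[0][0])
    scores = []
    for j in range(n_features):
        pos = [x[j] for x, y in kept if y == 1]
        neg = [x[j] for x, y in kept if y == 0]
        scores.append(abs(welch_t(pos, neg)))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


# Toy data (hypothetical): follow-up times, event indicators (1 = failure,
# 0 = censored), and two features per subject.
times = [2.0, 5.0, 7.5, 1.0, 9.0, 3.0, 8.0]
events = [1, 1, 0, 0, 1, 1, 0]
rows = [[5.1, 0.3], [4.8, 1.2], [1.0, 0.9], [3.0, 0.5],
        [1.4, 0.2], [5.5, 0.8], [0.9, 1.1]]

labels = binarize_at_horizon(times, events, horizon=6.0)
ranking = rank_features(rows, labels)
print(labels)   # [1, 1, 0, None, 0, 1, 0]
print(ranking)  # feature indices, strongest class separation first
```

Dropping indeterminate subjects is only reasonable when few points are censored before the horizon, which matches condition (1) above. With heavier censoring, a time-to-event model is likely the safer choice.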

The authors Ali Foroughi pour and Ian Loveless contributed equally to this work.

Notes

1. QIAGEN Inc., https://www.qiagenbio-informatics.com/products/ingenuity-pathway-analysis.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH, USA: Ali Foroughi Pour
Department of Mathematics, The Ohio State University, Columbus, OH, USA: Ali Foroughi Pour, Grzegorz Rempala
College of Public Health, The Ohio State University, Columbus, OH, USA: Ian Loveless, Grzegorz Rempala
Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA: Maciej Pietrzak

Corresponding author

Correspondence to Maciej Pietrzak.

Editor information

Editors and Affiliations

Department of Cutaneous Oncology, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA
Joseph Markowitz

Cite this protocol

Foroughi Pour, A., Loveless, I., Rempala, G., Pietrzak, M. (2021). Binary Classification for Failure Risk Assessment. In: Markowitz, J. (eds) Translational Bioinformatics for Therapeutic Development. Methods in Molecular Biology, vol 2194. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-0849-4_6

DOI: https://doi.org/10.1007/978-1-0716-0849-4_6
Published: 15 September 2020
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-0848-7
Online ISBN: 978-1-0716-0849-4
eBook Packages: Springer Protocols
