Supervised Methods for Biomarker Detection from Microarray Experiments

Serra, Angela; Cattelani, Luca; Fratello, Michele; Fortino, Vittorio; Kinaret, Pia Anneli Sofia; Greco, Dario

doi:10.1007/978-1-0716-1839-4_8

Part of the book series: Methods in Molecular Biology (MIMB, volume 2401)

Abstract

Biomarkers are valuable indicators of the state of a biological system. Microarray technology has been extensively used to identify biomarkers and build computational predictive models for disease prognosis, drug sensitivity and toxicity evaluations. Activation biomarkers can be used to understand the underlying signaling cascades, mechanisms of action and biological cross talk. Biomarker detection from microarray data requires several considerations from both the biological and computational points of view. In this chapter, we describe the main methodology used in biomarker discovery and predictive modeling, and we address some of the related challenges. Moreover, we discuss biomarker validation and give some insights into multiomics strategies for biomarker detection.

Author information

Authors and Affiliations

Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
Angela Serra, Luca Cattelani, Michele Fratello, Pia Anneli Sofia Kinaret & Dario Greco
BioMediTech Institute, Tampere University, Tampere, Finland
Angela Serra, Luca Cattelani, Michele Fratello, Pia Anneli Sofia Kinaret & Dario Greco
Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), University of Tampere, Tampere, Finland
Angela Serra, Luca Cattelani, Michele Fratello, Pia Anneli Sofia Kinaret & Dario Greco
Institute of Biomedicine, University of Eastern Finland, Kuopio, Finland
Vittorio Fortino
Institute of Biotechnology, University of Helsinki, Helsinki, Finland
Pia Anneli Sofia Kinaret & Dario Greco

Corresponding author

Correspondence to Dario Greco.

Editor information

Editors and Affiliations

Dipartimento di Giurisprudenza, Economia e Sociologia, Università degli Studi “Magna Graecia” di Catanzaro, Catanzaro, Italy
Giuseppe Agapito

About this protocol

Cite this protocol

Serra, A., Cattelani, L., Fratello, M., Fortino, V., Kinaret, P.A.S., Greco, D. (2022). Supervised Methods for Biomarker Detection from Microarray Experiments. In: Agapito, G. (eds) Microarray Data Analysis. Methods in Molecular Biology, vol 2401. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1839-4_8

DOI: https://doi.org/10.1007/978-1-0716-1839-4_8
Published: 14 December 2021
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-1838-7
Online ISBN: 978-1-0716-1839-4
eBook Packages: Springer Protocols
