Classification of Microarray Data

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1986))

Abstract

The automatic classification of DNA microarray data is an active research topic in bioinformatics because it is an effective tool for diagnosing disease in patients. This chapter presents the most relevant aspects of microarray classification. We analyze the strategies used to classify microarray data and review the main methods reported in the literature. We also address related issues: dimensionality reduction, which aims to eliminate redundant information in genes, and the treatment of imbalanced and missing data. To conclude, we present an exhaustive review of the main journal publications, showing the most successful techniques applied in this discipline and the datasets most often used to verify their effectiveness.

Notes

  1. http://www.ebi.ac.uk/arrayexpress/

  2. http://www.ncbi.nlm.nih.gov/geo/

Author information

Correspondence to Noelia Sánchez-Maroño.

Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Sánchez-Maroño, N., Fontenla-Romero, O., Pérez-Sánchez, B. (2019). Classification of Microarray Data. In: Bolón-Canedo, V., Alonso-Betanzos, A. (eds) Microarray Bioinformatics. Methods in Molecular Biology, vol 1986. Humana, New York, NY. https://doi.org/10.1007/978-1-4939-9442-7_8

  • DOI: https://doi.org/10.1007/978-1-4939-9442-7_8

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-4939-9441-0

  • Online ISBN: 978-1-4939-9442-7

  • eBook Packages: Springer Protocols
