Abstract
The advent of DNA microarray datasets has stimulated a new line of research both in bioinformatics and in machine learning. This type of data is used to collect information from tissue and cell samples regarding gene expression differences that could be useful for disease diagnosis or for distinguishing specific types of tumor. Microarray data classification is a difficult challenge for machine learning researchers due to its high number of features and the small sample sizes. This chapter is devoted to reviewing the microarray databases most frequently used in the literature. We also make the interested reader aware of the problematic of data characteristics in this domain, such as the imbalance of the data, their complexity, and the so-called dataset shift.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
A type I error is the incorrect rejection of a true null hypothesis that usually has the effect of concluding that a given relationship exists when in fact it does not. That is, a type I error is a false positive.
References
Piatetsky-Shapiro G, Tamayo P (2003) Microarray data mining: facing the challenges. ACM SIGKDD Explor Newsl 5(2):1–5
Saeys Y, Inza I, Larrañaga P (2007) A review of feature selection techniques in bioinformatics. Bioinformatics 23(19):2507–2517
Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA et al (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439):531–537
Jain A, Zongker D (1997) Feature selection: evaluation, application, and small sample performance. IEEE Trans Pattern Anal Mach Intell 19(2):153–158
Guyon I, Gunn S, Nikravesh M, Zadeh LA (2006) Feature extraction: foundations and applications, vol 207. Springer, Berlin
Arrayexpress - Functional Genomics Data (2018). http://www.ebi.ac.uk/arrayexpress/. [Online; accessed Jan 2018]
Gene Expression Omnibus (2018). http://www.ncbi.nlm.nih.gov/geo/. [Online; accessed Jan 2018]
The Cancer Genome Atlas (TCGA) (2018). https://cancergenome.nih.gov/. [Online; accessed Jan 2018]
Broad Institute (2018) Cancer Program Data Sets. http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi. [Online; accessed Jan 2018]
Dataset Repository, Bioinformatics Research Group (2018). http://www.upo.es/eps/bigs/datasets.html. [Online; accessed Jan 2018]
Statnikov A, Aliferis CF, Tsamardinos I (2018) Gems: gene expression model selector. http://www.gems-system.org. [Online; accessed Jan 2018]
Gene Expression Project (2014) Princeton University. http://genomics-pubs.princeton.edu/oncology/. [Online; accessed Jan 2014]
The Arabidopsis Information Resource, Gene Expression Resources (2018) https://www.arabidopsis.org/portals/expression/microarray/. [Online; accessed Jan 2018]
Hruz T, Laule O, Szabo G, Wessendorp F, Bleuler S, Oertle L, Widmayer P, Gruissem W, Zimmermann P (2008) Genevestigator v3: a reference expression database for the meta-analysis of transcriptomes. Adv Bioinforma 2008, 5pp.
An open-source r framework for your microarray analysis (2018). http://www.aroma-project.org/. [Online; accessed Jan 2018]
ELVIRA Biomedical Data Set Repository (2018). http://leo.ugr.es/elvira/DBCRepository/. [Online; accessed Jan 2018]
Machine Learning Dataset Repository (2018). http://mldata.org/repository/data/. [Online; accessed Jan 2018]
The home of data science & machine learning (2018). https://www.kaggle.com/datasets. [Online; accessed Jan 2018]
Frank A, Asuncion A (2018). UCI machine learning repository. http://archive.ics.uci.edu/ml, 2010. [Online; accessed Jan 2018]
Feature Selection Datasets at Arizona State University (2018). http://featureselection.asu.edu/datasets.php. [Online; accessed Jan 2018]
Bioconductor, open source software for bioinformatics (2018). http://www.bioconductor.org. [Online; accessed Jan 2018]
Pomeroy SL, Tamayo P, Gaasenbeek M, Sturla LM, Angelo M, McLaughlin ME, Kim JYH, Goumnerova LC, Black PM, Lau C et al (2002) Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature 415(6870):436–442
Shah M, Marchand M, Corbeil J (2012) Feature selection with conjunctions of decision stumps and learning from microarray data. IEEE Trans Pattern Anal Mach Intell 34(1):174–186
Tian E, Zhan F, Walker R, Rasmussen E, Ma Y, Barlogie B, Shaughnessy JD Jr (2003) The role of the wnt-signaling antagonist dkk1 in the development of osteolytic lesions in multiple myeloma. N Engl J Med 349(26):2483–2494
Nutt CL, Mani DR, Betensky RA, Tamayo P, Cairncross JG, Ladd C, Pohl U, Hartmann C, McLaughlin ME, Batchelor TT et al (2003) Gene expression-based classification of malignant gliomas correlates better with survival than histological classification. Cancer Res 63(7):1602–1607
Bolón-Canedo V, Seth S, Sánchez-Maroño N, Alonso-Betanzos A, Principe JC (2011) Statistical dependence measure for feature selection in microarray datasets. In: 19th European symposium on artificial neural networks-ESANN, pp 23–28
Bolón-Canedo V, Sánchez-Marono N, Alonso-Betanzos A, Benítez JM, Herrera F (2014) A review of microarray datasets and applied feature selection methods. Inf Sci 282:111–135
Bolón-Canedo V, Sechidis K, Sánchez-Marono N, Alonso-Betanzos A, Brown G (2017) Exploring the consequences of distributed feature selection in dna microarray data. In: International joint conference on neural networks
Ebrahimpour MK, Zare M, Eftekhari M, Aghamolaei G (2017) Occam’s razor in dimension reduction: using reduced row echelon form for finding linear independent features in high dimensional microarray datasets. Eng Appl Artif Intell 62:214–221
Wanderley MF, Gardeux V, Natowicz R, Braga AP (2013) Ga-kde-bayes: an evolutionary wrapper method based on non-parametric density estimation applied to bioinformatics problems. In: 21st European symposium on artificial neural networks-ESANN, pp 155–160
Meyer PE, Schretter C, Bontempi G (2008) Information-theoretic feature selection in microarray data using variable complementarity. IEEE J Sel Top Signal Process 2(3):261–274
Hedenfalk I, Duggan D, Chen Y, Radmacher M, Bittner M, Simon R, Meltzer P, Gusterson B, Esteller M, Raffeld M et al (2001) Gene-expression profiles in hereditary breast cancer. N Engl J Med 344(8):539–548
Lee C, Leu Y (2011) A novel hybrid feature selection method for microarray data analysis. Appl Soft Comput 11(1):208–213
van’t Veer LJ, Dai H, Van De Vijver MJ, He YD, Hart AAM, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT et al (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature 415(6871):530–536
Bolón-Canedo V, Sánchez-Maroño N, Alonso-Betanzos A (2012) An ensemble of filters and classifiers for microarray data classification. Pattern Recogn 45(1):531–539
Bolón-Canedo V, Sánchez-Maroño N, Alonso-Betanzos A (2010) On the effectiveness of discretization on gene selection of microarray data. In: The 2010 international joint conference on neural networks (IJCNN). IEEE, Piscataway, pp 18–23
Kumar M, Rath SK (2015) Classification of microarray using mapreduce based proximal support vector machine classifier. Knowl-Based Syst 89:584–602
Mohapatra P, Chakravarty S, Dash PK (2016) Microarray medical data classification using kernel ridge regression and modified cat swarm optimization based gene selection system. Swarm Evol Comput 28:144–160
Navarro FFG, Muñoz LAB (2009) Gene subset selection in microarray data using entropic filtering for cancer classification. Expert Syst 26(1):113–124
West M, Blanchette C, Dressman H, Huang E, Ishida S, Spang R, Zuzan H, Olson JA, Marks JR, Nevins JR (2001) Predicting the clinical status of human breast cancer by using gene expression profiles. Proc Natl Acad Sci 98(20):11462–11467
Leung Y, Hung Y (2010) A multiple-filter-multiple-wrapper approach to gene selection and microarray data classification. IEEE/ACM Trans Comput Biol Bioinform 7(1):108–117
Heap G, Trynka G, Jansen R, Bruinenberg M, Swertz M, Dinesen L, Hunt K, Wijmenga C et al (2009) Complex nature of snp genotype effects on gene expression in primary human leucocytes. BMC Med Genomics 2(1):1
Bolón-Canedo V, Sánchez-Maroño N, Alonso-Betanzos A (2014) Data classification using an ensemble of filters. Neurocomputing 135:13–20
Dessì N, Pes B (2015) Similarity of feature selection methods: an empirical study across data intensive classification tasks. Expert Syst Appl 42(10):4632–4642
Shreem SS, Abdullah S, Nazri MZA, Alzaqebah M (2012) Hybridizing ReliefF, MRMR filters and GA wrapper approaches for gene selection. J Theor Appl Inf Technol 46(2):1034–1039
Yang F, Mao KZ (2011) Robust feature selection for microarray data based on multicriterion fusion. IEEE/ACM Trans Comput Biol Bioinform 8(4):1080–1092
Ye Y, Wu Q, Huang JZ, Ng MK, Li X (2013) Stratified sampling for feature subspace selection in random forests for high dimensional data. Pattern Recogn 46(3):769–787
Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci 96(12):6745–6750
Ferreira AJ, Figueiredo MAT (2012) An unsupervised approach to feature discretization and selection. Pattern Recogn 45(9):3048–3060
Lovato P, Bicego M, Cristani M, Jojic N, Perina A (2012) Feature selection using counting grids: application to microarray data. In: Structural, syntactic, and statistical pattern recognition. Springer, Berlin, pp 629–637
Song L, Smola A, Gretton A, Bedo J, Borgwardt K (2012) Feature selection via dependence maximization. J Mach Learn Res 98888:1393–1434
Maldonado S, Weber R, Basak J (2011) Simultaneous feature selection and classification using kernel-penalized support vector machines. Inf Sci 181(1):115–128
Abeel T, Helleputte T, Van de Peer Y, Dupont P, Saeys Y (2010) Robust biomarker identification for cancer diagnosis with ensemble feature selection methods. Bioinformatics 26(3):392–398
Mundra PA, Rajapakse JC (2010) SVM-RFE with mRMR filter for gene selection. IEEE Trans NanoBiosci 9(1):31–37
Nguyen T, Khosravi A, Creighton D, Nahavandi S (2015) Hidden Markov models for cancer classification using gene expression profiles. Inf Sci 316:293–307
Wang J, Wu L, Kong J, Li Y, Zhang B (2013) Maximum weight and minimum redundancy: a novel framework for feature subset selection. Pattern Recogn 46(6):1616–1627
Song Q, Ni J, Wang G (2013) A fast clustering-based feature subset selection algorithm for high-dimensional data. IEEE Trans Knowl Data Eng 25(1):1–14
Canul-Reich J, Hall LO, Goldgof DB, Korecki JN, Eschrich S (2012) Iterative feature perturbation as a gene selector for microarray data. Int J Pattern Recogn Artif Intell 26(05):1260003
Moradkhani M, Amiri A, Javaherian M, Safari H (2015) A hybrid algorithm for feature subset selection in high-dimensional datasets using FICA and IWSSr algorithm. Appl Soft Comput 35:123–135
Noble CL, Abbas AR, Cornelius J, Lees CW, Ho G, Toy K, Modrusan Z, Pal N, Zhong F, Chalasani S et al (2008) Regional variation in gene expression in the healthy colon is dysregulated in ulcerative colitis. Gut 57(10):1398–1405
Shipp MA, Ross KN, Tamayo P, Weng AP, Kutok JL, Aguiar RCT, Gaasenbeek M, Angelo M, Reich M, Pinkus GS et al (2002) Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nat Med 8(1):68–74
Chuang L, Yang C, Wu K, Yang C (2011) A hybrid feature selection method for dna microarray data. Comput Biol Med 41(4):228–237
Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X et al (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403(6769):503–511
Freije WA, Castro-Vargas FE, Fang Z, Horvath S, Cloughesy T, Liau LM, Mischel PS, Nelson SF (2004) Gene expression profiling of gliomas strongly predicts survival. Cancer Res 64(18):6503–6510
Nie F, Huang H, Cai X, Ding C (2010) Efficient and robust feature selection via joint l2, 1-norms minimization. Adv Neural Inf Process Syst 23:1813–1821
Guangtao W, Qinbao S, Baowen X, Yuming Z (2013) Selecting feature subset for high dimensional data via the propositional foil rules. Pattern Recogn 46(1):199–214
Kang S, Song J (2017) Robust gene selection methods using weighting schemes for microarray data analysis. BMC Bioinformatics 18(1):389
Garber ME, Troyanskaya OG, Schluens K, Petersen S, Thaesler Z, Pacyna-Gengelbach M, Van De Rijn M, Rosen GD, Perou CM, Whyte RI et al (2001) Diversity of gene expression in adenocarcinoma of the lung. Proc Natl Acad Sci 98(24):13784–13789
Gordon GJ, Jensen RV, Hsiao L, Gullans SR, Blumenstock JE, Ramaswamy S, Richards WG, Sugarbaker DJ, Bueno R (2002) Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer Res 62(17):4963–4967
Zhou P, Hu X, Li P, Wu X (2017) Online feature selection for high-dimensional class-imbalanced data. Knowl-Based Syst 136:187–199
Shedden K, Taylor JMG, Enkemann SA, Tsao M, Yeatman TJ, Gerald WL, Eschrich S, Jurisica I, Giordano TJ, Misek DE et al (2008) Gene expression–based survival prediction in lung adenocarcinoma: a multi-site, blinded validation study. Nat Med 14(8):822–827
Eschrich S, Yang I, Bloom G, Kwong KY, Boulware D, Cantor A, Coppola D, Kruhøffer M, Aaltonen L, Orntoft TF et al (2005) Molecular staging for survival prediction of colorectal cancer patients. J Clin Oncol 23(15):3526–3535
Petricoin EF, Ardekani AM, Hitt BA, Levine PJ, Fusaro VA, Steinberg SM, Mills GB, Simone C, Fishman DA, Kohn EC et al (2002) Use of proteomic patterns in serum to identify ovarian cancer. Lancet 359(9306):572–577
Singh D, Febbo PG, Ross K, Jackson DG, Manola J, Ladd C, Tamayo P, Renshaw AA, D’Amico AV, Richie JP et al (2002) Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1(2):203–209
Sharma A, Imoto S, Miyano S (2012) A top-r feature selection algorithm for microarray gene expression data. IEEE/ACM Trans Comput Biol Bioinform 9(3):754–764
Spira A, Beane JE, Shah V, Steiling K, Liu G, Schembri F, Gilman S, Dumas Y, Calner P, Sebastiani P et al (2007) Airway epithelial gene expression in the diagnostic evaluation of smokers with suspect lung cancer. Nat Med 13(3):361–366
Staunton JE, Slonim DK, Coller HA, Tamayo P, Angelo MJ, Park J, Scherf U, Lee JK, Reinhold WO, Weinstein JN et al (2001) Chemosensitivity prediction by transcriptional profiling. Proc Natl Acad Sci 98(19):10787–10792
Liu Z, Tang D, Cai Y, Wang R, Chen F (2017) A hybrid method based on ensemble welm for handling multi class imbalance in cancer microarray data. Neurocomputing 266:641–650
Su AI, Welsh JB, Sapinoso LM, Kern SG, Dimitrov P, Lapp H, Schultz PG, Powell SM, Moskaluk CA, Frierson HF Jr et al (2001) Molecular classification of human carcinomas by use of gene expression signatures. Cancer Res 61(20):7388–7393
Liu K-H, Zeng Z-H, Ng VTY (2016) A hierarchical ensemble of ECOC for cancer classification based on multi-class microarray data. Inf Sci 349:102–118
Ramaswamy S, Tamayo P, Rifkin R, Mukherjee S, Yeang C, Angelo M, Ladd C, Reich M, Latulippe E, Mesirov JP et al (2001) Multiclass cancer diagnosis using tumor gene expression signatures. Proc Natl Acad Sci 98(26):15149–15154
Lan L, Vucetic S (2011) Improving accuracy of microarray classification by a simple multi-task feature selection filter. Int J Data Min Bioinform 5(2):189–208
Morán-Fernández L, Bolón-Canedo V, Alonso-Betanzos A (2017) On the use of different base classifiers in multiclass problems. Prog Artif Intell 1–9. https://doi.org/10.1007/s13748-017-0126-4
Haslinger C, Schweifer N, Stilgenbauer S, Döhner H, Lichter P, Kraut N, Stratowa C, Abseher R (2004) Microarray gene expression profiling of B-cell chronic lymphocytic leukemia subgroups defined by genomic aberrations and VH mutation status. J Clin Oncol 22(19):3937–3949
Sun L, Hui A, Su Q, Vortmeyer A, Kotliarov Y, Pastorino S, Passaniti A, Menon J, Walling J, Bailey R et al (2006) Neuronal and glioma-derived stem cell factor induces angiogenesis within the brain. Cancer Cell 9(4):287–300
Anaissi A, Kennedy PJ, Goyal M (2011) Feature selection of imbalanced gene expression microarray data. In: 2011 12th ACIS international conference on software engineering, artificial intelligence, networking and parallel/distributed computing (SNPD). IEEE, Piscataway, pp 73–78
Armstrong SA, Staunton JE, Silverman LB, Pieters R, den Boer ML, Minden MD, Sallan SE, Lander ES, Golub TR, Korsmeyer SJ et al (2002) Mll translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nat Genet 30(1):41–47
Student S, Fujarewicz K (2012) Stable feature selection and classification algorithms for multiclass microarray data. Biol Direct 7(1):33
Liu K-H, Tong M, Xie S-T, Ng VTY (2015) Genetic programming based ensemble system for microarray data classification. Comput Math Methods Med 2015, 11pp.
Bhattacharjee A, Richards WG, Staunton J, Li C, Monti S, Vasa P, Ladd C, Beheshti J, Bueno R, Gillette M et al (2001) Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci 98(24):13790–13795
Stienstra R, Saudale F, Duval C, Keshtkar S, Groener JEM, van Rooijen N, Staels B, Kersten S, Müller M (2010) Kupffer cells promote hepatic steatosis via interleukin-1beta-dependent suppression of peroxisome proliferator-activated receptor alpha activity. Hepatology 51(2):511–522
Khan J, Wei JS, Ringner M, Saal LH, Ladanyi M, Westermann F, Berthold F, Schwab M, Antonescu CR, Peterson C et al (2001) Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med 7(6):673–679
Brown MPS, Grundy WN, Lin D, Cristianini N, Sugnet CW, Furey TS, Ares M, Haussler D (2000) Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc Natl Acad Sci 97(1):262–267
Dougherty ER (2001) Small sample issues for microarray-based classification. Comp Funct Genomics 2(1):28–34
Yang H, Churchill G (2007) Estimating p-values in small microarray experiments. Bioinformatics 23(1):38–43
Storey JD, Tibshirani R, Garret ES, Irizarry RA, Zeger SL (2003) SAM thresholding and false discovery rates for detecting differential gene expression in DNA microarrays. Springer, New York
Xie Y, Pan W, Khodursky AB (2005) A note on using permutation-based false discovery rate estimates to compare different analysis methods for microarray data. Bioinformatics 21(23):4280–4288
Murie C, Woody O, Lee AY (2009) Comparison of small n statistical tests of differential expression applied to microarrays. BMC Bioinformatics 10:45
Paul J, Chiu D, Golovan S, Husain M, Hakimov H (2008) Analysis of extremely small sample microarrays using multi-source data 1
Nikulin V (2014) On a solution for the high-dimensionality-small-sample-size regression problem with several different microarrays. Int J Data Min Bioinform 9(3):221–234
Allison DB, Gadbury GL, Heo M, Fernández JR, Lee C-K, Prolla TA, Weindruch R (2002) A mixture model approach for the analysis of microarray gene expression data. Comput Stat Data Anal 39(1):1–20
Phan JH, Moffitt RA, Barrett AB, Wang MD (2008) Improving microarray sample size using bootstrap data combination. In: Proceedings conf. IEEE engineering in medicine and biology society. IEEE, Piscataway, pp 5660–5663
Braga-Neto U (2007) Fads and fallacies in the name of small-sample microarray classification-a highlight of misunderstanding and erroneous usage in the applications of genomic signal processing. IEEE Signal Process Mag 24(1):91–99
Michiels S, Koscielny S, Hill C (2005) Prediction of cancer outcome with microarrays: a multiple random validation strategy. Lancet 365(9458):488–492
Braga-Neto UM, Dogherty ER (2004) Is cross-validation valid for small-sample microarray classification? Bioinformatics 20(3):374–380
Hanczar B, Jianping H, Sima C, Weinstein J, Bittner M, Dougherty ER (2010) Small-sample precision of ROC-related estimates. Bioinformatics 26(6):822–830
Laber EB, Murphy SA (2008) Small sample inference for generalization error in classification using the cud bound. In: Proc. of the conference on uncertainty in artificial intelligence, pp 357–365
He H, Garcia EA (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263–1284
Sun Y, Wong AKC, Kamel MS (2009) Classification of imbalanced data: a review. Int J Pattern Recogn Artif Intell 23(04):687–719
López V, Fernández A, García S, Palade V, Herrera F (2013) An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inf Sci 250(0):113–141
Lusa L et al (2010) Class prediction for high-dimensional class-imbalanced data. BMC Bioinformatics 11(1):523
Galar M, Fernández A, Barrenechea E, Bustince H, Herrera F (2012) A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches. IEEE Trans Syst Man Cybern Part C Appl Rev 42(4):463–484
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) Smote: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357
Blagus R, Lusa L (2012) Evaluation of smote for high-dimensional class-imbalanced microarray data. In: 2012 11th international conference on machine learning and applications (ICMLA), vol 2. IEEE, Piscataway, pp 89–94
Morán-Fernández L, Bolón-Canedo V, Alonso-Betanzos A (2016) Data complexity measures for analyzing the effect of smote over microarrays. In: European symposium on artificial neural networks, computational intelligence and machine learning
Galar M, Fernández A, Barrenechea E, Herrera F (2013) Eusboost: enhancing ensembles for highly imbalanced data-sets by evolutionary undersampling. Pattern Recogn 46(12):3460–3471
Tax DMJ, Duin RPW (2004) Support vector data description. Mach Learn 54(1):45–66
Maldonado S, Weber R, Famili F (2014) Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Inf Sci 286:228–246
Ho TK, Basu M (2002) Complexity measures of supervised classification problems. IEEE Trans Pattern Anal Mach Intell 24(3):289–300
Lorena AC, Costa IG, Spolaôr N, de Souto MCP (2012) Analysis of complexity indices for classification problems: cancer gene expression data. Neurocomputing 75(1):33–42
Okun O, Priisalu H (2009) Dataset complexity in gene expression based cancer classification using ensembles of k-nearest neighbors. Artif Intell Med 45(2):151–162
Bolón-Canedo V, Moran-Fernandez L, Alonso-Betanzos A (2015) An insight on complexity measures and classification in microarray data. In: 2015 International joint conference on neural networks (IJCNN). IEEE, Piscataway, pp 42–49
Morán-Fernández L, Bolón-Canedo V, Alonso-Betanzos A (2017) Can classification performance be predicted by complexity measures? A study using microarray data. Knowl Inf Syst 51(3):1067–1090
Moreno-Torres JG, Raeder T, Alaiz-Rodríguez R, Chawla NV, Herrera F (2012) A unifying view on dataset shift in classification. Pattern Recogn 45(1):521–530
Moreno-Torres JG, Sáez JA, Herrera F (2012) Study on the impact of partition-induced dataset shift on k-fold cross-validation. IEEE Trans Neural Netw Learn Syst 23(8):1304–1312
Barnett V, Lewis T (1994) Outliers in statistical data, vol 3. Wiley, New York
Kadota K, Tominaga D, Akiyama Y, Takahashi K (2003) Detecting outlying samples in microarray data: a critical assessment of the effect of outliers on sample classification. Chem-Bio Inf 3(1):30–45
Gonzalez-Navarro FF (2011) Feature selection in cancer research: microarray gene expression and in vivo 1H-MRS domains. PhD thesis, Technical University of Catalonia
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Alonso-Betanzos, A., Bolón-Canedo, V., Morán-Fernández, L., Sánchez-Maroño, N. (2019). A Review of Microarray Datasets: Where to Find Them and Specific Characteristics. In: Bolón-Canedo, V., Alonso-Betanzos, A. (eds) Microarray Bioinformatics. Methods in Molecular Biology, vol 1986. Humana, New York, NY. https://doi.org/10.1007/978-1-4939-9442-7_4
Download citation
DOI: https://doi.org/10.1007/978-1-4939-9442-7_4
Published:
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-4939-9441-0
Online ISBN: 978-1-4939-9442-7
eBook Packages: Springer Protocols