When using cDNA microarrays, normalization to correct labeling bias is a common preliminary step before further data analysis; its objective is to reduce the variation between arrays. To date, assessment of the effectiveness of normalization has mainly been confined to the ability to detect differentially expressed genes. Since a major use of microarrays is expression-based phenotype classification, it is important to evaluate microarray normalization procedures relative to classification. Using a model-based approach, we model the systemic-error process to generate synthetic gene-expression values with known ground truth. These synthetic expression values are subjected to typical normalization methods and passed through a set of classification rules, the objective being to carry out a systematic study of the effect of normalization on classification. Three normalization methods are considered: offset, linear regression, and Lowess regression. Seven classification rules are considered: 3-nearest neighbor, linear support vector machine, linear discriminant analysis, regular histogram, Gaussian kernel, perceptron, and multiple perceptron with majority voting. The results for the first three classification rules are presented in the paper, with the full results given on a complementary website. The conclusion from the different experiment models considered in the study is that normalization can have a significant benefit for classification under difficult experimental conditions, with linear and Lowess regression slightly outperforming the offset method.
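The three normalization methods named above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: it assumes the common two-channel convention of working with the log-ratio M = log2(R/G) and the mean log-intensity A = (log2 R + log2 G)/2, it uses the median as the offset, and its Lowess variant is a plain locally weighted linear fit with tricube weights and no robustness iterations. The function names and the `frac` parameter are hypothetical.

```python
import numpy as np


def ma_transform(red, green):
    """Return (M, A): per-spot log-ratio and mean log-intensity."""
    log_r, log_g = np.log2(red), np.log2(green)
    return log_r - log_g, 0.5 * (log_r + log_g)


def offset_normalize(m):
    """Offset method: subtract one constant (assumed here: the median of M)."""
    return m - np.median(m)


def linear_normalize(m, a):
    """Linear regression method: subtract a straight-line fit of M on A."""
    slope, intercept = np.polyfit(a, m, deg=1)
    return m - (intercept + slope * a)


def lowess_normalize(m, a, frac=0.3):
    """Lowess-style method: subtract a locally weighted linear fit of M on A.

    Each spot is fitted from its k nearest neighbours in A (k = frac * n),
    weighted with the tricube kernel. No robustness iterations are performed.
    """
    n = len(a)
    k = min(n, max(2, int(np.ceil(frac * n))))
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(a - a[i])
        idx = np.argpartition(dist, k - 1)[:k]  # indices of the k nearest spots
        h = dist[idx].max()
        w = (1.0 - (dist[idx] / h) ** 3) ** 3 if h > 0 else np.ones(k)
        sw = np.sqrt(w)
        # Weighted least squares for a local line: M ~ b0 + b1 * A
        design = np.column_stack([np.ones(k), a[idx]])
        beta, *_ = np.linalg.lstsq(design * sw[:, None], m[idx] * sw, rcond=None)
        fitted[i] = beta[0] + beta[1] * a[i]
    return m - fitted
```

Given per-spot intensity arrays `red` and `green` for one array, `m, a = ma_transform(red, green)` followed by any of the three functions returns normalized log-ratios. The offset method removes only a constant bias, the linear method also removes an intensity-dependent trend that is a straight line in A, and the Lowess-style method can follow a curved trend.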
Cite this article
Hua, J., Balagurunathan, Y., Chen, Y. et al. Normalization Benefits Microarray-Based Classification. J Bioinform Sys Biology 2006, 43056 (2006). https://doi.org/10.1155/BSB/2006/43056
- Support Vector Machine
- Normalization Method
- Linear Discriminant Analysis
- Majority Vote
- cDNA Microarrays