Normalization Benefits Microarray-Based Classification

Abstract

When using cDNA microarrays, normalization to correct labeling bias is a common preliminary step before further data analysis, its objective being to reduce the variation between arrays. To date, assessment of the effectiveness of normalization has mainly been confined to its effect on the ability to detect differentially expressed genes. Since a major use of microarrays is expression-based phenotype classification, it is important to evaluate microarray normalization procedures relative to classification. Using a model-based approach, we model the systematic-error process to generate synthetic gene-expression values with known ground truth. These synthetic expression values are subjected to typical normalization methods and passed through a set of classification rules, the objective being to carry out a systematic study of the effect of normalization on classification. Three normalization methods are considered: offset, linear regression, and Lowess regression. Seven classification rules are considered: 3-nearest neighbor, linear support vector machine, linear discriminant analysis, regular histogram, Gaussian kernel, perceptron, and multiple perceptron with majority voting. Results for the first three rules are presented in the paper, with full results for all seven given on a complementary website. The conclusion drawn from the different experimental models considered in the study is that normalization can significantly benefit classification under difficult experimental conditions, with linear and Lowess regression slightly outperforming the offset method.
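To make the three normalization methods concrete, here is a minimal sketch. It is not the authors' implementation. It assumes the common M-versus-A formulation for two-channel arrays (M = log2(R/G), A = 0.5·log2(R·G)), takes the median as the constant removed by the offset method, and uses a simplified Lowess (tricube weights over the nearest `frac`·n points, no robustness iterations). All function names and the `frac` parameter are hypothetical.

```python
import math
from statistics import median


def ma_values(red, green):
    """Map paired channel intensities to M = log2(R/G) and A = 0.5*log2(R*G)."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


def offset_normalize(m):
    """Offset: shift every log-ratio by one constant (assumed: the median) so M centers on 0."""
    c = median(m)
    return [mi - c for mi in m]


def _weighted_line(xs, ys, ws):
    """Weighted least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b1 = sxy / sxx if sxx > 0 else 0.0  # flat fit if all weighted x coincide
    return my - b1 * mx, b1


def linear_normalize(m, a):
    """Linear regression: subtract one global fit M ~ b0 + b1*A."""
    b0, b1 = _weighted_line(a, m, [1.0] * len(m))
    return [mi - (b0 + b1 * ai) for mi, ai in zip(m, a)]


def lowess_normalize(m, a, frac=0.3):
    """Lowess-style: subtract a local weighted linear fit of M on A at each point.

    Simplified (assumption): tricube weights over the nearest frac*n points,
    no robustness iterations.
    """
    n = len(m)
    k = min(n, max(2, math.ceil(frac * n)))  # neighbourhood size
    out = []
    for i in range(n):
        # Local bandwidth: distance to the k-th nearest A value (guard against 0).
        h = sorted(abs(aj - a[i]) for aj in a)[k - 1] or 1e-12
        ws = [(1 - min(abs(aj - a[i]) / h, 1.0) ** 3) ** 3 for aj in a]
        b0, b1 = _weighted_line(a, m, ws)
        out.append(m[i] - (b0 + b1 * a[i]))
    return out
```

On log-ratios that carry a purely linear intensity-dependent bias, both `linear_normalize` and `lowess_normalize` return residuals near zero. Only the local (Lowess) fit can also follow a curved bias, whereas the offset method removes a single constant regardless of intensity.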


References

  1. Quackenbush J: Microarray data normalization and transformation. Nature Genetics 2002,32(5 supplement):496-501.

  2. Bilban M, Buehler LK, Head S, Desoye G, Quaranta V: Normalizing DNA microarray data. Current Issues in Molecular Biology 2002,4(2):57-64.

  3. Attoor S, Dougherty ER, Chen Y, Bittner ML, Trent JM: Which is better for cDNA-microarray-based classification: ratios or direct intensities. Bioinformatics 2004,20(16):2513-2520. 10.1093/bioinformatics/bth272

  4. Chen Y, Kamat V, Dougherty ER, Bittner ML, Meltzer PS, Trent JM: Ratio statistics of gene expression levels and applications to microarray data analysis. Bioinformatics 2002,18(9):1207-1215. 10.1093/bioinformatics/18.9.1207

  5. Yang YH, Dudoit S, Luu P, et al.: Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Research 2002,30(4):e15. 10.1093/nar/30.4.e15

  6. Tseng GC, Oh M-K, Rohlin L, Liao JC, Wong WH: Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variations and assessment of gene effects. Nucleic Acids Research 2001,29(12):2549-2557. 10.1093/nar/29.12.2549

  7. Devroye L, Gyorfi L, Lugosi G: A Probabilistic Theory of Pattern Recognition. Springer, New York, NY, USA; 1996.

  8. Vapnik VN: Statistical Learning Theory. John Wiley & Sons, New York, NY, USA; 1998.

  9. Rosenblatt F: Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Spartan Books, Washington, DC, USA; 1962.

  10. Duda R, Hart P: Pattern Classification. 2nd edition. John Wiley & Sons, New York, NY, USA; 2001.

  11. Chang C-C, Lin C-J: LIBSVM: introduction and benchmarks. Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan; 2000.

  12. Braga-Neto UM, Dougherty ER: Is cross-validation valid for small-sample microarray classification? Bioinformatics 2004,20(3):374-380. 10.1093/bioinformatics/btg419

  13. Pudil P, Novovičová J, Kittler J: Floating search methods in feature selection. Pattern Recognition Letters 1994,15(11):1119-1125. 10.1016/0167-8655(94)90127-9

  14. Jain AK, Zongker D: Feature selection: evaluation, application, and small sample performance. IEEE Transactions on Pattern Analysis and Machine Intelligence 1997,19(2):153-158. 10.1109/34.574797

  15. Kudo M, Sklansky J: Comparison of algorithms that select features for pattern classifiers. Pattern Recognition 2000,33(1):25-41. 10.1016/S0031-3203(99)00041-2

  16. Braga-Neto U, Dougherty ER: Bolstered error estimation. Pattern Recognition 2004,37(6):1267-1281. 10.1016/j.patcog.2003.08.017

  17. Sima C, Attoor S, Braga-Neto U, Lowey J, Suh E, Dougherty ER: Impact of error estimation on feature selection algorithms. Pattern Recognition 2005,38(12):2472-2482. 10.1016/j.patcog.2005.03.026

  18. Hua J, Xiong Z, Lowey J, Suh E, Dougherty ER: Optimal number of features as a function of sample size for various classification rules. Bioinformatics 2005,21(8):1509-1515. 10.1093/bioinformatics/bti171

  19. Jain AK, Waller WG: On the optimal number of features in the classification of multivariate Gaussian data. Pattern Recognition 1978,10(5-6):365-374. 10.1016/0031-3203(78)90008-0

  20. Chen Y, Dougherty ER, Bittner ML: Ratio-based decisions and the quantitative analysis of cDNA microarray images. Journal of Biomedical Optics 1997,2(4):364-374. 10.1117/12.281504


Author information

Correspondence to Jianping Hua.

Rights and permissions

This article is published under an open access license.

About this article

Cite this article

Hua, J., Balagurunathan, Y., Chen, Y. et al. Normalization Benefits Microarray-Based Classification. J Bioinform Sys Biology 2006, 43056 (2006). https://doi.org/10.1155/BSB/2006/43056


Keywords

  • Support Vector Machine
  • Normalization Method
  • Linear Discriminant Analysis
  • Majority Vote
  • cDNA Microarrays