Microarray data normalization and transformation

  • Review Article

From Nature Genetics

Abstract

Underlying every microarray experiment is an experimental question that one would like to address. Finding a useful and satisfactory answer relies on careful experimental design and the use of a variety of data-mining tools to explore the relationships between genes or reveal patterns of expression. While other sections of this issue deal with these lofty issues, this review focuses on the much more mundane but indispensable tasks of 'normalizing' data from individual hybridizations to make meaningful comparisons of expression levels, and of 'transforming' them to select genes for further analysis and data mining.

Figure 1: An R-I plot displays the log₂(Rᵢ/Gᵢ) ratio for each element on the array as a function of the log₁₀(Rᵢ·Gᵢ) product intensities and can reveal systematic intensity-dependent effects in the measured log₂(ratio) values.
Figure 2: Application of local (pen group) lowess can correct for both systematic variation as a function of intensity and spatial variation between spotting pens on a DNA microarray.
Figure 3: The use of replicates can help eliminate questionable or inconsistent data from further analysis.
Figure 4: Local variation as a function of intensity can be used to identify differentially expressed genes by calculating an intensity-dependent Z-score.
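
The quantities named in the captions for Figures 1 and 4 can be illustrated with a short sketch. The code below computes the R-I plot coordinates, log2(R/G) against log10(R*G), for each array element, and then scores each log2 ratio against the mean and standard deviation of its neighbours in intensity order. The function names, the sliding-window scheme and the `window` parameter are assumptions made for illustration; the preview text does not specify how the article computes its intensity-dependent Z-score, so this should not be read as the author's procedure.

```python
import math
from statistics import mean, pstdev


def ri_coordinates(red, green):
    """Return one (log10(R*G), log2(R/G)) pair per array element."""
    return [(math.log10(r * g), math.log2(r / g)) for r, g in zip(red, green)]


def local_z_scores(points, window=5):
    """Z-score each log2 ratio against its neighbours in intensity order.

    `points` is the output of ri_coordinates. `window` (hypothetical
    parameter) is the number of intensity-ranked neighbours, including
    the element itself, used for the local mean and standard deviation.
    """
    # Rank elements by log10 product intensity.
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    half = window // 2
    z = [0.0] * len(points)
    for rank, idx in enumerate(order):
        lo = max(0, rank - half)
        hi = min(len(order), rank + half + 1)
        ratios = [points[j][1] for j in order[lo:hi]]
        sd = pstdev(ratios)
        # A zero local spread gives no basis for scoring; report 0.0.
        z[idx] = (points[idx][1] - mean(ratios)) / sd if sd > 0 else 0.0
    return z


if __name__ == "__main__":
    red = [120.0, 450.0, 900.0, 2000.0, 16000.0]
    green = [100.0, 500.0, 1000.0, 2100.0, 4000.0]
    pts = ri_coordinates(red, green)
    for (intensity, ratio), score in zip(pts, local_z_scores(pts)):
        print(f"{intensity:6.2f} {ratio:6.2f} {score:6.2f}")
```

Elements whose local Z-score exceeds a chosen cutoff in absolute value would be the candidates for differential expression. In practice the window would span many more elements than in this toy input.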

Acknowledgements

The work presented here evolved from looking at a large body of data and would have been much less useful without the contributions of Norman H. Lee, Renae L. Malek, Priti Hegde, Ivana Yang, Shuibang Wang, Yonghong Wang, Simon Kwong, Heenam Kim, Wei Liang, Vasily Sharov, John Braisted, Alex Saeed, Joseph White, Jerry Li, Renee Gaspard, Erik Snesrud, Yan Yu, Emily Chen, Jeremy Hasseman, Bryan Frank, Lara Linford, Linda Moy, Tara Vantoai, Gary Churchill and Roger Bumgarner. J.Q. is supported by grants from the US National Science Foundation, the National Heart, Lung, and Blood Institute, and the National Cancer Institute. The MIDAS software system used for the normalization and data filtering presented here is freely available as either executable or source code from http://www.tigr.org/software, along with the MADAM data-management system, the Spotfinder image-processing software, and the MeV clustering and data-mining tool.

Ethics declarations

Competing interests

The author declares no competing financial interests.

About this article

Cite this article

Quackenbush, J. Microarray data normalization and transformation. Nat Genet 32 (Suppl 4), 496–501 (2002). https://doi.org/10.1038/ng1032

  • DOI: https://doi.org/10.1038/ng1032

  • Springer Nature America, Inc.
