Impact of Experimental Noise and Annotation Imprecision on Data Quality in Microarray Experiments

  • Andreas SchererEmail author
  • Manhong Dai
  • Fan Meng
Part of the Methods in Molecular Biology book series (MIMB, volume 972)


Data quality is intrinsically influenced by design, technical, and analytical parameters. Quality parameters have not yet been well defined for gene expression analysis by microarrays, though ad interim, following recommended good experimental practice guidelines should ensure generation of reliable and reproducible data. Here we summarize essential practical recommendations for experimental design, technical considerations, feature annotation issues, and standardization efforts.

Key words

Data quality Experimental design Quality parameters Annotation Standardization 


  1. 1.
    Schena M, Shalon D, Davis RW, Brown PO (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270:467–470PubMedCrossRefGoogle Scholar
  2. 2.
    Lockhart DJ, Dong H, Byrne MC, Follettie MT, Gallo MV, Chee MS, Mittmann M, Wang C, Kobayashi M, Horton H, Brown EL (1996) Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol 14:1675–1680PubMedCrossRefGoogle Scholar
  3. 3.
    Brown PO, Botstein D (1999) Exploring the new world of the genome with DNA microarrays. Nat Genet 21:33–37PubMedCrossRefGoogle Scholar
  4. 4.
    Rogers S, Cambrosio A (2007) Making a new technology work: the standardization and regulation of microarrays. Yale J Biol Med 80:165–178PubMedGoogle Scholar
  5. 5.
    The Tumor Analysis Best practices Working Group (2004) Expression profiling-best practices for data generation and interpretation in clinical trials. Nat Rev 5:229–237Google Scholar
  6. 6.
  7. 7.
    Shi L, Reid LH, Jones WD, Shippy R, Warrington JA, Baker SC, Collins PJ, de Longueville F, Kawasaki ES, Lee KY, Luo Y, Sun YA, Willey JC, Setterquist RA, Fischer GM, Tong W, Dragan YP, Dix DJ et al (2006) The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol 24:1151–1161PubMedCrossRefGoogle Scholar
  8. 8.
    Clarke JD, Zhu T (2006) Microarray analysis of the transcriptome as a stepping stone towards understanding biological systems: practical considerations and perspectives. Plant J 45:630–650PubMedCrossRefGoogle Scholar
  9. 9.
    Dix DJ, Gallagher K, Benson WH, Groskinsky BL, McClintock T, Dearfield KL, Farland WH (2006) A framework for the use of genomics data at the EPA. Nat Biotechnol 24:1108–1111PubMedCrossRefGoogle Scholar
  10. 10.
    Grass P (2009) Experimental design, pp 19–31. In: Scherer A (ed) Batch effects and noise in microarray experiments. Wiley, West Sussex, ISBN:978-0-470-74138-2Google Scholar
  11. 11.
    Sica GT (2006) Bias in research studies. Radiology 238:780–789PubMedCrossRefGoogle Scholar
  12. 12.
    Rudic RD, McNamara P, Reilly D, Grosser T, Curtis AM, Price TS, Panda S, Hogenesch JB, FitzGerald GA (2005) Bioinformatic analysis of circadian gene oscillation in mouse aorta. Circulation 112:2716–2724PubMedCrossRefGoogle Scholar
  13. 13.
    Coombes KR, Highsmith WE, Krogmann TA, Baggerly KA, Stivers DN, Abruzzo LV (2002) Identifying and quantifying sources of variation in microarray data using high-density cDNA membrane arrays. J Comp Biol 9:655–669CrossRefGoogle Scholar
  14. 14.
    Li X, Gu WMS, Balink D (2002) DNA microarrays: their use and misuse. Microcirculation 9:13–22PubMedCrossRefGoogle Scholar
  15. 15.
    Zakharkin SO, Kim K, Mehta T, Chen L, Barnes S, Scheirer KE, Parrish RS, Allison DB, Page GP (2005) Sources of variation in Affymetrix microarray experiments. BMC Bioinform 6:214CrossRefGoogle Scholar
  16. 16.
    Auer H, Lyianarachchi S, Newsom D, Klisovic MI, Marcucci G, Kornacker K (2003) Chipping away at the chip bias: RNA degradation in microarray analysis. Nat Genet 35:292–293PubMedCrossRefGoogle Scholar
  17. 17.
    Dumur CI, Nasim S, Best AM, Archer KJ, Ladd AC, Mas VR, Wilkinson DS, Garrett CT, Ferreira-Gonzalez A (2006) Evaluation of quality-control criteria for microarray gene expression analysis. Clin Chem 50:1994–2002CrossRefGoogle Scholar
  18. 18.
    Schroeder A, Mueller O, Stocker S, Salowsky R, Leiber M, Gassmann M, Lightfoot S, Menzel W, Granzow M, Ragg T (2006) The RIN: an RNA integrity number for assigning integrity values to RNA measurements. BMC Mol Biol 7:3PubMedCrossRefGoogle Scholar
  19. 19.
    Ioannidis JP (2005) Microarrays and molecular research: noise discovery? Lancet 365:454–455PubMedGoogle Scholar
  20. 20.
    Frantz S (2005) An array of problems. Nat Rev Drug Discov 4:362–363PubMedCrossRefGoogle Scholar
  21. 21.
    Strauss E (2006) Arrays of hope. Cell 127:657–659PubMedCrossRefGoogle Scholar
  22. 22.
    Ying L, Sarwal M (2008) In praise of arrays. Pediatr Nephrol 24:1643–1659PubMedCrossRefGoogle Scholar
  23. 23.
    Marshall E (2004) Getting the noise out of gene arrays. Science 306:630–631PubMedCrossRefGoogle Scholar
  24. 24.
    Michiels S, Koscielny S, Hill C (2005) Prediction of cancer outcome with microarrays: a multiple random validation strategy. Lancet 365:488–492PubMedCrossRefGoogle Scholar
  25. 25.
    Ein-Dor L, Zuk O, Domany E (2006) Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer. Proc Natl Acad Sci USA 103:5923–5928PubMedCrossRefGoogle Scholar
  26. 26.
    Ioannidis JP, Allison DB, Ball CA, Coulibaly I, Cui X, Culhane AC, Falchi M, Furlanello C, Game L, Jurman G, Mangion J, Mehta T, Nitzberg M, Page GP, Petretto E, van Noort V (2009) Repeatability of published microarray gene expression analyses. Nat Genet 41:149–155PubMedCrossRefGoogle Scholar
  27. 27.
    Dobbin KK, Beer DG, Meverson M, Yeatman TJ, Gerald WL, Jacobson JW, Conley B, Buetow KH, Heiskanen M, Simon RM, Minna JD, Girard L, Misek DE, Taylor JM, Hanash S, Naoki K, Hayes DN, Ladd-Acosta C, Enkemann SA, Viale A, Giordano TJ (2005) Interlaboratory comparability study of cancer gene expression analysis using oligonucleotide microarrays. Clin Cancer Res 11:565–572PubMedGoogle Scholar
  28. 28.
    Larkin JE, Frank BC, Gavras H, Sultana R, Quackenbush J (2005) Independence and reproducibility across microarray platforms. Nat Methods 2:337–344PubMedCrossRefGoogle Scholar
  29. 29.
    Irizarry RA, Warren D, Spencer F, Kim IF, Biswal S, Frank BC, Gabrielson E, Garcia JG, Geoghegan J, Germino G, Griffin C, Hilmer SC, Hoffman E, Jedlicka AE, Kawasaki E, Martínez-Murillo F, Morsberger L, Lee H, Petersen D, Quackenbush J, Scott A, Wilson M, Yang Y, Ye SQ, Yu W (2005) Multiple-laboratory comparison of micrarray platforms. Nat Methods 2:345–350PubMedCrossRefGoogle Scholar
  30. 30.
    Chudin E, Walker R, Kosaka A, Wu SX, Rabert D, Chang TK, Kreder DE (2002) Assessment of the relationship between signal intensities and transcript concentration for Affymetrix GeneChip arrays. Genome Biol 3:RESEARCH0005PubMedGoogle Scholar
  31. 31.
    Moreau Y, Aerts S, De Moor B, De Strooper B, Dabrowski M (2003) Comparison and meta-analysis of microarray data: from the bench to the computer desk. Trends Genet 19:570–577PubMedCrossRefGoogle Scholar
  32. 32.
    Kim H, Zhao B, Snesrud EC, Haas BJ, Town CD, Quackenbush J (2002) Use of RNA and genomics DNA references for inferred comparisons in DNA microarray analyses. Biotechiques 33:924–930Google Scholar
  33. 33.
    Miklos GL, Maleszka R (2004) Microarray reality checks in the context of a complex disease. Nat Biotechnol 22:615–621PubMedCrossRefGoogle Scholar
  34. 34.
    The External RNA Controls Consortium (2005) The external RNA controls consortium: a progress report. Nat Methods 2:731–734CrossRefGoogle Scholar
  35. 35.
    Pine PS, Boedigheimer M, Rosenzweig BA, Turpaz Y, He YD, delestarr G, Ganter B, Jarnagin K, Jones WD, Reid LH, Thompson KL (2008) Use of disganostic accuracy as a metric for evaluating laboratory proficiency with microarray assays using mixed-tissue RNA reference samples. Pharmacogenomics 9:1753–1763PubMedCrossRefGoogle Scholar
  36. 36.
    Halgren RG, Fielden MR, Fong CJ, Zacharewski TR (2001) Assessment of clone identity and sequence fidelity for 1189 IMAGE cDNA clones. Nucleic Acids Res 29:582–588PubMedCrossRefGoogle Scholar
  37. 37.
    Dai M, Wang P, Boyd AD, Kostov G, Athey B, Jones EG, Bunney WE, Myers RM, Speed TP, Akil H, Watson SJ, Meng F (2005) Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res 33:e175PubMedCrossRefGoogle Scholar
  38. 38.
    Gautier L, Moller M, Friis-Hansen L, Knudsen S (2004) Alternative mapping of probes to genes for Affymetrix chips. BMC Bioinform 5:111CrossRefGoogle Scholar
  39. 39.
    Harbig J, Sprinkle R, Enkemann SA (2005) A sequence-based identification of the genes detected by probesets on the Affymetrix U133 plus 2.0 array. Nucleic Acids Res 33:e31PubMedCrossRefGoogle Scholar
  40. 40.
    Altshuler D, Brooks LD, Chakravarti A, Collins FS, Daly MJ, Donnelly P (2005) A haplotype map of the human genome. Nature 437:1299–1320CrossRefGoogle Scholar
  41. 41.
    Lee I, Dombkowski AA, Athey BD (2004) Guidelines for incorporating non-perfectly matched oligonucleotides into target-specific hybridization probes for a DNA microarray. Nucleic Acids Res 32:681–690PubMedCrossRefGoogle Scholar
  42. 42.
    Mei R, Hubbell E, Bekiranov S, Mittmann M, Christians FC, Shen MM, Lu G, Fang J, Liu WM, Ryder T, Kaplan P, Kulp D, Webster TA (2003) Probe selection for high-density oligonucleotide arrays. Proc Natl Acad Sci USA 100:11237–11242PubMedCrossRefGoogle Scholar
  43. 43.
    Evans SJ, Choudary PV, Neal CR, Li JZ, Vawter MP, Tomita H, Lopez JF, Thompson RC, Meng F, Stead JD, Walsh DM, Myers RM, Bunney WE, Watson SJ, Jones EG, Akil H (2004) Dysregulation of the fibroblast growth factor system in major depression. Proc Natl Acad Sci USA 101:15506–15511PubMedCrossRefGoogle Scholar
  44. 44.
    Barbosa-Morais NL, Dunning MJ, Samarajiwa SA, Darot JF, Ritchie ME, Lynch AG, Tavare S (2010) A re-annotation pipeline for Illumina BeadArrays: improving the interpretation of gene expression data. Nucleic Acids Res 38(3):e17PubMedCrossRefGoogle Scholar
  45. 45.
    Affymetrix (2005) (a) Exon Array Computational Tool Software User’s Guide, (b) Whole Transcript (WT) Sense Target Labeling Assay Manual, (c) Alternative Transcript Analysis Methods for Exon Arrays v1.1, (d) Exon Array Background Correction v1.0, (e) Exon Probeset Annotations and Transcript Cluster Groupings v1.0, (f) Gene Signal Estimates from Exon Arrays v1.0, (g) Quality Assessment of Exon Arrays v1.0’,, Human Exon 1.0 ST Array Manuals and White Papers
  46. 46.
    Dai M, Wang P, Jakupovic E, Watson SJ, Meng F (2007) Web-based GeneChip analysis system for large-scale collaborative projects. Bioinformatics 23:2185–2187PubMedCrossRefGoogle Scholar
  47. 47.
    Lu X, Zhang X (2006) The effect of GeneChip gene definitions on the microarray study of cancers. Bioessays 28:739–746PubMedCrossRefGoogle Scholar
  48. 48.
    Sandberg R, Larsson O (2007) Improved precision and accuracy for microarrays using updated probe set definitions. BMC Bioinform 8:48CrossRefGoogle Scholar
  49. 49.
    Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, Scherer SW, Lee C (2004) Detection of large-scale variation in the human genome. Nat Genet 36:949–951PubMedCrossRefGoogle Scholar
  50. 50.
    Lo HS, Wang Z, Hu Y, Yang HH, Gere S, Buetow KH, Lee MP (2003) Allelic variation in gene expression is common in the human genome. Genome Res 13:1855–1862PubMedCrossRefGoogle Scholar
  51. 51.
    Kaizer EC, Glaser CL, Chaussabel D, Banchereau J, Pascual V, White PC (2007) Gene expression in peripheral blood mononuclear cells from children with diabetes. J Clin Endocrinol Metab 92:3705–3711PubMedCrossRefGoogle Scholar
  52. 52.
    Ideker T, Galitski T, Hood L (2001) A new approach to decoding life: systems biology. Annu Rev Genomics Hum Genet 2:343–372PubMedCrossRefGoogle Scholar
  53. 53.
    Quackenbush J (2006) Standardizing the standards. Mol Syst Biol 2:2006.0010PubMedGoogle Scholar
  54. 54.
    Williams-Devane CR, Wolf MA, Richard AM (2009) Toward a public toxicogenomics capability for supporting predictive toxicology: survey of current resources and chemical indexing of experiments in GEO and ArrayExpress. Toxicol Sci 109:358–371PubMedCrossRefGoogle Scholar
  55. 55.
    CASIMIR Consortium (2009) Post-publication sharing of data and tools. Nature 461:171–173CrossRefGoogle Scholar
  56. 56.
    Gentleman R, Lang DT (2004) Statistical analyses and reproducible research.
  57. 57.
    Ashburner M, Ball C, Blake J, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25:25–32PubMedCrossRefGoogle Scholar
  58. 58.
    Gaudet P, Chisholm R, Berardini T, Dimmer E, Engel SR, Fey P, Hill DP, Howe D, Hu JC, Huntley R, Khodiyar VK, Kishore R, Li D, Lovering RC, McCarthy F, Ni L, Petri V, Siegele DA, Tweedie S, Van Auken K, Wood V, Basu S, Carbon S, Dolan M, Mungall CJ, Dolinski K, Thomas P, Ashburner M, Blake JA, Cherry JM, Lewis SE, Balakrishnan R, Christie KR, Costanzo MC, Deegan J, Diehl AD, Drabkin H, Fisk DG, Harris M, Hirschman JE, Hong EL, Ireland A, Lomax J, Nash RS, Park J, Sitnikov D, Skrzypek MS, Apweiler R, Bult C, Eppig J, Jacob H, Parkhill J, Rhee S, Ringwald M, Sternberg P, Talmud P, Twigger S, Westerfield M (2009) The Gene Ontology’s Reference Genome Project: a unified framework for functional annotation across species. PLoS Comput Biol 5:e1000431CrossRefGoogle Scholar
  59. 59.
    Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton HC, Gaasterland T, Glenisson P, Holstege FC, Kim IF, Markowitz V, Matese JC, Parkinson H, Robinson A, Sarkans U, Schulze-Kremer S, Stewart J, Taylor R, Vilo J, Vingron M (2001) Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet 29:365–371PubMedCrossRefGoogle Scholar
  60. 60.
    Spellman PT, Miller M, Stewart J, Troup C, Sarkans U, Chervitz S, Bernhart D, Sherlock G, Ball C, Lepage M, Swiatek M, Marks WL, Goncalves J, Markel S, Iordan D, Shojatalab M, Pizarro A, White J, Hubley R, Deutsch E, Senger M, Aronow BJ, Robinson A, Bassett D, Stoeckert CJ Jr, Brazma A (2002) Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biol 3:RESEARCH0046PubMedCrossRefGoogle Scholar
  61. 61.
    Rayner TF, Rocca-Serra P, Spellman PT, Causton HC, Farne A, Holloway E, Irizarry RA, Liu J, Maier DS, Miller M, Petersen K, Quackenbush J, Sherlock G, Stoeckert CJ Jr, White J, Whetzel PL, Wymore F, Parkinson H, Sarkans U, Ball CA, Brazma A (2006) A simple spreadsheet-based, MIAME-supportive format for microarray data: MAGE-TAB. BMC Bioinform 7:489CrossRefGoogle Scholar
  62. 62.
    Whetzel PL, Parkinson H, Causton HC, Fan L, Fostel J, Fragoso G, Game L, Heiskanen M, Morrison N, Rocca-Serra P, Sansone SA, Taylor C, White J, Stoeckert CJ Jr (2006) The MGED Ontology: a resource for semantics-based description of microarray experiments. Bioinformatics 22:866–873PubMedCrossRefGoogle Scholar
  63. 63.
    Ball CA, Brazma A, Causton H, Chervitz S, Edgar R, Hingamp P, Matese JC, Parkinson H, Quackenbush J, Ringwald M, Sansone SA, Sherlock G, Spellman P, Stoeckert C, Tateno Y, Taylor R, White J, Winegarden N (2004) Standards for microarray data: an open letter. Environ Health Perspect 112:A666–A667PubMedCrossRefGoogle Scholar
  64. 64.
    Ball CA, Brazma A, Causton H, Chervitz S, Edgar R, Hingamp P, Matese JC, Parkinson H, Quackenbush J, Ringwald M, Sansone SA, Sherlock G, Spellman P, Stoeckert C, Tateno Y, Taylor R, White J, Winegarden N (2004) Submission of microarray data to public repositories. PLoS Biol 2:E317PubMedCrossRefGoogle Scholar
  65. 65.
    Parkinson H, Kapushesky M, Shojatalab M, Abeygunawardena N, Coulson R, Farne A, Holloway E, Kolesnykov N, Lilja P, Lukk M, Mani R, Rayner T, Sharma A, William E, Sarkans U, Brazma A (2007) ArrayExpress—a public database of microarray experiments and gene expression profiles. Nucleic Acids Res 35:D747–D750PubMedCrossRefGoogle Scholar
  66. 66.
    Edgar R, Domrachev M, Lash AE (2002) Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 30:207–210PubMedCrossRefGoogle Scholar
  67. 67.
    Ikeo K, Ishi-i J, Tamura T, Gojobori T, Tateno Y (2003) CIBEX: center for information biology gene expression database. C R Biol 326:1079–1082PubMedCrossRefGoogle Scholar
  68. 68.
    Frueh F (2006) Impact of microarray data quality on genomic data submissions to the FDA. Nat Biotechnol 24:1105–1107PubMedCrossRefGoogle Scholar
  69. 69.
    Souza T, Kush R, Evans JP (2007) Global clinical data interchange standards are here! Drug Discov Today 12:174–181PubMedCrossRefGoogle Scholar
  70. 70.
    U.S. Environmental Protection Agency DRAFT: Interim Guidance for Microarray-Based Assays: Data Submission, Quality, Analysis, Management, and Training Considerations (2007).

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. 1.Genomics, Biomarker Development, Spheromics, KontiolahtiJoensuuFinland
  2. 2.Psychiatry Department and Molecular and Behavioral Neuroscience InstituteUniversity of Michigan, University of MichiganAnn ArborUSA
  3. 3.Psychiatry Department and Molecular and Behavioral Neuroscience InstituteUniversity of Michigan, University of MichiganAnn ArbourUSA

Personalised recommendations