Impact of Experimental Noise and Annotation Imprecision on Data Quality in Microarray Experiments

Statistical Methods for Microarray Data Analysis

Part of the book series: Methods in Molecular Biology (MIMB, volume 972)

Abstract

Data quality is intrinsically influenced by design, technical, and analytical parameters. Quality parameters for microarray-based gene expression analysis have not yet been well defined; in the interim, following recommended good experimental practice guidelines should ensure the generation of reliable and reproducible data. Here we summarize essential practical recommendations for experimental design, technical considerations, feature annotation issues, and standardization efforts.

Author information

Corresponding author

Correspondence to Andreas Scherer.

Copyright information

© 2013 Springer Science+Business Media New York

About this protocol

Cite this protocol

Scherer, A., Dai, M., Meng, F. (2013). Impact of Experimental Noise and Annotation Imprecision on Data Quality in Microarray Experiments. In: Yakovlev, A., Klebanov, L., Gaile, D. (eds) Statistical Methods for Microarray Data Analysis. Methods in Molecular Biology, vol 972. Humana Press, New York, NY. https://doi.org/10.1007/978-1-60327-337-4_10

  • DOI: https://doi.org/10.1007/978-1-60327-337-4_10

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-60327-336-7

  • Online ISBN: 978-1-60327-337-4

  • eBook Packages: Springer Protocols
