A Guide to Illumina BeadChip Data Analysis

  • Protocol
DNA Methylation Protocols

Part of the book series: Methods in Molecular Biology (MIMB, volume 1708)

Abstract

The Illumina Infinium BeadChips are a powerful array-based platform for genome-wide DNA methylation profiling, interrogating approximately 485,000 (450K) or 850,000 (EPIC) CpG sites. The platform is used in many large-scale population-based epigenetic studies of complex diseases, environmental exposures, and other experimental conditions. This chapter provides an overview of the main steps in analyzing Illumina BeadChip data. We describe preprocessing, including data extraction, quality control, and normalization strategies. We then present principles and guidelines for conducting association analysis at the individual CpG level, as well as more sophisticated pathway-based association tests.

Corresponding author

Correspondence to Michael C. Wu.

Copyright information

© 2018 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Wu, M.C., Kuan, P.F. (2018). A Guide to Illumina BeadChip Data Analysis. In: Tost, J. (ed) DNA Methylation Protocols. Methods in Molecular Biology, vol 1708. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7481-8_16

  • DOI: https://doi.org/10.1007/978-1-4939-7481-8_16

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-7479-5

  • Online ISBN: 978-1-4939-7481-8

  • eBook Packages: Springer Protocols
