Statistical approaches for the analysis of DNA methylation microarray data

  • Review Paper
  • Human Genetics

Abstract

Following the rapid development and adoption of DNA methylation microarray assays, we are now experiencing growth in the number of statistical tools to analyze the resulting large-scale data sets. As is the case for other microarray applications, biases caused by technical issues are of concern. Some of these issues are old (e.g., two-color dye bias and probe- and array-specific effects), while others are new (e.g., fragment length bias and bisulfite conversion efficiency). Here, I highlight characteristics of DNA methylation that suggest standard statistical tools developed for other data types may not be directly suitable. I then describe the microarray technologies most commonly in use, along with the methods used for preprocessing and obtaining a summary measure. I finish with a section describing downstream analyses of the data, focusing on methods that model percentage DNA methylation as the outcome, and methods for integrating DNA methylation with gene expression or genotype data.

Acknowledgments

I would like to thank Dr. Joe Hacia for his comments on an early draft and Dr. Christina Curtis for discussions regarding methods for data integration. I would also like to thank Tim Triche Jr. for his work on Beta Regression and the preprocessing of DNA methylation data from Illumina’s Infinium platform, and Dr. Peter W. Laird for the many helpful discussions over the years. This work was supported by NCI grant number R01 CA097346 and NIEHS grant number P30 ES07048. The content is solely the responsibility of the author and does not necessarily represent the official views of the National Cancer Institute or the National Institutes of Health.

Author information

Corresponding author

Correspondence to Kimberly D. Siegmund.

About this article

Cite this article

Siegmund, K.D. Statistical approaches for the analysis of DNA methylation microarray data. Hum Genet 129, 585–595 (2011). https://doi.org/10.1007/s00439-011-0993-x

  • DOI: https://doi.org/10.1007/s00439-011-0993-x
