Biostatistical Methods in Epigenetic Epidemiology

  • E. Andrés Houseman


Epidemiologic studies involving epigenetics have, to date, focused mostly on DNA methylation, though studies investigating other epigenetic measures, such as loss of imprinting and microRNAs, are becoming increasingly popular. Analysis of data arising from such studies can be complicated by issues such as multiple comparisons or a lack of clarity about the goals of the analysis. In this chapter, we provide an overview of key considerations in the analysis of such data sets, as well as a review of modern statistical techniques that have been used to analyze epigenetic data. We distinguish between exploratory and confirmatory studies, providing guidance on statistical analysis in each case. In particular, we indicate when control of the familywise error rate or of the false discovery rate is to be preferred. We also provide an overview of unsupervised and supervised multivariate analysis, along with guidance on the analysis of microarray data and on study design.
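To make the familywise error rate (FWER) versus false discovery rate (FDR) distinction concrete, here is a minimal sketch, not code from the chapter, that computes Holm step-down adjusted p-values (an FWER-controlling procedure) and Benjamini-Hochberg adjusted p-values (an FDR-controlling procedure). The function names `holm_adjust` and `bh_adjust` and the four p-values are hypothetical, chosen only for illustration; in a methylation study each p-value would typically come from a per-locus test.

```python
from typing import List


def holm_adjust(pvals: List[float]) -> List[float]:
    """Holm step-down adjusted p-values; rejecting those <= alpha controls the FWER."""
    m = len(pvals)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The (rank+1)-th smallest p-value is multiplied by (m - rank), capped at 1.
        value = min(1.0, (m - rank) * pvals[i])
        # A running maximum keeps adjusted values monotone in the original ordering.
        running_max = max(running_max, value)
        adjusted[i] = running_max
    return adjusted


def bh_adjust(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values; rejecting those <= alpha controls the FDR."""
    m = len(pvals)
    # Walk from the largest p-value down to the smallest.
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank of this p-value in ascending order
        value = min(1.0, pvals[i] * m / rank)
        # A running minimum keeps adjusted values monotone.
        running_min = min(running_min, value)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    pvals = [0.01, 0.02, 0.03, 0.04]  # hypothetical p-values for four loci
    alpha = 0.05
    holm = holm_adjust(pvals)
    bh = bh_adjust(pvals)
    print("Holm:", holm, "rejections:", sum(p <= alpha for p in holm))
    print("BH:  ", bh, "rejections:", sum(p <= alpha for p in bh))
    # Holm rejects only the first hypothesis at alpha = 0.05; BH rejects all four.
```

For any set of p-values, each Holm-adjusted value is at least as large as the corresponding Benjamini-Hochberg value, which is one way to see why FWER control is the more stringent criterion and FDR control is often favored in exploratory, high-dimensional screens.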



List of Abbreviations

ANOVA: analysis of variance
CART: classification and regression tree
ChIP-chip: chromatin immunoprecipitation combined with microarray technology
CIMP: CpG island methylator phenotype
COBRA: combined bisulfite restriction analysis
FA: factor analysis
FDR: false discovery rate
FWER: familywise error rate
GWAS: genome-wide association studies
HOPACH: hierarchical ordered partitioning and collapsing hybrid
LDA: linear discriminant analysis
LINE: long interspersed nuclear elements
LOI: loss of imprinting
MCAR: missing completely at random
MeDIP: methyl-DNA immunoprecipitation
MSP: methylation-specific PCR
PCA: principal components analysis
PCR: polymerase chain reaction
RPMM: recursively partitioned mixture model
SAM: significance analysis of microarrays
SNP: single-nucleotide polymorphism
SVM: support vector machine



Copyright information

© Springer Science+Business Media B.V. 2012

Authors and Affiliations

  1. College of Public Health and Human Sciences, Oregon State University, Corvallis, USA
