Correcting for Sample Heterogeneity in Methylome-Wide Association Studies

Part of the Methods in Molecular Biology book series (MIMB, volume 1589)


Epigenome-wide association studies (EWAS) face many of the same challenges as genome-wide association studies (GWAS), with the added challenge that the epigenome can vary dramatically across cell types. When cell-type composition differs between cases and controls, the difference can produce spurious associations and obscure true ones. We have developed a computational method, FaST-LMM-EWASher, which automatically corrects for cell-type composition without needing explicit knowledge of it. In this chapter, we provide a tutorial on using FaST-LMM-EWASher for DNA methylation data and discuss data analysis strategies.
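The confounding described above can be illustrated with a small simulation. The sketch below is not FaST-LMM-EWASher and does not use its interface; it is a toy model under stated assumptions: two hypothetical cell types with fixed methylation levels at one site, cases carrying more of cell type A on average, and no true disease effect on methylation. Unlike FaST-LMM-EWASher, it uses the known cell-type fraction as a covariate, purely to show that the naive signal comes from composition. All names and numeric values are illustrative.

```python
import numpy as np

def simulate(n=400, seed=0):
    """Simulate bulk methylation at one site with no true disease effect."""
    rng = np.random.default_rng(seed)
    status = np.repeat([0.0, 1.0], n // 2)  # 0 = control, 1 = case
    # Assumption: cases have a larger fraction of cell type A on average.
    frac_a = np.clip(rng.normal(0.4 + 0.2 * status, 0.05), 0.0, 1.0)
    # Assumed cell-type-specific methylation levels (illustrative values).
    meth_a, meth_b = 0.8, 0.2
    # Bulk signal is the composition-weighted mixture plus measurement noise.
    bulk = frac_a * meth_a + (1.0 - frac_a) * meth_b + rng.normal(0.0, 0.02, n)
    return status, frac_a, bulk

def t_stat(y, X, col):
    """Ordinary least squares t-statistic for column `col` of design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[col] / np.sqrt(cov[col, col])

status, frac_a, bulk = simulate()
ones = np.ones_like(status)

# Naive test: methylation ~ case/control status only.
naive = t_stat(bulk, np.column_stack([ones, status]), 1)
# Adjusted test: the known cell-type fraction is added as a covariate.
adjusted = t_stat(bulk, np.column_stack([ones, status, frac_a]), 1)

print(f"naive t = {naive:.1f}, adjusted t = {adjusted:.1f}")
```

In this toy setting the naive statistic is large even though the site has no disease effect, and it shrinks toward zero once composition is accounted for. In real studies the fractions are usually unmeasured, which is the situation the chapter's method is designed for.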


Keywords: DNA methylation, Epigenome-wide association study, Computational method, Sample heterogeneity



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. School of Engineering and Applied Sciences, Harvard University, Cambridge, USA
