Computational and Statistical Analysis of Array-Based DNA Methylation Data

  • Jessica Nordlund (email author)
  • Christofer Bäcklin
  • Amanda Raine
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1878)

Abstract

The characterization of aberrant DNA methylation is emerging as a key part of the study of cancer development and phenotype. Technical advances and the decreasing cost of high-throughput DNA methylation profiling have generated strong interest in using these methods in disease association studies. Here we discuss the principles of DNA methylation analysis using data from the Infinium DNA methylation BeadChip assays and describe the computational steps and statistical considerations involved, from processing the raw array data to analyzing differential methylation. We also provide detailed guidelines on how to perform tumor subtype classification based on DNA methylation signatures.

Key words

DNA methylation, Epigenetics, Cancer, Classification, Subtyping, BeadChip Assay, 450k array

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Jessica Nordlund (2), email author
  • Christofer Bäcklin (1)
  • Amanda Raine (2)

  1. Department of Medical Sciences, Uppsala University, Uppsala, Sweden
  2. Department of Medical Sciences and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
