Web-Based Analysis of (Epi-) Genome Data Using EpiGRAPH and Galaxy

  • Christoph Bock
  • Greg Von Kuster
  • Konstantin Halachev
  • James Taylor
  • Anton Nekrutenko
  • Thomas Lengauer
Part of the Methods in Molecular Biology book series (MIMB, volume 628)


Modern life sciences are becoming increasingly data intensive, which poses a significant challenge for most researchers and shifts the bottleneck of scientific discovery from data generation to data analysis. As a result, progress in genome research is increasingly impeded by bioinformatic hurdles. A new generation of powerful and easy-to-use genome analysis tools has been developed to address this issue, enabling biologists to perform complex bioinformatic analyses online, without having to learn a programming language or to download and manually process large datasets. In this tutorial paper, we describe the use of EpiGRAPH and Galaxy for genome and epigenome analysis, and we illustrate how these two web services work together to identify epigenetic modifications that are characteristic of highly polymorphic (SNP-rich) promoters. This paper is supplemented with video tutorials, which provide a step-by-step guide through each example analysis.

Key words

Bioinformatics; Genome analysis; Statistics; Machine learning; Computational epigenetics; Single nucleotide polymorphisms (SNPs); Evolutionary constraint



We would like to thank Joachim Büch for maintaining the IT infrastructure of EpiGRAPH, Yoichi Yamada and Sascha Tierling for providing DNA methylation data, and Martina Paulsen as well as Jörn Walter for helpful discussions. EpiGRAPH is partially funded by the European Union through the CANCERDIP project (HEALTH-F2–2007-200620); Galaxy is supported by NSF Grant DBI-0543285 and NIH Grant 5R01HG003646–02 as well as by funds from the Huck Institutes for Life Sciences at Penn State University and the Pennsylvania Department of Health.



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Christoph Bock (1)
  • Greg Von Kuster (1)
  • Konstantin Halachev (2)
  • James Taylor (3)
  • Anton Nekrutenko (3)
  • Thomas Lengauer (1)
  1. Max-Planck-Institut für Informatik, Saarbrücken, Germany
  2. Center for Comparative Genomics and Bioinformatics, Huck Institutes for Life Sciences, Penn State University, University Park, USA
  3. Departments of Biology and Mathematics & Computer Science, Emory University, Atlanta, USA
