
CpG Islands, pp 31-47

Prediction of CpG Islands as an Intrinsic Clustering Property Found in Many Eukaryotic DNA Sequences and Its Relation to DNA Methylation

  • Cristina Gómez-Martín
  • Ricardo Lebrón
  • José L. Oliver
  • Michael Hackenberg
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1766)

Abstract

The promoter regions of around 70% of all genes in the human genome are overlapped by a CpG island (CGI). CGIs have known functions in transcription initiation and show outstanding compositional features, such as high G+C content and high CpG ratios, compared to bulk DNA. We have shown previously that CGIs manifest as clusters of CpGs in mammalian genomes and can therefore be detected using clustering methods. These techniques have several advantages over sliding-window approaches, which apply compositional properties as thresholds. In this protocol we show how to determine local (CpG islands) and global (distance distribution) clustering properties of CG dinucleotides and how to generalize this analysis to any k-mer or combination of k-mers. In addition, we illustrate how to cross the output of a CpG island prediction algorithm with our methylation database to detect differentially methylated CGIs. The analysis is given as a step-by-step protocol, and all necessary programs are provided in a virtual machine; alternatively, the software can be downloaded and installed.
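
As a rough illustration of the two clustering views named in the abstract, the following minimal Python sketch collects the positions of a DNA word (CG by default), computes the distances between neighbouring occurrences (the global view), and groups occurrences whose neighbour distance stays below a threshold (the local view). It is a simplified sketch under stated assumptions and does not reproduce the programs distributed with the protocol: the function names, the fixed `max_dist` threshold, and the `min_words` cutoff are hypothetical choices made only for this example, and no statistical significance is assigned to the resulting clusters.

```python
def word_positions(seq, word="CG"):
    """Return 0-based start positions of every occurrence of `word`
    (overlapping occurrences included)."""
    seq = seq.upper()
    word = word.upper()
    positions = []
    start = seq.find(word)
    while start != -1:
        positions.append(start)
        start = seq.find(word, start + 1)
    return positions


def neighbor_distances(positions):
    """Distances between consecutive word occurrences (global view)."""
    return [b - a for a, b in zip(positions, positions[1:])]


def cluster_by_distance(positions, max_dist, min_words=2):
    """Group consecutive occurrences separated by at most `max_dist`.

    Returns (first_position, last_position, word_count) tuples.
    `max_dist` and `min_words` are illustrative parameters only.
    """
    clusters = []
    if not positions:
        return clusters
    current = [positions[0]]
    for pos in positions[1:]:
        if pos - current[-1] <= max_dist:
            current.append(pos)
        else:
            if len(current) >= min_words:
                clusters.append((current[0], current[-1], len(current)))
            current = [pos]
    if len(current) >= min_words:
        clusters.append((current[0], current[-1], len(current)))
    return clusters


if __name__ == "__main__":
    # Toy sequence: two CG-dense stretches separated by a run of T.
    seq = "CGCGACG" + "T" * 12 + "CGATCG"
    pos = word_positions(seq)
    print(pos)
    print(neighbor_distances(pos))
    print(cluster_by_distance(pos, max_dist=4))
```

Passing another word, for example `word_positions(seq, "TATA")`, applies the same steps to any k-mer. In a real analysis the distance threshold would be derived from the observed distance distribution rather than fixed by hand.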

Key words

CpG islands, Clustering, DNA words, DNA methylation, Virtual machine

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Cristina Gómez-Martín (1, 2)
  • Ricardo Lebrón (1, 2)
  • José L. Oliver (1, 2)
  • Michael Hackenberg (1, 2)

  1. Department of Genetics, Faculty of Science, University of Granada, Granada, Spain
  2. Lab. de Bioinformática, Centro de Investigación Biomédica, PTS, Instituto de Biotecnología, Granada, Spain
