Journal of Biosciences, 44:143

A review of computational algorithms for CpG islands detection

  • Rana Adnan Tahir
  • Da Zheng
  • Amina Nazir
  • Hong Qing


CpG islands are generally recognized as epigenetic regulatory regions associated with histone modifications, DNA methylation, and promoter activity. Exact mapping of DNA methylation in CpG islands is needed to understand their diverse biological functions. However, precisely identifying CpG islands across the whole genome, whether experimentally or computationally, remains challenging. Numerous computational methods are being developed to detect CpG-enriched regions effectively and to reduce the time and cost of experiments. Here, we review some of the latest computational CpG island detection methods, which rely on clustering, sequence patterns, and physical-distance-like parameters. Tools based on window, hidden Markov model (HMM), density, and distance-/length-based algorithms are being applied to human and other mammalian genomes for accurate CpG island detection. Comparing methods that rely on different principles and parameters makes it possible to prioritize algorithms for specific CpG-associated datasets and to choose a method according to the target genome and the required parameters, thereby attaining higher accuracy, sensitivity, specificity, and performance. Efficient computational CpG island detection methods with fewer false-positive results are still needed. This review explains the principles behind these tools to help prioritize and develop algorithms for accurate CpG island detection.
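
Window-based tools scan a sequence with a fixed-length window and keep the windows whose base composition passes fixed thresholds. As a minimal sketch, assuming the widely used Gardiner-Garden and Frommer criteria (200 bp window, GC content above 50%, observed/expected CpG ratio above 0.6), the code below computes both statistics for each window and merges the passing windows into candidate islands. The function names are hypothetical, and the extension, trimming, and minimum-length rules of published window-based tools are deliberately left out.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates


def cpg_stats(seq: str) -> Tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Obs/Exp = (N_CpG * N) / (N_C * N_G); set to 0 when C or G is absent.
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def scan_windows(seq: str, window: int = 200, step: int = 1,
                 min_gc: float = 0.5, min_oe: float = 0.6) -> List[Interval]:
    """Return every window whose GC fraction and Obs/Exp ratio exceed the thresholds."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc, oe = cpg_stats(seq[start:start + window])
        if gc > min_gc and oe > min_oe:
            hits.append((start, start + window))
    return hits


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching windows into candidate islands."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    # Toy sequence: a 300 bp CG-rich block (positions 300-599) flanked by AT-only sequence.
    toy = "AT" * 150 + "CG" * 150 + "AT" * 150
    print(merge_intervals(scan_windows(toy)))  # [(201, 699)]
```

On the toy sequence the merged interval overshoots the CG-rich block by 99 bp on each side, because any window more than half covered by the block already passes both thresholds.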
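
Hidden Markov model (HMM) methods treat "island" and "background" as hidden states and decode the most likely state for every base. The sketch below is a deliberately simplified two-state model with independent per-nucleotide emissions and hand-picked, hypothetical parameters; the textbook CpG island HMM described by Durbin et al. uses eight states so that it can capture the C-followed-by-G dependence, and real tools estimate their parameters from data. It assumes the input contains only A, C, G, and T.

```python
import math
from typing import Dict, List, Tuple

STATES = ("background", "island")

# Hypothetical, hand-picked parameters for illustration only.
START = {"background": 0.5, "island": 0.5}
TRANS = {
    "background": {"background": 0.99, "island": 0.01},
    "island": {"background": 0.01, "island": 0.99},
}
EMIT = {
    "background": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
    "island": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
}


def viterbi(seq: str) -> List[str]:
    """Return the most likely hidden state for each base (log-space Viterbi)."""
    seq = seq.upper()  # assumes only A/C/G/T; other symbols raise KeyError
    if not seq:
        return []
    # score[s]: best log-probability of any state path ending in s at the current base.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers: List[Dict[str, str]] = []
    for base in seq[1:]:
        prev, score, ptr = score, {}, {}
        for s in STATES:
            # Best predecessor state for entering s at this base.
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            score[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][base])
            ptr[s] = best
        backpointers.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def island_segments(path: List[str]) -> List[Tuple[int, int]]:
    """Collapse a per-base state path into half-open island intervals."""
    segments, start = [], None
    for i, state in enumerate(path + ["background"]):  # sentinel closes a trailing island
        if state == "island" and start is None:
            start = i
        elif state != "island" and start is not None:
            segments.append((start, i))
            start = None
    return segments


if __name__ == "__main__":
    toy = "AT" * 150 + "CG" * 150 + "AT" * 150
    print(island_segments(viterbi(toy)))  # [(300, 600)]
```

Because the emissions here depend only on single-nucleotide composition, this toy model flags GC-rich sequence in general rather than CpG enrichment specifically; separating the two is what a dinucleotide-aware formulation such as the eight-state model is for.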
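
Distance-based methods such as CpGcluster start from the positions of individual CpG dinucleotides rather than from fixed windows: neighboring CpGs that lie closer than a distance threshold are chained into clusters. The sketch below assumes a threshold equal to the median inter-CpG distance of the input and a hypothetical minimum of three CpGs per cluster. The names `cpg_positions` and `distance_clusters` are illustrative only, and any genome-wide threshold estimation or statistical assessment of candidate clusters that a published tool applies is not reproduced here.

```python
import statistics
from typing import List, Optional, Tuple


def cpg_positions(seq: str) -> List[int]:
    """0-based positions of the C in every CpG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


def distance_clusters(seq: str, threshold: Optional[float] = None,
                      min_cpgs: int = 3) -> List[Tuple[int, int]]:
    """Chain consecutive CpGs whose spacing is <= threshold into clusters.

    threshold defaults to the median inter-CpG distance of `seq` (an assumption
    of this sketch); clusters with fewer than `min_cpgs` CpGs are discarded.
    Returns half-open [start, end) intervals spanning each kept cluster.
    """
    pos = cpg_positions(seq)
    if len(pos) < 2:
        return []
    if threshold is None:
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        threshold = statistics.median(gaps)
    clusters = []
    start = prev = pos[0]
    count = 1
    for p in pos[1:]:
        if p - prev <= threshold:
            count += 1
        else:
            if count >= min_cpgs:
                clusters.append((start, prev + 2))  # +2 includes the G of the last CpG
            start, count = p, 1
        prev = p
    if count >= min_cpgs:
        clusters.append((start, prev + 2))
    return clusters


if __name__ == "__main__":
    # Two dense CpG runs separated by an isolated CpG at position 55.
    toy = "CGA" * 5 + "A" * 40 + "CGA" + "A" * 40 + "CGA" * 5
    print(distance_clusters(toy))  # [(0, 14), (98, 112)]
```

The isolated CpG at position 55 is discarded because it cannot be linked to either dense run, and each kept run is reported from its first C to its last G.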


Keywords: Bioinformatics, computational algorithms, CpG island, CpGcluster, epigenetics, methylation



We are thankful to Sheikh Arslan Sehgal (University of Chinese Academy of Sciences) and to Talal Jamil Qazi and Lucienne N. Duru (Beijing Institute of Technology, Beijing) for their kind support and suggestions throughout the preparation of the manuscript.


Copyright information

© Indian Academy of Sciences 2019

Authors and Affiliations

  1. Key Laboratory of Molecular Medicine and Biotherapy in the Ministry of Industry and Information Technology, Department of Biology, School of Life Sciences, Beijing Institute of Technology, Beijing, China
  2. Department of Biosciences, COMSATS University Islamabad, Islamabad, Pakistan
