A review of computational algorithms for CpG islands detection
CpG islands are generally regarded as epigenetic regulatory regions associated with histone modifications, methylation, and promoter activity. Exact mapping of DNA methylation in CpG islands is needed to understand their diverse biological functions. However, precisely identifying CpG islands across the whole genome, whether experimentally or computationally, remains challenging. Numerous computational methods are being developed to detect CpG-enriched regions effectively and to reduce the time and cost of experiments. Here, we review some of the latest computational CpG detection methods, which use parameters such as clustering, patterns, and physical distance for CpG island detection. Comparative analysis of methods that rely on different principles and parameters allows algorithms to be prioritized for specific CpG-associated datasets to achieve higher accuracy and sensitivity. A number of computational tools based on window, Hidden Markov Model, density, and distance-/length-based algorithms are being applied to human and other mammalian genomes for accurate CpG detection. Comparing CpG island detection algorithms helps in choosing a method according to the target genome and the required parameters, to attain higher accuracy, specificity, and performance. There is still a need for efficient computational CpG detection methods with fewer false-positive results. This review provides a better understanding of the principles behind these tools, which will help in prioritizing and developing algorithms for accurate CpG island detection.
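To make the window-based family mentioned above concrete, the following is a minimal sketch of a sliding-window scan. It is not the implementation of any tool covered by this review. The thresholds (window of 200 bp, GC content of at least 50%, observed/expected CpG ratio of at least 0.6) are the commonly cited Gardiner-Garden and Frommer criteria and are an assumption here, since the abstract does not state them. The function names `window_stats` and `find_cpg_islands` are hypothetical.

```python
def window_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for one window."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if C and G occurred independently: (C * G) / N.
    expected = (c * g) / n if n else 0.0
    obs_exp = cpg / expected if expected > 0 else 0.0
    return gc_fraction, obs_exp


def find_cpg_islands(seq, window=200, step=1, min_gc=0.5, min_obs_exp=0.6):
    """Slide a fixed window over seq and merge passing windows.

    Thresholds are assumed defaults (Gardiner-Garden and Frommer style),
    not values taken from the reviewed paper.
    Returns a list of half-open (start, end) intervals, 0-based.
    """
    islands = []
    for start in range(0, len(seq) - window + 1, step):
        gc, obs_exp = window_stats(seq[start:start + window])
        if gc >= min_gc and obs_exp >= min_obs_exp:
            end = start + window
            if islands and start <= islands[-1][1]:
                # Overlaps or touches the previous hit: extend it.
                islands[-1] = (islands[-1][0], end)
            else:
                islands.append((start, end))
    return islands
```

For a toy sequence of 200 bp of alternating CG followed by 400 bp of alternating AT, `find_cpg_islands("CG" * 100 + "AT" * 200)` returns `[(0, 300)]`: the last passing window starts at position 100, where GC content falls to exactly 50%. This shows why window methods tend to extend island boundaries into flanking sequence, one of the trade-offs that motivates comparing them with clustering- and distance-based approaches.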
Keywords: Bioinformatics; computational algorithms; CpG island; CpGcluster; epigenetics; methylation
We are thankful to Sheikh Arslan Sehgal, University of Chinese Academy of Sciences, Talal Jamil Qazi and Lucienne N. Duru, Beijing Institute of Technology, Beijing, for their kind support and suggestions throughout the manuscript.