Abstract
Damage to or breaks in DNA can change the characteristics of a genome and cause various diseases. In this work we construct a system, StrucBreak, that uses a maximum-likelihood-based probabilistic formula to assess the number of damages that have occurred in a DNA sequence. The approach has been progressively benchmarked on simulated data sets so that the outcomes can be compared with a ground truth or reference value. First, the order of the sequence data set is checked with the statistical cumulative sum (STACUMSUM). Prior and posterior probabilities are then computed for the verified sequences to estimate the percentages of breaks and mutations. Maximum-likelihood estimation then determines the exact numbers and positions of breaks and detections. In database manipulation, one factor that decides the orientation and order of the sequences is the geometric distance between consecutive sequences; this distance is measured to obtain a smooth representation of the genome or DNA sequences. Finally, we compared the performance of our system with DAMBE5 (a comprehensive software package for data analysis in molecular biology and evolution): in terms of time and space complexity, StrucBreak is much faster and consumes much less space owing to our algorithmic approaches.
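The two statistical steps named in the abstract, a cumulative-sum check followed by maximum-likelihood estimation of a break position, can be illustrated with a minimal sketch. This is not the StrucBreak implementation: the GC-indicator encoding, the restriction to a single break, the Bernoulli model, and all function names (`gc_indicator`, `cusum_break`, `ml_break`) are assumptions made only for illustration.

```python
import math


def gc_indicator(seq):
    """Encode a DNA string as 1 for G/C and 0 otherwise (assumed encoding)."""
    return [1 if base in "GC" else 0 for base in seq.upper()]


def cusum_break(x):
    """Return the split k (1 <= k < n) that maximizes |sum_{i<k} (x_i - mean)|."""
    mean = sum(x) / len(x)
    best_k, best_abs, running = None, -1.0, 0.0
    for k in range(1, len(x)):
        running += x[k - 1] - mean
        if abs(running) > best_abs:
            best_k, best_abs = k, abs(running)
    return best_k


def bernoulli_loglik(ones, n):
    """Maximized Bernoulli log-likelihood for `ones` successes in n trials."""
    zeros = n - ones
    ll = 0.0
    if ones:  # skip the term when the count is 0 (0 * log 0 is taken as 0)
        ll += ones * math.log(ones / n)
    if zeros:
        ll += zeros * math.log(zeros / n)
    return ll


def ml_break(x):
    """Return the split k that maximizes the two-segment Bernoulli likelihood."""
    n, total = len(x), sum(x)
    best_k, best_ll, left_ones = None, -math.inf, 0
    for k in range(1, n):
        left_ones += x[k - 1]
        ll = bernoulli_loglik(left_ones, k) + bernoulli_loglik(total - left_ones, n - k)
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k


if __name__ == "__main__":
    # Synthetic sequence with a known composition change after 100 bases.
    seq = "AT" * 50 + "GC" * 50
    signal = gc_indicator(seq)
    print("CUSUM break:", cusum_break(signal))
    print("ML break:", ml_break(signal))
```

On a synthetic sequence with a known break, both estimators can be compared against the ground truth, which mirrors the simulated-data benchmarking described in the abstract. Handling several breaks would require a segmentation or penalized-likelihood extension that this sketch does not attempt.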
References
Lu Q, Lund R, Lee T (2010) An MDL approach to the climate segmentation problem. Ann Appl Stat 4:299–319
Robbins M, Gallagher C, Lund R, Aue A (2011) Mean shift testing in correlated data. J Time Ser Anal 32:498–511
Robbins MW, Lund RB, Gallagher CM, Lu Q (2011) Changepoints in the North Atlantic tropical cyclone record. J Am Stat Assoc 106:89–99
Ahmad S, Duke S, Jena R, Williams M, Burnet NG (2012) Advances in radiotherapy. BMJ 345:33–38
Delaney S, Jacob S, Zerbino D, Birney E (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18(5):821–829
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
Lindahl T, Barnes DE (2000) Repair of endogenous DNA damage. Cold Spring Harb Symp Quant Biol 65:127–133
Friedberg EC, Walker GC, Siede W, Wood RD, Schultz RA, Ellenberger T (2006) DNA repair and mutagenesis, 2nd edn. ASM Press, New York
Jackson SP, Bartek J (2009) The DNA-damage response in human biology and disease. Nature 461:1071–1078
Ciccia A, Elledge SJ (2010) The DNA damage response: making it safe to play with knives. Mol Cell 40:179–204
Longhese MP, Bonetti D, Guerini I, Manfrini N, Clerici M (2009) DNA double-strand breaks in meiosis: checking their formation, processing and repair. DNA Repair (Amst) 8:1127–1138
Tsai AG, Lieber MR (2010) Mechanisms of chromosomal rearrangement in the human genome. BMC Genom 11:S1. doi:10.1186/1471-2164-11-S1-S1
Harper JW, Elledge SJ (2007) The DNA damage response: ten years after. Mol Cell 28:739–745
Rouse J, Jackson SP (2002) Interfaces between the detection, signaling, and repair of DNA damage. Science 297:547–551
Harrison JC, Haber JE (2006) Surviving the breakup: the DNA damage checkpoint. Annu Rev Genet 40:209–235
Altmannova V, Eckert-Boulet N, Arneric M, Kolesar P, Chaloupkova R, Damborsky J, Sung P, Zhao X, Lisby M, Krejci L (2010) Rad52 SUMOylation affects the efficiency of the DNA repair. Nucleic Acids Res 38:4708–4721
Lieber MR (2008) The mechanism of human nonhomologous DNA end joining. J Biol Chem 283:1–5
Cimprich KA, Cortez D (2008) ATR: an essential regulator of genome integrity. Nat Rev Mol Cell Biol 9(8):616–627
Kastan MB, Bartek J (2004) Cell-cycle checkpoints and cancer. Nature 432:316–323
Bartek J, Lukas J (2007) DNA damage checkpoints: from initiation to recovery or adaptation. Curr Opin Cell Biol 19:238–245
Munoz-Galvan S, Lopez-Saavedra A, Jackson SP, Huertas P, Cortes-Ledesma F et al (2013) Competing roles of DNA end resection and non-homologous end joining functions in the repair of replication-born double-strand breaks by sister-chromatid recombination. Nucleic Acids Res 41:1669–1683
Xiao A et al (2009) WSTF regulates the H2A.X DNA damage response via a novel tyrosine kinase activity. Nature 457:57–62
Huertas P (2010) DNA resection in eukaryotes: deciding how to fix the break. Nat Struct Mol Biol 17:11–16
Richardson C, Horikoshi N, Pandita TK (2004) The role of the DNA double-strand break response network in meiosis. DNA Repair 3:1149–1164
O’Driscoll M, Jeggo PA (2006) The role of double-strand break repair—insights from human genetics. Nat Rev Genet 7:45–54
McVey M, Lee SE (2008) MMEJ repair of double-strand breaks (director’s cut), deleted sequences and alternative endings. Trends Genet 24:529–538
Chapman JR, Barral P, Vannier JB, Borel V, Steger M et al (2013) RIF1 is essential for 53BP1-dependent nonhomologous end joining and suppression of DNA double-strand break resection. Mol Cell 49:858–871
Fishman-Lobell J, Rudin N, Haber JE (1992) Two alternative pathways of double-strand break repair that are kinetically separable and independently modulated. Mol Cell Biol 12:1292–1303
Bernstein KA, Gangloff S, Rothstein R (2010) The RecQ DNA helicases in DNA repair. Annu Rev Genet 44:393–417
Bork P, Hofmann K, Bucher P, Neuwald AF, Altschul SF, Koonin EV (1997) A superfamily of conserved domains in DNA damage-responsive cell cycle checkpoint proteins. FASEB J 11:68–76
Caldecott KW (2008) Single-strand break repair and genetic disease. Nat Rev Genet 9:619–631
Chou DM, Adamson B, Dephoure NE, Tan X, Nottke AC, Hurov KE, Gygi SP, Colaiacovo MP, Elledge SJ (2010) A chromatin localization screen reveals poly (ADP ribose)-regulated recruitment of the repressive polycomb and NuRD complexes to sites of DNA damage. Proc Natl Acad Sci 107:18475–18480
Kumar V, Grama A, Gupta A, Karypis G (1995) Introduction to parallel computing. Benjamin/Cummings Publishing Company, San Francisco
Kundeti V, Rajasekaran S, Dinh H, Vaughn M, Thapar V (2010) Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs. BMC Bioinf 11:560
Mardis E (2008) Next-generation DNA sequencing methods. Annu Rev Genomics Hum Genet 9:387–402
Huot Y, Jeffrey WH, Davis RF, Cullen JJ (2000) Damage to DNA in bacterioplankton: a model of damage by ultraviolet radiation and its repair as influenced by vertical mixing. Photochem Photobiol 72(1):62–74
Rajput B, Murphy TD, Pruitt KD (2015) RefSeq curation and annotation of antizyme and antizyme inhibitor genes in vertebrates. Nucleic Acids Res 43(15):7270–7279
Li W, Cowley A, Uludag M, Gur T, McWilliam H, Squizzato S, Park YM, Buso N, Lopez R (2015) The EMBL-EBI bioinformatics web and programmatic tools framework. Nucleic Acids Res 44(5):1–5
Nimmy SF, Kamal MS (2015) Next generation sequencing under de-novo genome assembly. Int J Biomath 8(5):1–29
Kamal MS, Khan MI (2014) Performance evaluation of Warshall algorithm and dynamic programming for Markov chain in local sequence alignment. Interdiscip Sci Comput Life Sci 6:14
Kamal MS, Khan MI (2014) An integrated algorithm for local sequence alignment. Netw Model Anal Health Inf Bioinf 3:68
Kamal MS, Khan MI (2014) Chapman–Kolmogorov equations for global PPIs with discriminant-EM. Int J Biomath 7(5). doi:10.1142/S1793524514500533
Kamal MS, Khan MI (2014) De Bruijn graph based de novo genome assembly. J Softw 9(8):2160–2168
Antoch J, Huskova M, Praskova Z (1997) Effect of dependence on statistics for determination of change. J Stat Plan Inference 60:291–310
Bai J (1994) Least squares estimation of a shift in linear processes. J Time Ser Anal 15:453–472
Haque W, Aravind A, Reddy B (2008) An efficient algorithm for local sequence alignment. Paper presented at the 30th Annual International Conference of the IEEE, Engineering in Medicine and Biology Society, Vancouver, BC, 20–25 Aug 2008, pp 1367–1372
Xia X (2013) DAMBE5: a comprehensive software package for data analysis in molecular biology and evolution. Mol Biol Evol 30:1720–1728
Hofacker IL (2003) Vienna RNA secondary structure server. Nucleic Acids Res 31:3429–3431
Vos RA, Balhoff JP, Caravas JA et al (2012) NeXML: rich, extensible, and verifiable representation of comparative data and metadata. Syst Biol 61:675–689
Cite this article
Kamal, M.S., Nimmy, S.F. StrucBreak: A Computational Framework for Structural Break Detection in DNA Sequences. Interdiscip Sci Comput Life Sci 9, 512–527 (2017). https://doi.org/10.1007/s12539-016-0158-7
DOI: https://doi.org/10.1007/s12539-016-0158-7