
StrucBreak: A Computational Framework for Structural Break Detection in DNA Sequences

  • Original Research Article
Interdisciplinary Sciences: Computational Life Sciences

Abstract

Damage or breaks in DNA may change the characteristics of genomes and cause various diseases. In this work we construct a system that uses a maximum-likelihood-based probabilistic formula to assess the number of damage events that have occurred in a DNA sequence. The approach has been benchmarked on simulated data sets so that its outcomes can be compared with a ground truth or reference value. First, the order of the sequence data set is checked with the statistical cumulative sum (STACUMSUM). Prior and posterior probabilities are then estimated for the verified sequences to obtain the percentages of breaks and mutations. Maximum-likelihood estimation then finds the exact numbers and positions of breaks and detections. In database manipulation, one factor that decides the orientation and order of the sequence is the geometric distance between consecutive sequences. The geometric distance is measured for smooth representation of the genome or DNA sequences. Finally, we compared the performance of our system with DAMBE5 (a comprehensive software package for data analysis in molecular biology and evolution); in terms of time and space complexity, StrucBreak is much faster and consumes much less space owing to our algorithmic approaches.
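The preview does not give the STACUMSUM formula or the likelihood used by StrucBreak, so the following is only a generic sketch of the two ideas the abstract names: a cumulative-sum scan and a maximum-likelihood estimate of a single break position. The GC/AT indicator encoding, the Bernoulli segment model, and all function names (`gc_indicator`, `cusum`, `cusum_break`, `ml_single_break`) are assumptions made for illustration, not the authors' method.

```python
import math


def gc_indicator(seq):
    """Encode a DNA string as 1 for G/C and 0 otherwise (illustrative encoding only)."""
    return [1 if base in "GC" else 0 for base in seq.upper()]


def cusum(values):
    """Cumulative sum of deviations from the overall mean."""
    mean = sum(values) / len(values)
    total, path = 0.0, []
    for v in values:
        total += v - mean
        path.append(total)
    return path


def cusum_break(values):
    """Number of positions before the break, taken where |CUSUM| peaks."""
    path = cusum(values)
    peak = max(range(len(path)), key=lambda i: abs(path[i]))
    return peak + 1


def bernoulli_loglik(ones, n):
    """Maximised Bernoulli log-likelihood of a segment with `ones` ones in n symbols."""
    if n == 0 or ones == 0 or ones == n:
        return 0.0  # a pure (or empty) segment is fitted perfectly
    p = ones / n
    return ones * math.log(p) + (n - ones) * math.log(1 - p)


def ml_single_break(values):
    """Split point k (1..n-1) that maximises the two-segment log-likelihood."""
    n = len(values)
    total_ones = sum(values)
    best_k, best_ll = None, -math.inf
    left_ones = 0
    for k in range(1, n):
        left_ones += values[k - 1]  # ones among the first k symbols
        ll = (bernoulli_loglik(left_ones, k)
              + bernoulli_loglik(total_ones - left_ones, n - k))
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k, best_ll


if __name__ == "__main__":
    # Toy sequence: an AT-only block followed by a GC-only block.
    toy = "AT" * 10 + "GC" * 10
    encoded = gc_indicator(toy)
    print(cusum_break(encoded))
    print(ml_single_break(encoded)[0])
```

On the toy sequence both estimators place the break after the 20th base, where the composition changes from AT-only to GC-only. Detecting several breaks, or weighting candidates with prior and posterior probabilities as the abstract describes, would need details that this preview does not provide.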

Author information

Corresponding author

Correspondence to Md. Sarwar Kamal.

About this article

Cite this article

Kamal, M.S., Nimmy, S.F. StrucBreak: A Computational Framework for Structural Break Detection in DNA Sequences. Interdiscip Sci Comput Life Sci 9, 512–527 (2017). https://doi.org/10.1007/s12539-016-0158-7

  • DOI: https://doi.org/10.1007/s12539-016-0158-7
