CSA-MEM: Enhancing Circular DNA Multiple Alignment Through Text Indexing Algorithms

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2023)

Abstract

In bioinformatics, comparing DNA sequences is essential for tasks such as phylogenetic identification, comparative genomics, and genome reconstruction. Methods for estimating sequence similarity have been applied successfully in this field, but applying them to circular genomic structures, which are common in nature, poses additional computational hurdles. In the advancing field of metagenomics, innovative circular DNA alignment algorithms are vital for accurately understanding the complexities of circular genomes. Aligning circular DNA is more intricate than aligning linear sequences: because a circular sequence has no fixed start position, it demands more sophisticated algorithms and increases computational requirements and runtime. This paper proposes CSA-MEM, an efficient text indexing algorithm that identifies the most informative region at which to rotate and cut circular genomes, thus improving alignment accuracy. The algorithm uses a circular variation of the FM-Index to identify the longest chain of non-repeated maximal subsequences common to a set of circular genomes, which determines the most adequate rotation and linearisation for multiple alignment. The approach was validated on five sets of mitochondrial, viral and bacterial DNA. The results show that CSA-MEM significantly improves the efficiency of multiple sequence alignment, consistently achieving top scores compared to other state-of-the-art methods. The tool enables more realistic phylogenetic comparisons between species, facilitates the processing of large metagenomic data sets, and opens up new possibilities in comparative genomics.
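
The rotate-and-cut idea described in the abstract can be illustrated with a deliberately naive Python sketch. It is not the authors' implementation: it uses brute-force window counting instead of a circular FM-Index, and it picks a single shared anchor rather than a chain of maximal matches. The function names `circular_window_counts`, `longest_unique_common_anchor` and `rotate_to_anchor` are hypothetical and exist only for this illustration.

```python
from collections import Counter
from typing import List, Optional


def circular_window_counts(seq: str, k: int) -> Counter:
    """Count every length-k window of `seq`, treating `seq` as circular."""
    # Appending the first k-1 characters lets windows wrap past the end.
    doubled = seq + seq[: k - 1]
    return Counter(doubled[i : i + k] for i in range(len(seq)))


def longest_unique_common_anchor(seqs: List[str]) -> Optional[str]:
    """Longest substring occurring exactly once in every circular sequence.

    Brute force, for illustration only; an index-based method would be
    needed for real genomes.
    """
    for k in range(min(len(s) for s in seqs), 0, -1):
        common = None
        for s in seqs:
            counts = circular_window_counts(s, k)
            unique = {w for w, c in counts.items() if c == 1}
            common = unique if common is None else common & unique
            if not common:
                break
        if common:
            return min(common)  # deterministic tie-break among equal lengths
    return None


def rotate_to_anchor(seq: str, anchor: str) -> str:
    """Rotate circular `seq` so that the linearised string starts at `anchor`."""
    if not anchor or len(anchor) > len(seq):
        raise ValueError("anchor must be non-empty and no longer than seq")
    doubled = seq + seq[: len(anchor) - 1]
    pos = doubled.find(anchor)
    if pos < 0:
        raise ValueError("anchor not found in circular sequence")
    return seq[pos:] + seq[:pos]


# Three linearisations of the same circular sequence, cut at different points.
genomes = ["ACGTTGCA", "TTGCAACG", "GCAACGTT"]
anchor = longest_unique_common_anchor(genomes)
rotated = [rotate_to_anchor(g, anchor) for g in genomes]
```

Once every sequence is rotated to start at the same shared region, the linearised strings can be passed to a conventional multiple sequence aligner. The sketch only conveys why a common cut point matters; its quadratic-or-worse cost is exactly what an index-based approach is meant to avoid.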

Acknowledgement

The authors acknowledge the support of Fundação para a Ciência e a Tecnologia, projects PRELUNA (Grant PTDC/CCIINF/4703/2021) and UIDB/50021/2020.

Author information

Corresponding authors

Correspondence to André Salgado, Francisco Fernandes or Ana Teresa Freitas.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Salgado, A., Fernandes, F., Freitas, A.T. (2023). CSA-MEM: Enhancing Circular DNA Multiple Alignment Through Text Indexing Algorithms. In: Guo, X., Mangul, S., Patterson, M., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2023. Lecture Notes in Computer Science, vol 14248. Springer, Singapore. https://doi.org/10.1007/978-981-99-7074-2_41

  • DOI: https://doi.org/10.1007/978-981-99-7074-2_41

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-7073-5

  • Online ISBN: 978-981-99-7074-2

  • eBook Packages: Computer Science, Computer Science (R0)
