Abstract
In the realm of Bioinformatics, the comparison of DNA sequences is essential for tasks such as phylogenetic identification, comparative genomics, and genome reconstruction. Methods for estimating sequence similarity have been successfully applied in this field. The application of these methods to circular genomic structures, common in nature, poses additional computational hurdles. In the advancing field of metagenomics, innovative circular DNA alignment algorithms are vital for accurately understanding circular genome complexities. Aligning circular DNA, more intricate than linear sequences, demands heightened algorithms due to circularity, escalating computation requirements and runtime. This paper proposes CSA-MEM, an efficient text indexing algorithm to identify the most informative region to rotate and cut circular genomes, thus improving alignment accuracy. The algorithm uses a circular variation of the FM-Index and identifies the longest chain of non-repeated maximal subsequences common to a set of circular genomes, enabling the most adequate rotation and linearisation for multiple alignment. The effectiveness of the approach was validated in five sets of mitochondrial, viral and bacterial DNA. The results show that CSA-MEM significantly improves the efficiency of multiple sequence alignment, consistently achieving top scores compared to other state-of-the-art methods. This tool enables more realistic phylogenetic comparisons between species, facilitates large metagenomic data processing, and opens up new possibilities in comparative genomics.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Ayad, L.A., Pissis, S.P.: MARS: improving multiple circular sequence alignment using refined sequences. BMC Genomics 18(1), 1–10 (2017)
Barton, C., Iliopoulos, C.S., Kundu, R., Pissis, S.P., Retha, A., Vayani, F.: Accurate and efficient methods to improve multiple circular sequence alignment. In: Bampis, E. (ed.) SEA 2015. LNCS, vol. 9125, pp. 247–258. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-20086-6_19
Barton, C., Iliopoulos, C.S., Pissis, S.P.: Fast algorithms for approximate circular string matching. Algorithms Mol. Biol. 9, 1–10 (2014)
Burrows, M.: A block-sorting lossless data compression algorithm. SRS Res. Rep. 124 (1994)
Carattoli, A.: Plasmids and the spread of resistance. Int. J. Med. Microbiol. 303(6), 298–304 (2013)
Dulanto, C.A., Dekker, J.P.: From the pipeline to the bedside: advances and challenges in clinical metagenomics. J. Infect. Dis. 221(Supplement 3), S331–S340 (2019)
Fehér, E., Mihalov-Kovács, E., Kaszab, E., Malik, Y.S., Marton, S., Bányai, K.: Genomic diversity of CRESS DNA viruses in the eukaryotic Virome of swine feces. Microorganisms 9(7), 1426 (2021)
Fernandes, F., Freitas, A.T.: slaMEM: efficient retrieval of maximal exact matches using a sampled LCP array. Bioinformatics 30(4), 464–471 (2014)
Fernandes, F., Pereira, L., Freitas, A.T.: CSA: an efficient algorithm to improve circular DNA multiple alignment. BMC Bioinformatics 10(1), 1–13 (2009)
Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: Proceedings 41st Annual Symposium on Foundations of Computer Science, pp. 390–398. IEEE (2000)
Grossi, R., Iliopoulos, C.S., Mercas, R., et al.: Circular sequence comparison: algorithms and applications. Algorithms Mol. Biol. 11(12) (2016)
Gusfield, D.: An “increment-by-one” approach to suffix arrays and trees. Report. CSE-90-39, Computer Science Division, University of California, Davis (1990)
Laudadio, I., Fulc, V., Stronati, L., Carissimi, C.: Next-generation metagenomics: methodological challenges and opportunities. OMICS 23(7), 327–333 (2019)
Manber, U., Myers, G.: Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993)
Mosig, A., Hofacker, I.L., Stadler, P.F.: Comparative analysis of cyclic sequences: viroids and other small circular RNAs. In: Lecture Notes in Informatics. Proceedings German Conference on Bioinformatics (2006)
Pan, S., Zhao, X.M., Coelho, L.P.: SemiBin2: self-supervised contrastive learning leads to better MAGs for short- and long-read sequencing. Bioinformatics 39(Supplement 1), i21–i29 (2023)
Pereira, L., et al.: The diversity present in 5140 human mitochondrial genomes. Am. J. Hum. Genetics 84(5), 628–640 (2009)
Pohjoismäki, J.L.O., Goffart, S.: Of circles, forks and humanity: topological organisation and replication of mammalian mitochondrial DNA. BioEssays 33(4), 290–299 (2011)
Thompson, J.D., Gibson, T.J., Higgins, D.G.: Multiple sequence alignment using ClustalW and ClustalX. Curr. Protoc. Bioinformatics 1, 2–3 (2003)
Tisza, M.J., et al.: Discovery of several thousand highly diverse circular DNA viruses. Elife 9 (2020)
Yang, L., et al.: Extrachromosomal circular DNA: biogenesis, structure, functions and diseases. Signal Transduct. Target. Ther. 7(1), 342 (2022)
Zhang, Y., Zhang, Q., Zhou, J., Zou, Q.: A survey on the algorithm and development of multiple sequence alignment. Briefings Bioinformatics 23(3) (2022)
Zhao, L., Rosario, K., Breitbart, M., Duffy, S.: Chapter three - eukaryotic circular rep-encoding single-stranded DNA (cress DNA) viruses: ubiquitous viruses with small genomes and a diverse host range. In: Advances in Virus Research, vol. 103, pp. 71–133 (2019)
Acknowledgement
The authors acknowledge the support of Fundação para a Ciência e a Tecnologia, projects PRELUNA (Grant PTDC/CCIINF/4703/2021) and UIDB/50021/2020.
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Salgado, A., Fernandes, F., Freitas, A.T. (2023). CSA-MEM: Enhancing Circular DNA Multiple Alignment Through Text Indexing Algorithms. In: Guo, X., Mangul, S., Patterson, M., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2023. Lecture Notes in Computer Science(), vol 14248. Springer, Singapore. https://doi.org/10.1007/978-981-99-7074-2_41
Download citation
DOI: https://doi.org/10.1007/978-981-99-7074-2_41
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7073-5
Online ISBN: 978-981-99-7074-2
eBook Packages: Computer ScienceComputer Science (R0)