Abstract
Multiple sequence alignment is a core computational task in bioinformatics and has been extensively studied over the past decades. This computation relies on an implicit assumption about the input data: the left- and right-most positions of each sequence are relevant. However, this is not the case for circular structures; for instance, mitochondrial DNA (mtDNA). Efforts have been made to address this issue, but it is far from being solved. We recently introduced a fast algorithm for approximate circular string matching (Barton et al., Algo Mol Biol, 2014). Here, we first show how to extend this algorithm to approximate circular dictionary matching, and then apply this solution with agglomerative hierarchical clustering to find a sufficiently good rotation for each sequence. Furthermore, we propose an alternative method that is suitable for more divergent sequences. We implemented these methods in BEAR, a programme for improving multiple circular sequence alignment. Experimental results, using real and synthetic data, show the high accuracy and efficiency of these new methods in terms of the inferred likelihood-based phylogenies.
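To make the rotation problem concrete, the following is a minimal sketch of choosing a rotation for one circular sequence against a reference. It is a naive brute-force baseline under unit-cost edit distance, not the approximate circular string matching algorithm of Barton et al. and not the method implemented in BEAR; the function names `edit_distance` and `best_rotation` are hypothetical and introduced here for illustration only.

```python
from typing import Tuple


def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance using a rolling one-row DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # match or substitute
        prev = curr
    return prev[-1]


def best_rotation(reference: str, query: str) -> Tuple[int, int]:
    """Try every rotation of `query` (assumed circular) and return
    (offset, distance) for the one closest to `reference`.
    Ties are broken in favour of the smallest offset."""
    if not query:
        return 0, len(reference)
    best = (0, edit_distance(reference, query))
    for offset in range(1, len(query)):
        rotated = query[offset:] + query[:offset]
        d = edit_distance(reference, rotated)
        if d < best[1]:
            best = (offset, d)
    return best


if __name__ == "__main__":
    # The query is the reference cut open at a different position.
    offset, dist = best_rotation("ACGTACGGT", "CGGTACGTA")
    print(offset, dist)
```

This baseline costs one quadratic-time distance computation per rotation, which is what makes faster approximate circular matching attractive for long sequences and large collections.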
References
Baeza-Yates, R.A., Perleberg, C.H.: Fast and practical approximate string matching. Information Processing Letters 59(1), 21–27 (1996)
Barton, C., Iliopoulos, C.S., Pissis, S.P.: Fast algorithms for approximate circular string matching. Algorithms for Molecular Biology 9(1), 9 (2014)
Barton, C., Iliopoulos, C.S., Pissis, S.P.: Average-case optimal approximate circular string matching. In: Dediu, A.-H., Formenti, E., Martín-Vide, C., Truthe, B. (eds.) LATA 2015. LNCS, vol. 8977, pp. 85–96. Springer, Heidelberg (2015)
Crochemore, M., Iliopoulos, C.S., Pissis, S.P.: A parallel algorithm for fixed-length approximate string-matching with k-mismatches. In: Elomaa, T., Mannila, H., Orponen, P. (eds.) Ukkonen Festschrift 2010. LNCS, vol. 6060, pp. 92–101. Springer, Heidelberg (2010)
Dori, S., Landau, G.M.: Construction of Aho Corasick automaton in linear time for integer alphabets. Information Processing Letters 98(2), 66–72 (2006)
Edgar, R.C.: MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5(1), 113 (2004)
Fernandes, F., Pereira, L., Freitas, A.T.: CSA: An efficient algorithm to improve circular DNA multiple alignment. BMC Bioinformatics 10(1), 1–13 (2009)
Fletcher, W., Yang, Z.: INDELible: A flexible simulator of biological sequence evolution. Molecular Biology and Evolution 26(8), 1879–1888 (2009)
Fritzsch, G., Schlegel, M., Stadler, P.F.: Alignments of mitochondrial genome arrangements: Applications to metazoan phylogeny. Journal of Theoretical Biology 240(4), 511–520 (2006)
Goios, A., Pereira, L., Bogue, M., Macaulay, V., Amorim, A.: mtDNA phylogeny and evolution of laboratory mouse strains. Genome Research 17(3), 293–298 (2007)
Hirvola, T., Tarhio, J.: Approximate online matching of circular strings. In: Gudmundsson, J., Katajainen, J. (eds.) SEA 2014. LNCS, vol. 8504, pp. 315–325. Springer, Heidelberg (2014)
Iliopoulos, C.S., Mouchard, L., Pinzon, Y.J.: The max-shift algorithm for approximate string matching. In: Brodal, G.S., Frigioni, D., Marchetti-Spaccamela, A. (eds.) WAE 2001. LNCS, vol. 2141, pp. 13–25. Springer, Heidelberg (2001)
Katoh, K., Misawa, K., Kuma, K.I., Miyata, T.: MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research 30(14), 3059–3066 (2002)
Larkin, M., Blackshields, G., Brown, N., Chenna, R., McGettigan, P., McWilliam, H., Valentin, F., Wallace, I., Wilm, A., Lopez, R., Thompson, J., Gibson, T., Higgins, D.: Clustal W and Clustal X version 2.0. Bioinformatics 23(21), 2947–2948 (2007)
Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Tech. Rep. 8 (1966)
Maes, M.: On a cyclic string-to-string correction problem. Information Processing Letters 35(2), 73–78 (1990)
Mosig, A., Hofacker, I.L., Stadler, P.F.: Comparative analysis of cyclic sequences: viroids and other small circular RNAs. In: Huson, D.H., Kohlbacher, O., Lupas, A.N., Nieselt, K., Zell, A. (eds.) German Conference on Bioinformatics. LNI, vol. 83, pp. 93–102. GI (2006)
Myers, G.: A fast bit-vector algorithm for approximate string matching based on dynamic programming. Journal of the ACM 46(3), 395–415 (1999)
Notredame, C., Higgins, D.G., Heringa, J.: T-Coffee: a novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology 302(1), 205–217 (2000)
Rice, P., Longden, I., Bleasby, A.: EMBOSS: The European Molecular Biology Open Software Suite. Trends in Genetics 16(6), 276–277 (2000)
Robinson, D., Foulds, L.: Comparison of phylogenetic trees. Mathematical Biosciences 53(1–2), 131–147 (1981)
Sokal, R.R., Michener, C.D.: A statistical method for evaluating systematic relationships. University of Kansas Scientific Bulletin 28, 1409–1438 (1958)
Stamatakis, A.: RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30(9), 1312–1313 (2014)
Ukkonen, E.: On approximate string matching. In: Karpinski, M. (ed.) Foundations of Computation Theory. Lecture Notes in Computer Science, vol. 158, pp. 487–495. Springer, Berlin Heidelberg (1983)
Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14(3), 249–260 (1995)
Wang, L., Jiang, T.: On the complexity of multiple sequence alignment. Journal of Computational Biology 1(4), 337–348 (1994)
Wang, Z., Wu, M.: Phylogenomic reconstruction indicates mitochondrial ancestor was an energy parasite. PLoS ONE 9(10), e110685 (2014)
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Barton, C., Iliopoulos, C.S., Kundu, R., Pissis, S.P., Retha, A., Vayani, F. (2015). Accurate and Efficient Methods to Improve Multiple Circular Sequence Alignment. In: Bampis, E. (ed.) Experimental Algorithms. SEA 2015. Lecture Notes in Computer Science, vol 9125. Springer, Cham. https://doi.org/10.1007/978-3-319-20086-6_19
DOI: https://doi.org/10.1007/978-3-319-20086-6_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20085-9
Online ISBN: 978-3-319-20086-6
eBook Packages: Computer Science, Computer Science (R0)