Alignments with Non-overlapping Moves, Inversions and Tandem Duplications in O(n⁴) Time

  • Christian Ledergerber
  • Christophe Dessimoz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4598)

Abstract

Sequence alignment is a central problem in bioinformatics. The classical dynamic programming algorithm aligns two sequences by optimizing over possible insertions, deletions and substitutions. However, other evolutionary events can be observed, such as inversions, tandem duplications or moves (transpositions). It has been established that the extension of the problem to move operations is NP-complete. Previous work has shown that an extension restricted to non-overlapping inversions can be solved in O(n³) with a restricted scoring scheme. In this paper, we show that the alignment problem extended to non-overlapping moves can be solved in O(n⁵) for general scoring schemes, O(n⁴ log n) for concave scoring schemes and O(n⁴) for restricted scoring schemes. Furthermore, we show that the alignment problem extended to non-overlapping moves, inversions and tandem duplications can be solved with the same time complexities. Finally, an example of an alignment with non-overlapping moves is provided.
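The classical dynamic programming baseline mentioned in the abstract, and the extra evolutionary events it names, can be illustrated with a short Python sketch. The helpers `move`, `inversion` and `tandem_duplication` are hypothetical names introduced here for illustration; the inversion helper assumes DNA and models an inversion as replacing a segment by its reverse complement, and the scoring values (match +1, mismatch -1, gap -1) are arbitrary assumptions. The scoring function is the ordinary Needleman-Wunsch recurrence over insertions, deletions and substitutions only, not the authors' O(n⁴) algorithm, which the abstract does not describe in enough detail to reproduce.

```python
"""Toy illustration: the edit events named in the abstract, plus the classical
Needleman-Wunsch global alignment score (insertions, deletions, substitutions).
Helper names and scoring values are hypothetical, chosen for illustration only."""

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def move(seq: str, i: int, j: int, k: int) -> str:
    """Cut the block seq[i:j] and reinsert it at position k of the remaining string."""
    block = seq[i:j]
    rest = seq[:i] + seq[j:]
    return rest[:k] + block + rest[k:]


def inversion(seq: str, i: int, j: int) -> str:
    """Assumes DNA: replace the segment seq[i:j] by its reverse complement."""
    segment = "".join(COMPLEMENT[c] for c in reversed(seq[i:j]))
    return seq[:i] + segment + seq[j:]


def tandem_duplication(seq: str, i: int, j: int) -> str:
    """Insert a copy of seq[i:j] directly after the original segment."""
    return seq[:j] + seq[i:j] + seq[j:]


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Optimal global alignment score in O(len(a) * len(b)) time and space."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[n][m]


if __name__ == "__main__":
    s = "AAAACCCCGGGG"
    t = move(s, 0, 4, 8)  # move the A block to the end: "CCCCGGGGAAAA"
    print(t, needleman_wunsch(s, s), needleman_wunsch(s, t))
```

Running it prints `CCCCGGGGAAAA 12 0`: a single moved block drops the score from 12 to 0, because under this scoring the best classical alignment deletes the A block at one end and inserts it again at the other. Scoring such a block as one move event instead is the kind of extension the paper addresses.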

Keywords

Tandem Duplication, Edit Operation, Edit Graph, Grid Graph, Alignment Problem
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Christian Ledergerber, ETH Zurich, Institute of Computational Science, Switzerland
  • Christophe Dessimoz, ETH Zurich, Institute of Computational Science, Switzerland