Finding Optimal Alignment and Consensus of Circular Strings

  • Taehyung Lee
  • Joong Chae Na
  • Heejin Park
  • Kunsoo Park
  • Jeong Seop Sim
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6129)

Abstract

We consider the problem of finding the optimal alignment and consensus (string) of circular strings. Circular strings differ from linear strings in that the first (leftmost) symbol of a circular string wraps around to follow the last (rightmost) symbol. In nature, for example, bacterial and mitochondrial DNAs typically form circular strings. The consensus string problem is to find a representative string (consensus) of a given set of strings, and it has been studied extensively on linear strings. However, little work has addressed the consensus problem for circular strings, even though circular strings are biologically important. In this paper, we introduce the consensus problem for circular strings and present novel algorithms to find the optimal alignment and consensus of circular strings under the Hamming distance metric. They are O(n^2 log n)-time algorithms for three circular strings and an O(n^3 log n)-time algorithm for four circular strings. Our algorithms are O(n / log n) times faster than the naïve algorithm that directly uses the solutions for the linear consensus problems, which takes O(n^3) time for three circular strings and O(n^4) time for four circular strings. We achieve this speedup by incorporating a convolution and a system of linear equations into our algorithms, reflecting the characteristics of circular strings that we found.
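To make the naïve baseline mentioned in the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It assumes equal-length strings and assumes the objective is the sum of Hamming distances from the consensus to the aligned strings (the abstract does not state the exact objective). The function names `rotate`, `column_consensus`, and `naive_circular_consensus` are hypothetical, chosen for illustration. For three strings it tries all n^2 rotation pairs and spends O(n) per pair, which gives the O(n^3) bound quoted above.

```python
from collections import Counter
from itertools import product


def rotate(s, k):
    """Return the circular string s rotated left by k positions."""
    k %= len(s)
    return s[k:] + s[:k]


def column_consensus(strings):
    """Column-wise majority string and its total Hamming distance to the inputs.

    Picking the most frequent symbol in each column minimizes the
    distance sum for a fixed (linear) alignment.
    """
    consensus, total = [], 0
    for column in zip(*strings):
        symbol, count = Counter(column).most_common(1)[0]
        consensus.append(symbol)
        total += len(column) - count  # mismatches in this column
    return "".join(consensus), total


def naive_circular_consensus(strings):
    """Brute force: fix the first string, try every rotation of the others.

    Returns (distance_sum, shifts, consensus). Assumes all strings
    have the same length n; runs in O(n^3) time for three strings.
    """
    first, rest = strings[0], strings[1:]
    n = len(first)
    best = None
    for shifts in product(range(n), repeat=len(rest)):
        aligned = [first] + [rotate(s, k) for s, k in zip(rest, shifts)]
        consensus, total = column_consensus(aligned)
        if best is None or total < best[0]:
            best = (total, (0,) + shifts, consensus)
    return best
```

For example, `naive_circular_consensus(["ACGT", "CGTA", "GTAC"])` returns `(0, (0, 3, 2), "ACGT")`, since the three inputs are rotations of one another. The faster algorithms described in the abstract avoid enumerating every rotation pair explicitly; this sketch only illustrates the baseline they improve on.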

Keywords

Fast Fourier Transform; Time Algorithm; Close String; String Match; Optimal Alignment
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Taehyung Lee (1)
  • Joong Chae Na (2)
  • Heejin Park (3)
  • Kunsoo Park (1)
  • Jeong Seop Sim (4)

  1. Seoul National University, Seoul, South Korea
  2. Sejong University, Seoul, South Korea
  3. Hanyang University, Seoul, South Korea
  4. Inha University, Incheon, South Korea
