Efficient Algorithms for Regular Expression Constrained Sequence Alignment

  • Yun-Sheng Chung
  • Chin Lung Lu
  • Chuan Yi Tang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4009)


Imposing constraints is an effective means to incorporate biological knowledge into alignment procedures. As in the PROSITE database, functional sites of proteins can be effectively described as regular expressions. In an alignment of protein sequences it is natural to expect that functional motifs should be aligned together. Due to this motivation, in CPM 2005 Arslan introduced the regular expression constrained sequence alignment problem and proposed an algorithm which can take time and space up to $O(|\Sigma|^2 |V|^4 n^2)$ and $O(|\Sigma|^2 |V|^4 n)$, respectively, where $\Sigma$ is the alphabet, $n$ is the sequence length, and $V$ is the set of states in an automaton equivalent to the input regular expression. In this paper we propose a more efficient algorithm solving this problem which takes $O(|V|^3 n^2)$ time and $O(|V|^2 n)$ space in the worst case. If $|V| = O(\log n)$ we propose another algorithm with time complexity $O(|V|^2 \log|V| \, n^2)$. The time complexity of our algorithms is independent of $\Sigma$, which is desirable in protein applications where the formulation of this problem originates; a factor of $|\Sigma|^2 = 400$ in the time complexity of the previously proposed algorithm would significantly affect the efficiency in practice.
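To make the problem statement concrete, the following is a minimal sketch of a straightforward dynamic program for it. It is not the algorithm proposed in the paper, which the abstract does not describe. It assumes the constraint is already given as a deterministic automaton, and it assumes the constraint means that some contiguous block of alignment columns must spell, on each of the two rows, a string the automaton accepts. The function name, the `delta` dictionary encoding, and the scoring parameters `sub` and `gap` are all hypothetical choices made for illustration.

```python
NEG_INF = float("-inf")

def constrained_alignment_score(a, b, delta, start, accepting, sub, gap):
    """Best global alignment score of a and b such that some contiguous block
    of alignment columns spells, on both rows, a string the DFA accepts.

    delta: dict mapping (state, symbol) -> state; a missing key means "reject".
    sub(x, y): score for aligning x with y; gap: score for a gap column.
    Automaton states are assumed not to be None.
    """
    n, m = len(a), len(b)
    # table[i][j] maps a layer to the best score of aligning a[:i] with b[:j].
    # Layers: "pre" (block not started), (p, q) (inside the block, with DFA
    # states p for the a-row and q for the b-row), "post" (block finished).
    table = [[{} for _ in range(m + 1)] for _ in range(n + 1)]
    table[0][0]["pre"] = 0

    def relax(cell, layer, value):
        if value > cell.get(layer, NEG_INF):
            cell[layer] = value

    def advance(layer, x, y):
        """Layer after a column consuming x from a and y from b (None = gap)."""
        if layer in ("pre", "post"):
            return layer
        p, q = layer
        if x is not None:
            p = delta.get((p, x))
        if y is not None:
            q = delta.get((q, y))
        return None if p is None or q is None else (p, q)

    for i in range(n + 1):
        for j in range(m + 1):
            cell = table[i][j]
            # Moves that add no column: open the block, then close it if
            # both rows are currently in accepting states.
            if "pre" in cell:
                relax(cell, (start, start), cell["pre"])
            for layer, value in list(cell.items()):
                if (isinstance(layer, tuple)
                        and layer[0] in accepting and layer[1] in accepting):
                    relax(cell, "post", value)
            # Moves that add one column: substitution, gap in b, gap in a.
            for layer, value in cell.items():
                if i < n and j < m:
                    nxt = advance(layer, a[i], b[j])
                    if nxt is not None:
                        relax(table[i + 1][j + 1], nxt, value + sub(a[i], b[j]))
                if i < n:
                    nxt = advance(layer, a[i], None)
                    if nxt is not None:
                        relax(table[i + 1][j], nxt, value + gap)
                if j < m:
                    nxt = advance(layer, None, b[j])
                    if nxt is not None:
                        relax(table[i][j + 1], nxt, value + gap)

    return table[n][m].get("post", NEG_INF)

# Toy usage: a DFA for the one-symbol pattern "G" (hypothetical constraint).
delta = {(0, "G"): 1}
sub = lambda x, y: 1 if x == y else -1
print(constrained_alignment_score("GAA", "AAG", delta, 0, {1}, sub, -1))
```

With a deterministic automaton the table holds at most $|V|^2 + 2$ layers per cell and each layer has three outgoing column moves, so the alphabet size does not enter the running time. The catch is that a deterministic automaton can be much larger than a nondeterministic one for the same regular expression, so this sketch says nothing about the bounds quoted above.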




References

  1. Jiang, T., Xu, Y., Zhang, M.Q. (eds.): Current Topics in Computational Molecular Biology. MIT Press, Cambridge (2002)
  2. Tang, C.Y., Lu, C.L., Chang, M.D.T., Tsai, Y.T., Sun, Y.J., Chao, K.M., Chang, J.M., Chiou, Y.H., Wu, C.M., Chang, H.T., Chou, W.I.: Constrained sequence alignment tool development and its application to RNase family alignment. Journal of Bioinformatics and Computational Biology 1, 267–287 (2003)
  3. Chin, F.Y.L., Ho, N.L., Lam, T.W., Wong, P.W.H.: Efficient constrained multiple sequence alignment with performance guarantee. Journal of Bioinformatics and Computational Biology 3(1), 1–18 (2005)
  4. Tsai, Y.T., Huang, Y.P., Yu, C.T., Lu, C.L.: Music: A tool for multiple sequence alignment with constraints. Bioinformatics 20, 2309–2311 (2004)
  5. Lu, C.L., Huang, Y.P.: A memory-efficient algorithm for multiple sequence alignment with constraints. Bioinformatics 21(1), 20–30 (2005)
  6. Arslan, A.N.: Regular expression constrained sequence alignment. In: Apostolico, A., Crochemore, M., Park, K. (eds.) CPM 2005. LNCS, vol. 3537, pp. 322–333. Springer, Heidelberg (2005)
  7. Hulo, N., Sigrist, C.J.A., Saux, V.L., Langendijk-Genevaux, P.S., Bordoli, L., Gattiker, A., Castro, E.D., Bucher, P., Bairoch, A.: Recent improvements to the PROSITE database. Nucleic Acids Res. 32, 134–137 (2004)
  8. Faisst, S., Meyer, S.: Compilation of vertebrate-encoded transcription factors. Nucleic Acids Research 20(1), 3–26 (1992)
  9. Hopcroft, J.E., Motwani, R., Ullman, J.D.: Introduction to Automata Theory, Languages, and Computation, 2nd edn. Addison-Wesley, Reading (2001)
  10. Hirschberg, D.S.: A linear space algorithm for computing maximal common subsequences. Comm. ACM 18, 341–343 (1975)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Yun-Sheng Chung (1)
  • Chin Lung Lu (2)
  • Chuan Yi Tang (1)

  1. Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan, ROC
  2. Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan, ROC
