# A Fast Longest Common Subsequence Algorithm for Biosequences Alignment

Finding the longest common subsequence (LCS) of biosequences is one of the most important tasks in bioinformatics. This paper presents a fast algorithm for the LCS problem, named FAST_LCS. The algorithm first follows the successors of the initial identical character pairs, using a successor table, to obtain all identical character pairs and their levels. It then traces back from an identical character pair at the largest level to obtain the LCS. For two sequences *X* and *Y* with lengths *n* and *m*, the memory required by FAST_LCS is max{8(*n*+1) + 8(*m*+1), *L*}, where *L* is the number of identical character pairs, and the time complexity of the parallel implementation is *O*(|LCS(*X*,*Y*)|), where |LCS(*X*,*Y*)| is the length of the LCS of *X* and *Y*. Experimental results on gene sequences from the *tigr* database, run on the Shenteng 1800 MPP parallel computer, show that the algorithm produces exact results and is faster and more efficient than other LCS algorithms.
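The successor-table idea described in the abstract can be illustrated with a minimal sequential sketch. This is not the authors' FAST_LCS implementation. It assumes a successor table that stores, for each character and each position *j*, the next position after *j* holding that character, and it assumes that the level of a pair is the number of successor steps from the virtual start pair (0, 0). The function names `successor_table` and `fast_lcs_sketch` are hypothetical, and any pruning or parallelisation used in the paper is omitted.

```python
def successor_table(seq, alphabet):
    """table[c][j] = smallest 1-based position k > j with seq[k-1] == c, or 0 if none."""
    n = len(seq)
    table = {c: [0] * (n + 1) for c in alphabet}
    for j in range(n - 1, -1, -1):
        for c in alphabet:
            table[c][j] = table[c][j + 1]   # inherit the answer for position j + 1
        table[seq[j]][j] = j + 1            # position j + 1 itself holds seq[j]
    return table


def fast_lcs_sketch(x, y):
    """Return one longest common subsequence of x and y (illustrative only)."""
    alphabet = sorted(set(x) | set(y))
    tx = successor_table(x, alphabet)
    ty = successor_table(y, alphabet)

    # levels[k] maps every identical character pair reached after k + 1
    # successor steps to one predecessor pair (assumed level definition).
    levels = []
    frontier = {(0, 0): None}               # virtual start pair before both sequences
    while True:
        nxt = {}
        for (i, j) in frontier:
            for c in alphabet:
                a, b = tx[c][i], ty[c][j]
                if a and b:                  # c occurs after i in x and after j in y
                    nxt.setdefault((a, b), (i, j))
        if not nxt:
            break
        levels.append(nxt)
        frontier = nxt

    if not levels:
        return ""

    # Trace back from any pair at the largest level.
    pair = next(iter(levels[-1]))
    chars = []
    for level in reversed(levels):
        chars.append(x[pair[0] - 1])
        pair = level[pair]
    return "".join(reversed(chars))


print(fast_lcs_sketch("ACGT", "AGT"))        # AGT
print(fast_lcs_sketch("AGGTAB", "GXTXAYB"))  # GTAB
```

The number of levels produced equals the LCS length, because the leftmost embedding of any common subsequence is a chain of successor steps. Expanding each level's pairs independently is the part that a parallel implementation could distribute, although the sketch above runs sequentially.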

### Keywords

bioinformatics, longest common subsequence, identical character pair