Abstract
Multiple sequence alignment (MSA) is a fundamental problem in biology. Computing an optimal alignment becomes impractical even for a modest number of sequences [1], since the general version of the problem is NP-hard. Because of the high time complexity of traditional MSA algorithms, even today’s fast computers cannot solve the problem for a large number of sequences. In this paper we present a randomized algorithm for calculating distance matrices, a major step in many multiple sequence alignment algorithms. The basic idea employed is sampling (along the lines of [2]).
This research has been supported in part by the NSF Grants CCR-9912395 and ITR-0326155.
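The abstract names sampling as the basic idea but does not spell out the procedure, so the following is only a minimal sketch of how a sampled distance matrix could be computed, not the authors' algorithm. It assumes that one contiguous window is drawn at a random offset shared by all sequences, and that the distance between two sequences is estimated as the edit distance between their windows divided by the window length. The function names `edit_distance` and `sampled_distance_matrix` and the parameters `sample_len` and `seed` are hypothetical.

```python
import random


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def sampled_distance_matrix(seqs, sample_len, seed=None):
    """Estimate pairwise distances from one random window per sequence.

    Assumption (not taken from the paper): every sequence is sampled at
    the same random offset, and the estimate is the edit distance of the
    two windows normalised by the window length.
    """
    if sample_len <= 0:
        raise ValueError("sample_len must be positive")
    shortest = min(len(s) for s in seqs)
    if shortest == 0:
        raise ValueError("sequences must be non-empty")

    rng = random.Random(seed)
    k = min(sample_len, shortest)           # window cannot exceed shortest sequence
    start = rng.randrange(shortest - k + 1)  # shared random offset
    samples = [s[start:start + k] for s in seqs]

    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = edit_distance(samples[i], samples[j]) / k
            dist[i][j] = dist[j][i] = d      # matrix is symmetric
    return dist


if __name__ == "__main__":
    sequences = ["ACGTACGT", "ACGTACGT", "TTTTTTTT"]
    for row in sampled_distance_matrix(sequences, sample_len=4, seed=0):
        print(row)
```

Under these assumptions each pairwise comparison costs time quadratic in the window length rather than in the full sequence length, which is where a sampling approach would save work; the price is that the entries are estimates rather than exact distances.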
References
Berger, M.P., Munson, P.J.: A novel randomized iterative strategy for aligning multiple protein sequences. Computer Applications in Biosciences 7(4), 479–484 (1991)
Rajasekaran, S., Nick, H., Pardalos, P.M., Sahni, S., Shaw, G.: Efficient algorithms for local alignment search. Journal of Combinatorial Optimization 5(1), 117–124 (2001)
Charter, K., Schaeffer, J., Szafron, D.: Sequence Alignment using FastLSA. In: International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences (2000)
Needleman, S., Wunsch, C.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48, 443–453 (1970)
Feng, D., Doolittle, R.: Progressive sequence alignment as a prerequisite to correct phylogenetic trees. Journal of Molecular Evolution 25, 351–360 (1987)
Thompson, J., Higgins, D., Gibson, T.: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680 (1994)
Katoh, K., Misawa, K., Kuma, K., Miyata, T.: MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30(14), 3059–3066 (2002)
Corpet, F.: Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res. 16, 10881–10890 (1988)
Karypis, G., Han, S., Kumar, V.: CHAMELEON: A hierarchical clustering algorithm using dynamic modeling. Technical report TR-99. University of Minnesota, Minneapolis (1999)
Szymkowiak, A., Larsen, J., Hansen, L.: Hierarchical clustering for datamining. In: Fifth International Conference on Knowledge-Based Intelligent Information Engineering Systems & Allied Technologies (2001)
Notredame, C.: Recent progress in multiple sequence alignment: a survey. Pharmacogenomics 3(1) (2002)
Gotoh, O.: Further improvement in methods of group-to-group sequence alignment with generalized profile operations. Computer Applications in Biosciences 10(4), 379–387 (1994)
Gotoh, O.: Optimal alignment between groups of sequences and its application to multiple sequence alignment. Computer Applications in Biosciences 9(3), 361–370 (1993)
Kececioglu, J., Lenhof, H., Mehlhorn, K., Mutzel, P., Reinert, K., Vingron, M.: A polyhedral approach to sequence alignment problems. Discrete Applied Mathematics 104, 143–186 (2000)
Anibal de Carvalho, S.: Department of Computer Science, King's College, London, UK, http://neobio.sourceforge.net/
Zarembinski, T.I., Hung, L.W., et al.: Structure-based assignment of the biochemical function of a hypothetical protein: a test case of structural genomics. In: Proceedings of the National Academy of Sciences of the USA, vol. 95(26), pp. 15189–15193 (1998)
Homo sapiens PAC clone RP5-943F2 from 7, ACCESSION AC004932 VERSION: AC004932.4 GI:13446338, http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=nucleotide
http://www.mscs.mu.edu/cstruble/class/biin200/fall2003/notes/msa.ppt
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rajasekaran, S., Thapar, V., Dave, H., Huang, CH. (2004). A Randomized Algorithm for Distance Matrix Calculations in Multiple Sequence Alignment. In: López, J.A., Benfenati, E., Dubitzky, W. (eds) Knowledge Exploration in Life Science Informatics. KELSI 2004. Lecture Notes in Computer Science(), vol 3303. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30478-4_4
DOI: https://doi.org/10.1007/978-3-540-30478-4_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23927-7
Online ISBN: 978-3-540-30478-4
eBook Packages: Springer Book Archive