Abstract
The search for a multiple sequence alignment (MSA) is a well-known problem in bioinformatics that consists of finding an alignment of three or more biological sequences. In this paper, we propose a parallel iterative algorithm for the global alignment of multiple biological sequences. In this algorithm, a number of processes work independently and concurrently, each searching for the best MSA of a set of sequences. The algorithm uses a Longest Common Subsequence (LCS) technique to generate an initial MSA. An iterative process then improves the MSA by applying a number of operators designed to produce more accurate alignments. Simulations were performed using sequences from the UniProtKB protein database. A preliminary performance analysis and a comparison with several common MSA methods show promising results. The implementation was developed on a cluster platform using the standard Message Passing Interface (MPI) library.
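As an illustration of the LCS technique mentioned in the abstract, the following is a minimal sketch of the textbook dynamic-programming LCS for two sequences. The abstract does not describe how the authors apply LCS to build the initial MSA, so the function name `lcs` and the pairwise formulation are assumptions made here for illustration only, not the paper's method.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b.

    Textbook dynamic programming; an illustrative assumption,
    not the procedure used in the paper.
    """
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Trace back from the bottom-right corner to recover one LCS.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

For example, `lcs("AGGTAB", "GXTXAYB")` returns `"GTAB"`. The matched characters could serve as anchor columns when placing gaps, although whether the paper uses them this way is not stated in the abstract.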
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Andalon-Garcia, I.R., Chavoya, A., Meda-Campaña, M.E. (2012). A Parallel Algorithm for Multiple Biological Sequence Alignment. In: Lones, M.A., Smith, S.L., Teichmann, S., Naef, F., Walker, J.A., Trefzer, M.A. (eds) Information Processing in Cells and Tissues. IPCAT 2012. Lecture Notes in Computer Science, vol 7223. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28792-3_31
DOI: https://doi.org/10.1007/978-3-642-28792-3_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28791-6
Online ISBN: 978-3-642-28792-3
eBook Packages: Computer Science, Computer Science (R0)